Introduction

Data preparation and Workflow Management (dPrep)

Engineer data sets from complex raw data and manage research projects efficiently

Tilburg University, Block 3, 2020/2021 (February - April 2021)

Instructor: dr. Hannes Datta

Learn how to prepare data for empirical analysis

Welcome to the course website of dPrep. This course teaches you how to engineer data sets for statistical analysis. Many students and researchers see “creating” a data set for analysis as a simple task: a bit of cleaning here, a bit of merging there, and you’re done. In this course, we take data preparation to the next level by working with highly complex data preparation workflows (think multiple sources, structured and unstructured data, data from databases and from files, multiple delivery batches, lots of missing data, different file versions, etc.). Throughout the course, we’ll use the workflow principles of reproducible science documented at Tilburg Science Hub.

This website

This website is the backbone of the course and features the following main sections:

  • The course section holds a list of weekly modules, consisting of live streams, readings, and prerecorded clips. Even if you’re not enrolled in this course, you can watch these clips, but interaction with the course instructor is reserved for enrolled students.

  • The tutorial and data challenge section offers self-guided tutorials that teach the principles of data preparation and workflow management. It also holds a (weekly) data challenge in which you can put your skills into practice!

  • Finally, this course uses building blocks and examples from Tilburg Science Hub, a platform that helps students, researchers, and data scientists work together efficiently.

Enroll in this class