Week 4: Engineering data sets
Learning goals
- Apply data wrangling techniques using the Tidyverse to clean, transform, and prepare datasets for analysis.
- Structure R scripts into modular components (setup, input, transformation, output) to facilitate reproducibility and automation.
- Implement common data operations (e.g., merging, aggregating, reshaping) and integrate basic programming concepts in R.
- Develop new variables and features (feature engineering) to enhance the analysis and understanding of datasets.
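To make the goals above more concrete, here is a minimal sketch of a script split into setup, input, transformation, and output blocks. It is an illustration only, not the course's official template: the `orders` and `customers` tables, their column names, and the output file name are all hypothetical, and the input block builds toy data in memory where a real script would read files.

```r
# --- Setup: load the packages used below ---
library(dplyr)
library(tidyr)

# --- Input: hypothetical toy data (a real script would read files here) ---
orders <- tibble(
  order_id    = 1:5,
  customer_id = c(1, 1, 2, 3, 3),
  amount      = c(10, 25, 40, 5, 15),
  channel     = c("web", "store", "web", "web", "store")
)
customers <- tibble(
  customer_id = c(1, 2, 3),
  country     = c("NL", "BE", "NL")
)

# --- Transformation ---
# Merge: add customer attributes to each order
orders_full <- orders %>%
  left_join(customers, by = "customer_id")

# Aggregate: one row per customer
customer_summary <- orders_full %>%
  group_by(customer_id, country) %>%
  summarize(
    n_orders    = n(),
    total_spend = sum(amount),
    .groups     = "drop"
  ) %>%
  # Feature engineering: derive a new variable from existing ones
  mutate(avg_order_value = total_spend / n_orders)

# Reshape: from long (one row per customer and channel) to wide
spend_by_channel <- orders_full %>%
  group_by(customer_id, channel) %>%
  summarize(spend = sum(amount), .groups = "drop") %>%
  pivot_wider(names_from = channel, values_from = spend, values_fill = 0)

# --- Output: write the result to a temporary location ---
output_file <- file.path(tempdir(), "customer_summary.csv")
write.csv(customer_summary, output_file, row.names = FALSE)
```

Keeping each stage in its own block makes it easier to rerun or automate the script later, because inputs and outputs are each defined in one place.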
Preparation before the lecture
Lecture
Laptop required!
- Tutorial (view slides, view complete tutorial)
Coaching session
- Work on your team goals for this course week (see coaching #3 on your workplan).
After the lecture
- After-class exercises (View, Download; to download, right-click and choose "download file as")
- Reading: Data preparation process and challenges
- Optional (for students who need extra guidance in developing R skills):
  - DataCamp Introduction to Tidyverse (chapters 1 and 3)
  - DataCamp Cleaning Data in R (chapters 1 and 2)
  - DataCamp Joining Data with dplyr (chapters 1 and 2)
Having a hard time with DataCamp, or seeking to practice more?
Then check this material on manipulating data using dplyr and tidyr!