Workplan and Coaching
You’ll start working on your project – an end-to-end, fully automated research workflow – in the first course week. The activities below help you to structure your project. Stick to the plan to ensure you can finish the project in time.
Deliverables are submitted on Canvas and due by Friday, 7 pm, of the respective course week. The course team will provide short feedback in writing at least a day before the next coaching session.
Week 1: Team Formation
- Explore whom you would like to work with.
- Register your team on Canvas.
Week 2: Data Exploration (Coaching #1)
- Coaching Support:
- Discuss your research question and data set choice with your coach. Ensure the question is suitable and the data set is appropriate for the analysis.
- Begin exploring your data using RMarkdown to understand its structure and contents.
- Address any installation or data access issues.
- Deliverables:
- RMarkdown document (both .md and rendered as .pdf), with your research motivation and initial data exploration (see grading guidelines, 1.1 (Research Motivation) and 2.1 (Data Exploration)).
Week 3: GitHub Setup and Data Preparation (Coaching #2)
- Progress on Project:
- Set up your GitHub repository under the course-dprep organization. This should include initializing GitHub Issues, setting up a Project Board with columns for “backlog,” “to do,” “in progress,” and “done,” and committing your initial readme.md file with the research question.
- Continue data preparation by cleaning and transforming the dataset. Ensure that you document your process in RMarkdown.
- Coaching Support:
- Review the setup of your GitHub repository with your coach. Ensure the repository structure aligns with your project needs and that you are effectively using GitHub for collaboration.
- Discuss your progress on data preparation. Ensure that the data is properly cleaned and ready for analysis.
- Deliverables:
- Link to your GitHub team repository, which should (a) contain an updated RMarkdown document reflecting your progress in data preparation, and (b) be in good shape (including your initial readme.md file and the first version of your data exploration script).
- For more details, see grading guidelines, 1.2 (Repository Structure) and 1.3 (Way of working using GitHub).
Week 4: Advanced Data Preparation and Initial Analysis (Coaching #3)
- Progress on Project:
- Further refine the data preparation, ensuring that the dataset is fully ready for analysis. This should include handling missing data and any necessary feature engineering.
- Begin conducting initial analyses, documenting your process and findings in RMarkdown.
- Continue updating and refining your GitHub repository, ensuring all changes are committed and documented.
- Coaching Support:
- Discuss any challenges in advanced data preparation with your coach, such as handling missing data, data transformations, and variable operationalization.
- Review your initial analysis plans. Ensure that your approach to data analysis is sufficient.
- Get support with working with GitHub (it is very important that all team members contribute to the repository using the Git workflow discussed in class).
- Deliverables:
- Updated GitHub repository, with an emphasis on 2.2 (Data Preparation) and 1.3 (Way of working using GitHub). For details, see grading guidelines.
Week 5: Automation and Refinement (Coaching #4)
- Progress on Project:
- Implement automation in your data processing pipeline, ensuring it works without error.
- Continue refining your analysis, incorporating feedback and improving the robustness of your code.
- Ensure that your GitHub repository is up to date, with all automation scripts committed and properly documented.
- Coaching Support:
- Review your automation strategy with your coach, focusing on debugging and efficiency.
- Deliverables:
- Automated data processing pipeline scripts (makefile), committed and documented in your GitHub repository.
- For details, see grading guidelines, with an emphasis on 3.1 and 3.2 (Source code quality and degree of automation).
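To make the automation deliverable more concrete, here is a minimal sketch of what a makefile for such a pipeline could look like. All file and folder names (src/download.R, src/clean.R, src/analysis.Rmd, data/, gen/) are hypothetical placeholders and not prescribed by the course; the sketch also assumes that each script writes the output file named in its rule, and that recipe lines are indented with a tab.

```makefile
# Minimal pipeline sketch. All paths below are hypothetical; adapt them to your repository.
# Each rule reads: target: prerequisites, followed by a tab-indented recipe.

all: gen/output/analysis.pdf

# Step 1: download the raw data (assumes src/download.R writes data/raw.csv)
data/raw.csv: src/download.R
	Rscript src/download.R

# Step 2: clean and transform the data (assumes src/clean.R writes gen/temp/clean.csv)
gen/temp/clean.csv: src/clean.R data/raw.csv
	Rscript src/clean.R

# Step 3: render the analysis report from the cleaned data
gen/output/analysis.pdf: src/analysis.Rmd gen/temp/clean.csv
	Rscript -e "rmarkdown::render('src/analysis.Rmd', output_format = 'pdf_document', output_dir = 'gen/output')"

# Remove all generated files so the pipeline can be rebuilt from scratch
clean:
	rm -rf gen

.PHONY: all clean
```

Running make in the repository root would then rebuild only the steps whose inputs have changed, and make clean followed by make is one way to check that the whole workflow runs without error from a fresh state.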
Weeks 6-7: Final Analysis (Coaching #5/#6), optional
- Progress on Project:
- Debug the automation of your project.
- Conduct final housekeeping checks with your coach to ensure all deliverables are complete and all documentation is in order.
- Continue updating your GitHub repository, ensuring that all scripts, documentation, and final analysis are committed.
- Coaching Support:
- Review the progress of your final analysis and discuss deployment strategies with your coach. Focus on the robustness of your analysis and the usability of your deployment plan. For details, see grading guidelines, section 2.3 (Analysis and Deployment).
- Get feedback on any remaining technical challenges and ensure that your code is well-documented and ready for final submission.