dPrep - Course Summary & Exam Preparation

Hannes Datta

Welcome to the final lecture in dPrep!



If you haven't done so, please explore the exam page & example questions at https://dprep.hannesdatta.com/docs/exam.

The course evaluation is live at https://app.evalytics.nl. Please voice your opinions!

Agenda

  • Course summary
  • From here onwards
    • During your studies
    • After your studies
  • Course evaluation
  • Exam preparation
  • Remaining questions / Q&A

Course framework

[Image: dprep]

Positioning in the study program

[Image: dprep]

Lessons learnt #1: Versioning and Project Management with Git(Hub) (I)

  • Scrum & the project board as a way to collaborate with one another
    • Writing issues is hard (!)
    • Seems to impose a lot of structure; did it work out?
    • Recall how to work together – team <-> individual
  • Versioning can be powerful!
    • delete stuff that isn't needed (“housekeeping”), roll back when you want (git checkout COMMITID -b branch)
    • Oh no, not authenticated! (browser authentication works best)
    • Oh no! Can't push! (pull first!)
    • Use GUIs (e.g., in R or VS Code), complemented with Git Bash
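
The rollback mentioned above can be tried safely in a throwaway repository. This is a minimal sketch, not a prescribed workflow: the file name analysis.R, the branch name rollback-demo, and the demo user identity are made up for illustration. It uses the form git checkout -b BRANCH COMMITID, which should behave the same as the git checkout COMMITID -b branch form on the slide.

```shell
set -eu

# Work in a temporary directory so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo Student"          # hypothetical identity, local to this repo
git config user.email "student@example.com"

# First commit: the version we will later want back.
echo "version 1" > analysis.R
git add analysis.R
git commit -q -m "Add first version of analysis"
first_commit="$(git rev-parse HEAD)"         # remember the commit ID

# Second commit: a change we regret.
echo "version 2 (broken)" > analysis.R
git commit -q -am "Change analysis"

# Roll back by starting a new branch at the old commit.
git checkout -q -b rollback-demo "$first_commit"
restored="$(cat analysis.R)"
echo "$restored"
```

The original branch still holds the second commit, so nothing is lost; the new branch simply starts from the older state.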

Lessons learnt #1: Versioning and Project Management with Git(Hub) (II)

  • Ready to use Git/versioning in a business setting
    • clone repositories
    • Know what to version, and what not
    • Use .gitignore (and know what it CANNOT do)
    • Work together using issues, feature branches & pull requests (PRs)/merge requests
  • Collaborate on open source projects
    • You know what forks are!
    • You may even have contributed to public projects!
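
One limitation of .gitignore that is easy to verify (assuming this is among the things the slide alludes to): it does not stop tracking a file that has already been committed. The sketch below uses a made-up file raw_data.csv in a throwaway repository.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo Student"          # hypothetical identity, local to this repo
git config user.email "student@example.com"

# The file is committed before it is ignored.
echo "id,value" > raw_data.csv
git add raw_data.csv
git commit -q -m "Accidentally commit raw data"

# Adding the file to .gitignore afterwards...
echo "raw_data.csv" > .gitignore
git add .gitignore
git commit -q -m "Ignore raw data"

# ...does not untrack it: Git still lists it as tracked.
tracked_before="$(git ls-files raw_data.csv)"

# Remove it from the index only; the file stays on disk.
git rm -q --cached raw_data.csv
git commit -q -m "Stop tracking raw data"
tracked_after="$(git ls-files raw_data.csv)"

echo "before: '$tracked_before'  after: '$tracked_after'"
```

Even after git rm --cached, earlier commits still contain the file, so .gitignore is not a way to remove sensitive data from history.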

Lessons learnt #2: Data Exploration

  • RMarkdown! (mixing code with reporting)
    • ability to quickly produce clean documents to share!
    • uh… but how to run it with make?
    • verify you can render markdown documents to HTML and PDF!
  • Doing data quality checks?
    • getting back to your data supplier if needed!
    • reporting summary statistics (summary, table, dplyr for summary stats by group)
  • Use RMarkdown when it is useful (i.e., to produce a doc, slides; NOT for data cleaning)

Lessons learnt #3: Data Engineering

  • Why do we actually have to clean up data?!
  • Data prep code can be complex
    • so… use cheatsheets (also for the exam!)
    • know & apply common data operations (what are those?)!
  • Make setup-ITO blocks
    • setup to load libraries, then input, transformation, output!
    • super crucial for make, too! (know why?)
  • Modularize code, loop, and use functions

Lessons learnt #4: Pipeline building and automation

  • Clarity about where to store stuff
    • review! (components - src, gen, data; modules - e.g., analysis, data-prep, …)
  • Run the work others have done
    • my work, your work, somebody else's work
  • Discover mistakes in code
    • e.g., packages that weren't loaded, order of cells/code
  • Save time!
    • but, yeah, NOT at the beginning, when you are still learning
  • Continuously improve documentation & code

Common mistakes w/ make

  • Wrong: One “all” rule with all targets as prerequisites
  • Wrong: Incomplete dependencies
  • Wrong: Multiple Rscript execution commands per recipe
  • Correct: one “entry” rule on top, referencing the final output(s)
  • Gradually build your makefile. Go in baby steps!
  • Happy to take a look at your repositories now - anyone?

Any other questions about your projects?

We can address them now!

Reflection about the course

  • it's amazing to see you learn & grow!
    • that feeling you get when your code runs…!
    • I can talk to you as “members of my own development team”
    • I've witnessed you “seeing code and feeling what it does!”
  • different levels
    • many started R from scratch; some could program but were still challenged by make and GitHub
  • we've seen a lot: different projects, data sets, tools, “ways of doing things”, ways of coding, videos, websites

My recommendation: take time to let it sink in. And trust in my choice of teaching you this. Your hard work will pay off.

Looking ahead: During your studies (I)

  • Gradually implement across classes or projects
    • e.g., some projects just benefit from better directory structure, while others may need “more”
    • risk of “losing” the skill (!)
  • Realize it takes time to learn
    • it took me years to become proficient
    • onboard team members to “your” way of working

Looking ahead: During your studies (II)

  • Use for thesis!
    • but be aware: your skills are so fresh that your professor has likely never heard of these tools! -> use, teach, show how productive you are!
  • Build your job market profile
    • e.g., have a Hugo website with your CV
    • e.g., “pin” your best repositories (but don't call them dprep-team-X; remember, you're a marketer!)
    • have a compelling about page on GitHub

Looking ahead: During your studies (III)

Looking ahead: After your studies

Next steps: Submissions and preparing for the exam

Exam planning

  • Organization
    • 14 October (time?; 3 hours)
    • On campus, using TestVision
  • Software & materials
    • access to R/RStudio, Git, make
    • no access to the internet
    • I'm making selected resources available on the instruction page - check them out here
  • How to prepare?
    • familiarize yourselves with how TestVision works (link on Canvas)
    • let's look at some questions now

Learning goals, distribution of points + tips (I)

  • 100 points in total, 9 questions
  • Mix of open and closed questions

  • LG 1: Use R to clean and transform data for analysis [synthesis; 20% of points]

    • know core concepts and corresponding code in tidyverse/dplyr (e.g., aggregation, merging, de-duplication, reshaping, data conversions, regular expressions)
    • become fast in coding - no time to “look up” these commands elaborately
    • open .Rdata, submit code and/or submit saved .Rdata file.
    • many “small” questions
    • I simulated the data

Learning goals, distribution of points + tips (II)

  • LG 2: Use GitHub for managing empirical research projects [evaluation; 10% of points]
    • conceptual questions on GitHub issues, project boards, Scrum, etc.
  • LG 3: Use Git/GitHub for versioning files and collaboration on privately-shared and publicly-available (open science) GitHub repositories [application; 30% of points]
    • question type: download zipped repository
    • Be prepared to work with Git Bash (e.g., git checkout, git merge, git add, etc.)
    • download git repository, unzip, do your commits, zip again and submit as a zipfile
    • know how to make commits with commit messages, create branches, switch branches, etc.
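
The Git Bash commands listed above can be rehearsed in a scratch repository. This is a minimal practice sketch, not the exam task itself: the file names, branch name, and commit messages are invented, and the demo identity is set only for this repository.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo Student"          # hypothetical identity, local to this repo
git config user.email "student@example.com"

# Initial commit on the default branch (main or master, depending on setup).
echo "# Project" > README.md
git add README.md
git commit -q -m "Add README"
base_branch="$(git symbolic-ref --short HEAD)"

# Create a branch and switch to it, then commit with a descriptive message.
git checkout -q -b add-cleaning-script
echo "# cleaning steps go here" > clean.R
git add clean.R
git commit -q -m "Add cleaning script skeleton"

# Switch back and merge the branch (a fast-forward in this case).
git checkout -q "$base_branch"
git merge -q add-cleaning-script

commit_count="$(git rev-list --count HEAD)"
git log --oneline
```

Running git log --oneline at the end is a quick way to confirm that the commits and messages are what you intended before zipping a repository for submission.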

Learning goals, distribution of points + tips (III)

  • LG 4: Use R for generating automatic reports (e.g., to assess data quality, to report research findings in a paper) and deploying research findings in novel ways (e.g., apps) [comprehension; 15% of points]
    • expect “new”/unseen data
    • When handing in documents, check what I require (for .Rmd, I sometimes ask for rendered .pdf documents - does it work on your computer?)
  • LG 5: Make
    • Use Workflow Management Tools to create and run portable, automated, and reproducible research pipelines [application; 25% of points]
    • Expect a zip file containing a make pipeline
    • Complete a few smaller tasks: run, correct, and develop a new make workflow

Some tips for your exam

  • Work on a joint cheat sheet (link to be shared on Canvas) - deadline this Friday, 3pm
  • Be prepared to download data sets from TestVision (.Rdata) - on the cover page of the exam or in a specific question on the exam.

Next steps: Official course evaluation

  • Course evaluation has been immensely important to this course

    • this semester: switched order of sessions, revised GitHub tutorial/onboarding
    • last semester: new data set, coaching sessions with breakout groups on campus and online
  • Course evaluation has been critical to my career

    • Without my past evaluations, I wouldn't be teaching you today
    • I will look at all comments.
    • Scores matter most for demonstrating the importance of this course
  • You will be invited via Evalytics at the end of the week

Next steps: Self- and peer assessment

  • You will be invited via e-mail
  • To be filled in using Google Forms

Informal feedback

  • Coaching sessions? (Online, offline)
  • How was it for beginners?
  • What are three things you'd like me to change?

Stay in touch!