Data Preparation & Programming Skills (328059-M-6)

Author

Hannes Datta

Published

August 15, 2025

Syllabus

Course Description

Marketing analytics projects, whether for a thesis, a research study, or an analytics role, often involve messy, multi-source datasets, complex analyses, and the need for reproducibility. Small ad-hoc fixes might work for simple tasks, but they quickly break down in these larger, more demanding settings. Today’s datasets (e.g., scraped from the Internet) [1] and advanced analytics (e.g., requiring high-performance computing) demand a structured, professional way of working.

This course introduces three core principles to help you manage your marketing analytics projects efficiently:

  1. Code – Build strong coding skills in R/RStudio.
  2. Collaborate – Work effectively with others using Git/GitHub.
  3. Automate – Create end-to-end, reproducible research pipelines.

These principles — code, collaborate, automate — are applied across the full research workflow: data exploration, preparation, analysis, and reporting. You’ll also learn how to engineer datasets from complex raw sources and share your findings in innovative formats, such as Markdown documents or interactive apps.

Learning goals

After successful completion of this course, you will be able to:

  • Use GitHub (or similar tools) for managing empirical research projects (e.g., Issues and Project Boards)
  • Use Git/GitHub (or similar tools) for versioning files and collaborating on privately-shared and publicly-available (open science) code repositories
  • Use R to clean and transform data for analysis (e.g., aggregation, merging, de-duplication, reshaping, data conversions, regular expressions)
  • Use R for generating automatic reports (e.g., to assess data quality, to report research findings in a paper) and deploying research findings in novel ways (e.g., apps)
  • Use Workflow Management Tools to create and run portable, automated, and reproducible data pipelines


Positioning of the course in the study program

The course is taught to MSc students in the Marketing Analytics (TiSEM) program. Seats are also offered to interested Research Master and PhD students.

This course zooms in on ways to professionalize working on empirical research projects. Rather than focusing on one particular aspect of the research workflow (like collecting data or only conducting an analysis), this course connects all components of the research workflow. Along the way, students learn how to write better code (in R, especially concerning preparing complex data for analysis), collaborate in teams (using GitHub), and automate their research workflows (using the automation tool make).

Students are advised to take this course before starting their thesis projects. Research Master and PhD students are advised to take it early on, as it will likely provide many tools that help them conduct their research efficiently.

Enrollment Information for Students Not Part of the Marketing Analytics Program

Interested Research Master or PhD students are encouraged to apply for a seat.

  • Tilburg University students: Enrollment is possible with the approval of the instructor and your program coordinator. Please notify the course coordinator by 15 August 2025.
  • External students: You are welcome to audit this course. For admission, email a brief research statement, your motivation for joining, and your CV by 15 August 2025.

Notification: Eligible candidates will be informed by 22 August 2025.

Prerequisites

  • The course is taught to MSc students in the Marketing Analytics (TiSEM) program.
  • Students need a working knowledge of R before the start of the course; preparation material is available. Novices may further benefit from following other courses at Tilburg University in which R is used.
  • Students can use their own computer for the duration of this course (Windows, Mac, or Linux). Android/Chromebook/iOS devices are not supported. We recommend a laptop with at least 16 GB of RAM, as we work with larger data sets that require preprocessing capacity. The exam is held on campus, using Windows computers. Mac and Linux users are advised to practice on the on-campus computers before the exam.

Teaching format

  • Blended learning: a mix of lectures, tutorials, and coaching sessions, mostly on campus and sometimes online or prerecorded
  • Learn state-of-the-art tools popular among scientists, marketing analysts, and data scientists (e.g., R, GitHub, make)

Assessment

  • Team project (40%, of which 8% is based on students’ individual contributions, measured by self- and peer assessment)
  • Computer exam (60%, 120 minutes)

Passing requirements

Students pass this course if

  • the final course grade (i.e., the weighted average of the following two components: (1) the group project, and (2) the exam; weights indicated above) is ≥ 5.5, and
  • the exam grade is ≥ 5.0.

Final course grades are rounded to multiples of half points (e.g., 6, 6.5, 7, etc.).
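
The passing rule above can be sketched in a few lines of code. This is an illustration only, written in Python, and the function and field names are hypothetical. Two details are assumptions that the syllabus does not state: the 5.5 threshold is checked on the unrounded weighted average, and ties in the half-point rounding go upward. The project grade passed in is assumed to be the student's individual grade after the self- and peer assessment correction.

```python
import math
from typing import NamedTuple

# Weights (in percent) and thresholds as stated in the syllabus.
PROJECT_WEIGHT_PCT = 40
EXAM_WEIGHT_PCT = 60
MIN_COURSE_GRADE = 5.5
MIN_EXAM_GRADE = 5.0


class CourseResult(NamedTuple):
    weighted_average: float  # unrounded weighted average of project and exam
    reported_grade: float    # weighted average rounded to a multiple of 0.5
    passed: bool


def round_to_half(grade: float) -> float:
    # Assumption: ties round upward (e.g., 6.25 becomes 6.5).
    return math.floor(grade * 2 + 0.5) / 2


def course_result(project_grade: float, exam_grade: float) -> CourseResult:
    # Assumption: project_grade already includes the self- and peer
    # assessment correction for this student.
    weighted = (PROJECT_WEIGHT_PCT * project_grade
                + EXAM_WEIGHT_PCT * exam_grade) / 100
    # Assumption: both thresholds are checked before rounding.
    passed = weighted >= MIN_COURSE_GRADE and exam_grade >= MIN_EXAM_GRADE
    return CourseResult(weighted, round_to_half(weighted), passed)


print(course_result(project_grade=7.0, exam_grade=5.0))
print(course_result(project_grade=9.0, exam_grade=4.9))
```

In the second call, the weighted average is above 5.5, yet the student does not pass, because the exam threshold is checked separately.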

Resit policy

  • If students have a grade lower than 5.0 on the exam, they cannot pass this course.
    • Required action: Students will have to take the exam resit.
  • If students are not part of a team, they cannot obtain a grade for the team project and hence cannot pass this course.
    • Required action: Re-enroll in this course’s next edition, and ensure you are part of a team.
  • If students have an exam grade higher than or equal to 5.0 but fail the team project (after correction for self- and peer assessment, SPA), their total course grade may be lower than 5.5, in which case they fail this course.
    • Required action: Revise the team project based on the course coordinator’s grading report, and resubmit it on Canvas within two weeks after the final grades for this course are published. Only students who fail the team project can have their projects re-graded.
  • Students who have passed the course, but wish to retake the exam, can take the exam resit. The last exam grade counts. Grades for the team project are retained. In other words, the resit exam still counts for 60% of the final course grade.
  • Students who fail the exam and wish to re-take the course in the subsequent semester can retain their assignment grades.

Code of Conduct

  • Please always use English as the default language so that non-Dutch speakers can follow the conversations, even if it concerns topics not directly related to the class (e.g., during breaks).
  • Please get in touch with your instructor early on to signal (and solve) problems jointly.
  • Stay up to date by checking this syllabus and Canvas, and watch for updates on Hannes’ social media channels.
  • Be on time, and start on time.
  • Feel invited to provide informal feedback!
  • It’s totally fine to call the instructor by his first name.
  • When meeting online, please turn on your camera, which will facilitate interaction with the course instructor and other students.

Instructors

This course is taught by

  • dr. Hannes Datta, Associate Professor at the Marketing Department of Tilburg University (course coordinator & lectures),
  • Javier Torralba Flores, PhD student at the Marketing Department of Tilburg University (tutorials), and
  • Roshini Sudhaharan, MSc., Junior Lecturer at the Marketing Department of Tilburg University (team project and coaching sessions).

Note

Join Hannes’ professional network on LinkedIn, subscribe to his YouTube Channel, and start following him on GitHub.


  1. See, for example, the data sections in “Building a user panel of music consumption from Spotify” and “Preparing a panel data set on electronics sales and marketing mix activities across thousands of brands”.