GitHub (in-class tutorial)

Hannes Datta

Structure of today's sessions

  • Part 1: GitHub - cloud (“project management”)
  • Part 2: Git - work locally (“workflow”)

  • Coaching session

    • Set up repository, sync to your computers
    • Know the Git workflow: branches, commits, pull requests (PRs)
    • Converge on RQ for project
  • After class: complete tutorial + review command line/terminal (e.g., Datacamp, Mac & Windows; Mac Tutorial)

Introduction to GitHub

  • What is Git? (→ file versioning in “repositories”/folders, open source)
  • What is GitHub? (Git + platform for collaboration, automated testing & deployment; commercial but “free” when open source, owned by Microsoft)
  • Alternatives to GitHub (e.g., Bitbucket, GitLab; on-site installation possible, too)
  • GitHub (or one of its alternatives) is used to professionally manage research and software development projects
  • Most recently, public GitHub repositories were used to train GitHub Copilot, Microsoft's coding AI
  • Curious? Explore topics!

What is a repository?

  • Simple definition: “a directory with versioned files”
  • Let's refine…:
    • w/ a readme (e.g., project description, README.md)
    • w/ meta data (e.g., about, releases, stars)
  • Let's review the tabs of a GitHub repository:
    • Code (we can have code on multiple 'branches')
    • Pull requests (to collaborate; merging branches)
    • Issues (think of it as to dos)
    • Projects (organize your to dos)
    • Discussions (social feature)
    • (and many more…)

Creating a repository on GitHub.com

  • Use an existing folder on computer (git init)
  • New repository (online)
  • “Fork” a repository (copy online)


  • DO: Today, we use “GitHub Classroom” (so I can see what you're up to)
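The first option (turning an existing folder into a repository with git init) can be sketched as follows. This is a minimal sketch in a throwaway folder: the folder name my-project, the file contents, and the identity "Demo Student" are hypothetical, and git init -b main assumes Git 2.28 or newer.

```shell
set -eu

# Throwaway stand-in for "an existing folder on your computer" (hypothetical name).
project="$(mktemp -d)/my-project"
mkdir -p "$project"
cd "$project"
echo "# My project" > README.md

# Turn the folder into a Git repository; -b main names the first branch.
git init -q -b main

# Identity for this repository only; replace with your own details.
git config user.name "Demo Student"
git config user.email "student@example.com"

# Record the first version of the folder's contents.
git add README.md
git commit -q -m "Add readme"

# Show the version history: one commit so far.
git log --oneline
```

To connect such a folder to GitHub afterwards, you would typically add the online repository as a remote (git remote add origin, followed by its URL) and push.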

Recap

  • In essence, we use Git/GitHub for adding/changing code
  • Whenever we save code in Git's version history, we call this a “commit”
  • Everybody who has access to this Git repository will be able to roll back to this version

  • Question: Why make a difference between saving and committing?
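One way to explore the question is to watch what Git does when you save versus when you commit. The sketch below runs in a throwaway repository; the file name analysis.R and the identity are hypothetical.

```shell
set -eu

# Throwaway repository for the experiment.
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo Student"
git config user.email "student@example.com"

echo 'print("first draft")' > analysis.R
git add analysis.R
git commit -q -m "Add first draft of analysis"

# "Saving": the file on disk changes, but the version history does not.
echo 'print("second draft")' >> analysis.R
git status --short            # lists analysis.R as modified
git rev-list --count HEAD     # number of commits: still one

# "Committing": the saved change becomes a new entry in the history.
git add analysis.R
git commit -q -m "Extend analysis"
git rev-list --count HEAD     # number of commits: now two
```

Saving happens many times while you work; a commit is a deliberate snapshot that you (and your team) can return to later.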

Working with Git/GitHub

  • Git on your own computer for actual coding and file versioning

    • e.g., terminal, GitHub desktop, RStudio, etc.
    • reason: software setup will be correct, no cloud fees, etc.
  • GitHub.com for interactive web interface and project management

    • issues, project boards
    • reviewing code contributions (pull requests)
    • browsing version history, etc.
    • no code changes (unless typos)
    • no code uploads (always through Git)


We first proceed with the interactive interface on GitHub.com.

Using GitHub for collaborating in teams

  • The Scrum method is
    • an effective project management tool
    • meant to be helpful, not distracting
  • Key ingredients
    • Break the project down into smaller to do's (“issues”),
    • organize them on your project board, and
    • meet to discuss progress in sprints.

Note: I will have to make you members of the organization for this to work… (let's see - otherwise we postpone this until after the break)

Project Management on GitHub

  • Issues: Individual to do's
  • Project board: Use to organize & prioritize individual issues

We now start with the project board. Afterwards, we talk about issues.

DO: Organize project board

Task:

  • Navigate to your repository's project page, and create a new board.
  • Create columns to organize tasks, call them
    • backlog,
    • to do,
    • in progress,
    • done.

When finished, you should have an empty board with four columns (you may want to change to card view).

Let's talk about issues ("to do's")

  • Discrete, well-defined units of work on a project
    • shouldn't be more than a few hours of work
    • shouldn't stay open forever
    • have clear deliverables, and assignees
  • Adaptivity
    • scope may grow or shrink as new questions arise
    • e.g., change title where needed
  • Prioritization on project board


View tips about writing effective issues.

DO: Let's create our first issue

  • Open a new issue

    • Title: Fix readme
    • Goal: The goal of this issue is to update the readme to conform with best practices.
    • Resources: https://tilburgsciencehub.com/write/readme/
    • Deliverables: An updated readme in a new branch + a pull request
  • Assignment

    • add the issue to the “to do” column of your project board
    • make yourself “assignee”

Tip: For now, let's say you need to add the motivation of your project to the readme.

DO: Let's perform the Git Workflow (online)

Overview available at https://dprep.hannesdatta.com/docs/project/resources/cheat-sheets/

  1. Check your project board and find an issue to work on
  2. Create a development branch (online interface works best)
  3. Make necessary changes (here: online, but later on your local computers), update the Git issue throughout
  4. When done: Make pull request

I'll first demonstrate it on one of my own repositories. You can then repeat these steps on your own repository.

Summary: Managing your project and working with Git (I)

You iterate between the following stages.

1) Team meeting (“scrum meeting”)

  • go through issues on the project board: start with “done”, then “in progress”, then those not started yet (+ why)
  • review and integrate open pull requests one by one + check whether project still runs
  • assign new issues
  • optional: write new issues & assign team members
  • (re)prioritize issues on your project board
  • set date & time for next group meeting

Summary: Managing your project and working with Git (II)

2) Individual work

  • work on your assigned issues
  • help others work on theirs, e.g., if they are stuck
  • review pull requests, to make the PR review for the team meeting more efficient
  • got some new ideas (and you will!)?
    • start writing an issue
    • put it in the backlog on the project board - it will be discussed during the next meeting


Meet again with your team.

Repeat workflow until you're done (or the deadline has passed…)

Part 2: Git (compared to GitHub)

  • So far, we have only used GitHub.com
  • Ultimately, we seek to build a code base for our project
  • We likely won't be able to run R, Python, etc. in the cloud
  • Need a way to get our work from GitHub.com synced with our local computers
  • For now, assume we want to work on our own computer

Git (compared to GitHub)

  • We use Git on the command line (but GUIs are available)
  • To use it on the command line, we need to know some new commands:
    • git clone to copy a repository you have access to onto your computer (once)
    • git pull to update from the cloud (before starting to work)
    • git status, git add, git commit -m "message" (multiple times)
    • git push to push changes to cloud (after finishing work)
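The four steps above can be tried without touching GitHub. In this sketch, a local "bare" repository (cloud.git) stands in for the repository on GitHub.com; that stand-in, the folder names, the file script.R, and the identity are all assumptions for illustration. With a real GitHub repository, you would pass its https URL to git clone instead.

```shell
set -eu
sandbox="$(mktemp -d)"
cd "$sandbox"

# Setup: a local "bare" repository stands in for the repository on GitHub.com.
git init -q -b main seed
git -C seed -c user.name="Demo Student" -c user.email="student@example.com" \
    commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed cloud.git

# git clone (once): get a local copy of the repository.
git clone -q cloud.git my-copy
cd my-copy
git config user.name "Demo Student"
git config user.email "student@example.com"

# git pull (before starting to work): fetch the latest changes from the "cloud".
git pull -q

# Work, then git status / git add / git commit (repeat as often as needed).
echo 'print("hello")' > script.R
git status --short
git add script.R
git commit -q -m "Add first script"

# git push (after finishing work): send your commits to the "cloud".
git push -q

# Check: the "cloud" repository now contains the new commit.
git --git-dir="$sandbox/cloud.git" log --oneline main
```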

DO: Clone repository and verify you see the repository

1.) Copy the URL to your classroom repository

2.) Open Git Bash, and type

git clone https://github.com/course-dprep/XXXXXXX

(REPLACE XXXXXXX with the name of your repository, which you find in the URL of your browser)

3.) Check whether you see your repository on your computer (e.g., Windows Explorer)


Got an error? (“Support for password authentication was removed on August 13, 2021”) → Check how to fix it here.

What we do in Git

  • Once the repository is available locally, we can “work” in it.

    • e.g., add code, run data preparation, etc.
  • While doing so, we make “commits” (we save code in our unlimited version history)

  • We can do so on different branches - each issue has its own development branch

  • You next learn how to do all of this.

DO: Let's perform the Git workflow

  1. Create a new issue on GitHub.com, and create a development branch (“change readme”)
  2. Check out the development branch locally (usually git fetch, then git checkout XXXX)
  3. Verify you're on the development branch, NOT on the master/main branch (git status)
  4. Add something to the readme and save (note, not yet committed)
  5. Perform Git workflow
    • git status (view changes)
    • git add <filename> (add file(s) to “staging area”)
    • git commit -m "message" (freeze commit)
    • git push (push to GitHub; sometimes also git push -u origin branch_name)
  6. View changes on GitHub.com - do you see them?
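The six steps can be rehearsed in a sandbox before trying them on your own repository. This is a sketch under stated assumptions: a local bare repository (cloud.git) plays the role of GitHub.com, the branch name change-readme and the readme text are hypothetical, and step 1 (creating the branch online) is simulated with a single command.

```shell
set -eu
sandbox="$(mktemp -d)"
cd "$sandbox"

# Setup: a local stand-in for GitHub.com with a readme on main.
git init -q -b main seed
cd seed
git config user.name "Demo Student"
git config user.email "student@example.com"
echo "# Project" > README.md
git add README.md
git commit -q -m "Add readme"
cd ..
git clone -q --bare seed cloud.git
git clone -q cloud.git my-copy

# Step 1 (normally done on GitHub.com): create the development branch.
git --git-dir=cloud.git branch change-readme main

cd my-copy
git config user.name "Demo Student"
git config user.email "student@example.com"

# Step 2: learn about the new branch and check it out locally.
git fetch -q
git checkout -q change-readme   # creates a local branch that tracks origin/change-readme

# Step 3: verify you are on the development branch, not on main.
git status

# Step 4: edit the readme and save (not yet committed).
echo "## Motivation" >> README.md

# Step 5: the Git workflow.
git status --short
git add README.md
git commit -q -m "Add motivation section to readme"
git push -q

# Step 6: the change is now on the development branch in the "cloud".
git log --oneline origin/change-readme
```

Because the local branch was created from origin/change-readme, a plain git push is enough here; for a branch created locally, git push -u origin branch_name sets up that link on the first push.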

Development branches vs. master/main branch

  • The master/main branch holds your working version of your project.
  • All “smaller” features are developed in separate development branches
    • These are not called hannes-123, but named by a feature
    • They are ideally created on GitHub.com and checked out locally
  • When your work on a branch is done, create a pull request (your team members can now integrate your changes)

Switching branches

  • When working on a project, you often have to switch branches.
  • Git commands
    • git branch -l to see which branches exist
    • git branch <new_branch> to create a new branch
    • git fetch or git fetch --all to “update” all branches from cloud
    • git checkout <branchname> or git switch <branchname> to switch (back and forth)
  • To see which branch you are on: git status, or git branch (the current branch is marked with *)

DO: Switching branches

  • Create a new branch, called feature (git branch feature)
  • Switch to this new branch (git switch feature)
  • Git workflow… (nothing to do here for now)
  • Switch back to the main branch.
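The exercise can be sketched in a throwaway repository as follows. The identity is hypothetical; git switch assumes Git 2.23 or newer, and git branch --show-current assumes Git 2.22 or newer.

```shell
set -eu

# Throwaway repository with one commit (a branch needs a commit to point to).
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo Student"
git config user.email "student@example.com"
git commit -q --allow-empty -m "Initial commit"

# Create a new branch called feature; this does not switch to it yet.
git branch feature

# List all local branches; the current one is marked with *.
git branch -l

# Switch to the new branch and confirm where you are.
git switch -q feature
git branch --show-current

# (Git workflow would happen here.)

# Switch back to the main branch.
git switch -q main
git branch --show-current
```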

Reviewing and merging pull requests

When new changes become available (online) on a branch, you can integrate them using “pull requests”.

It's good practice to assign reviewers to new pull requests.

  1. Identify a pull request (PR) to review
  2. Locally check out the development branch
  3. Test whether the code runs and does not throw any errors, etc.
    • If “content” errors: add comments to the issue/PR, and request changes
    • If merging errors: fix them
    • If everything runs smoothly: say so in the issue as well, and accept the PR
  4. Click “merge” on GitHub.com (or, better, wait for the team meeting and do these things together)
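The local part of a review (steps 2 and 3) might look like the sketch below. Everything is simulated in one throwaway repository: the teammate's branch change-readme, its content, and the identity are hypothetical. With a real pull request, you would run git fetch first and then check out the branch named in the PR. The final git merge is only a local stand-in for clicking “merge” on GitHub.com.

```shell
set -eu

# Setup: a repository with a readme on main.
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo Student"
git config user.email "student@example.com"
echo "# Project" > README.md
git add README.md
git commit -q -m "Add readme"

# A teammate's development branch with one change (simulated locally here).
git switch -q -c change-readme
echo "## Motivation" >> README.md
git commit -q -a -m "Add motivation section"
git switch -q main

# Step 2: check out the branch under review.
git switch -q change-readme

# Step 3: inspect what the branch adds relative to main, then run the project.
git log --oneline main..change-readme
git diff --stat main...change-readme
# (run your own scripts here and check that they finish without errors)

# Step 4, local stand-in for the merge button: integrate the branch into main.
git switch -q main
git merge -q --no-ff -m "Merge change-readme" change-readme
git log --oneline
```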

Rolling back

  • When working with Git, we can always roll back to previous versions of a file
  • Each commit (a snapshot of your files) is identified by a commit hash (we can view those on GitHub.com or using git log)

  1. Identify the hash of one of your earlier commits
  2. Type git checkout <commithash> filename (this overwrites the current version of the file in your working directory)
  3. Verify this is the old file!
  4. Go back to the latest committed version: git checkout HEAD filename

(Alternatively, use git show <commithash>:filename > old_filename to get the old file as a second file.)
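A sandbox version of the roll-back steps is sketched below. The file name notes.txt, its contents, and the identity are hypothetical. The sketch returns to the latest committed version with git checkout HEAD -- filename, because restoring a file from an old commit also stages the old content.

```shell
set -eu

# Setup: a file with two committed versions.
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo Student"
git config user.email "student@example.com"
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "First version"
echo "version 2" > notes.txt
git commit -q -a -m "Second version"

# Step 1: identify the hash of an earlier commit.
git log --oneline -- notes.txt
old_hash="$(git rev-list --max-parents=0 HEAD)"   # here: the very first commit

# Step 2: restore the file as it was in that commit (overwrites the current file).
git checkout -q "$old_hash" -- notes.txt

# Step 3: verify this is the old file.
cat notes.txt

# Step 4: go back to the latest committed version.
git checkout -q HEAD -- notes.txt
cat notes.txt

# Alternative: write the old version to a second file and keep the current one.
git show "$old_hash":notes.txt > old_notes.txt
```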

There are many other features worthwhile to explore

  • Working on open source projects (forking & PRs) vs. being a collaborator (direct pushes)
  • Exploring version history
  • Comparing changes between files
  • Using alternative Graphical User Interfaces
    • RStudio (requires a project)
    • Visual Studio Code
    • GitHub desktop

Working with GitHub requires a new mindset

  1. There is one working version of our project – it's in the main branch (vs.: we work in one synchronized folder in the cloud)
  2. All 'development steps' are to be taken in development branches (vs.: I call my file rfile-hannes1.R)
  3. Communication is done exclusively on GitHub – namely, in issues (except when meeting in person) (vs.: we email, call, WhatsApp, use Zoom, Teams, and then forget what we talked about)
  4. We adhere to the Scrum project management system: project board, issues, prioritization
  5. GitHub is for code, not for data (so, do not commit any data files) [more later in this course]
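One common way to keep data files out of a repository (point 5) is a .gitignore file. The sketch below assumes, purely for illustration, that all data lives in a folder called data/; the file names are hypothetical, and the course may recommend a different setup later on.

```shell
set -eu

# Throwaway repository with one code file and one data file.
cd "$(mktemp -d)"
git init -q -b main
mkdir data
echo "id,value" > data/raw.csv
echo 'print("clean data")' > clean.R

# Tell Git to ignore everything inside the data folder.
echo "data/" > .gitignore

# Ask Git whether the data file is ignored (exit status 0 means yes).
git check-ignore -q data/raw.csv && echo "data/raw.csv is ignored"

# Stage everything: only .gitignore and clean.R are added, the data file is not.
git add .
git status --short
```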

Summary

  • GitHub vs. Git: cloud vs. local computer
    • GitHub: project management w/ board & issues, synchronization
    • Git: “clone” repository, work locally
  • How to use?
    • Always Git for working on code, never the cloud (!!!)
    • Git workflow: issue, development branch, git checkout, git pull, git status, git add, git commit, git push
  • Beware
    • you will encounter “merging issues” - completely normal; you will learn how to solve them w/ Roshini

Next steps

  • Complete the tutorial at home
  • Use the cheat sheets throughout
  • Excellent practice tool: https://learngitbranching.js.org/
    • Main: Introduction Sequence
    • Remote: Push & Pull, To Origin and Beyond
  • Coaching
    • Start working on your project (e.g., set up a repository in our course-dprep organization)

Some fun activity to make your GitHub profiles more professional (optional)

  • Create an amazing GitHub About page for your profile (important for job market!)

How to Start: