GitHub (in-class tutorial)

Hannes Datta

Structure of today's sessions

  • In-class part of GitHub tutorial
    • matters because you will manage your project on GitHub
  • Coaching session
    • Setup repository, sync to your computers
    • Know the Git workflow: branches, commits, pull requests (PRs)
    • Converge on RQ for project
  • After class
    • Please go through the entire tutorial on the dprep site.
    • Review shell/command line/terminal commands

Feel you want to be challenged more?

  • Flip through my deck
  • find the klaaropdracht (“assignment for when you're already ready with processing all of my material”) at the end (today: designing your GitHub profile page)

Introduction to GitHub

  • What is Git? (–> file versioning in “repositories”/folders, open source)
  • What is GitHub? (Git + platform for collaboration, automated testing & deployment; commercial but “free” when open source, owned by Microsoft)
  • Alternatives to GitHub (e.g., Bitbucket, GitLab; on-site installation possible, too)
  • GitHub (or its alternatives) are used to professionally manage research and software development projects
  • Most recently, public GitHub repositories were used to train GitHub Copilot, Microsoft's coding AI

DO: Explore projects on GitHub.com

  • Browse the list of project topics at https://github.com/topics
  • Click on a topic that sparks your interest, and browse available repositories.



Let's talk about it…

  • Which topic did you choose?
  • Which repositories did you discover?

Overview about a repository

Pick any repository, and describe its content.

  • Each repository has…

    • readme w/ project description (README.md)
    • meta data (e.g., about, releases, stars)
  • On a repository, we can find several tabs

    • Code (we can have code on multiple 'branches')
    • Pull requests (to collaborate; merging branches)
    • Issues (think of it as to dos)
    • Projects (organize your to dos)
    • Discussions (social feature)
    • (and many more…)

DO: Creating a repository on GitHub.com

  • Normally, we start a new repository on GitHub.com, or start from an existing folder on our computer

  • Today, we use “GitHub classroom” (so I can see what you're up to)

Recap

  • In essence, we use Git/GitHub for adding/changing code
  • Whenever we save code in Git's version history, we call this a “commit”
  • Everybody who has access to this Git repository will be able to roll back to this version

  • Question: Why make a difference between saving and committing?

Working with Git/GitHub

  • Git on your own computer for actual coding and file versioning

    • e.g., terminal, GitHub desktop, RStudio, etc.
    • reason: software setup will be correct, no cloud fees, etc.
  • GitHub.com for interactive web interface and project management

    • issues, project boards
    • reviewing code contributions (pull requests)
    • browsing version history, etc.
    • no code changes (unless typos)
    • no code uploads (always through Git)


We first proceed with the interactive interface on GitHub.com.

Using GitHub for collaborating in teams

  • The Scrum method is
    • an effective project management tool
    • meant to be helpful, not distracting
  • Key ingredients
    • Break down project in smaller to do's (“issues”),
    • organize them on your project board, and
    • meet to discuss progress in sprints.

Note: I will have to make you members of the organization for this to work…

DO: Organize project board

Task:

  • Navigate to your repository's project page, and create a new board.
  • Create columns to organize tasks, call them
    • backlog,
    • to do,
    • in progress,
    • done.

When finished, you should have an empty board with four columns (you may want to change to card view).

Let's talk about issues ("to do's")

  • Discrete, well-defined units of work on a project
    • shouldn't be more than a few hours of work
    • shouldn't be open for forever
    • have clear deliverables, and assignees
  • Adaptivity
    • scope may grow or shrink as new questions arise
    • e.g., change title where needed
  • Prioritization on project board


View tips about writing effective issues.

DO: Let's create our first issue

  • Open a new issue

    • Title: Fix readme
    • Goal: The goal of this issue is to update the readme to conform with best practices.
    • Resources: https://tilburgsciencehub.com/write/readme/
    • Deliverables: An updated readme in a new branch + a pull request
  • Assignment

    • the issue to the “to do” column of your project board
    • make yourself “assignee”

Tip: For now, let's say you need to add the motivation of your project to the readme.

DO: Let's perform the Git Workflow online

Overview available at https://dprep.hannesdatta.com/docs/project/resources/cheat-sheets/

  1. Check your project board and find an issue to work on
  2. Create a development branch (online interface works best)
  3. Make necessary changes (here: online, but later on your local computers), update the Git issue throughout
  4. When done: Make pull request

I'll first demonstrate it on one of my own repositories. You can then repeat these steps on your own repository.

Summary: Managing your project and working with Git (I)

You iterate between the following stages.

1) Team meeting (“scrum meeting”)

  • go through issues on the project board: start with “done”, then “in progress”, and not started (+ why)
  • review and integrate open pull requests one by one + check whether project still runs
  • assign new issues
  • optional: write new issues & assign team members
  • (re)prioritize issues on your project board
  • set date & time for next group meeting

Summary: Managing your project and working with Git (II)

2) Individual work

  • work on your assigned issues
  • help others work on theirs, e.g., if they are stuck
  • review pull requests, to make the PR review for the team meeting more efficient
  • got some new ideas (and you will!)?
    • start writing an issue
    • put it in the backlog on the project board - it will be discussed during the next meeting


Meet again with your team.

Repeat workflow until you're done (or the deadline has passed…)

GitHub (online interface) vs. Git

  • For now, we only used GitHub.com
  • Ultimately, we seek to build a code base for our project
  • Likely won't be able to run R, Python etc. in the cloud
  • Need a way to get our work from GitHub.com synced with our local computers
  • For now, assume we want to work on our own computer

GitHub (online interface) vs. Git

  • Let's introduce some Git vocab here
    • git clone repository to which you have access to (once)
    • git pull to update from the cloud (before starting to work)
    • git status, git add, git commit -m "message" (multiple times)
    • git push to push changes to cloud (after finishing work)

DO: Clone repository and verify you see the repository

1.) Copy the URL to your class room repository

2.) Open Git Bash, and type

git clone https://github.com/course-dprep/{in-class-tutorial-hannesdatta}

(REPLACE with your username, do not {hannesdatta}, please!)

3.) Check whether you see your repository on your computer


Got an error? (“Support for password authentication was removed on August 13, 2021”) –> Check how to fix it here.

DO: Let's perform the Git workflow

  1. Create a new issue on GitHub.com, and create a development branch (“change readme”)
  2. Check out the development branch locally (mostly gh pull XXXX)
  3. Verify you're on the development branch, NOT on the master/main branch (git status)
  4. Add something to the readme and save (note, not yet committed)
  5. Perform Git workflow
    • git status (view changes)
    • git add <filename> (add file(s) to “staging area”)
    • git commit -m "message" (freeze commit)
    • git push (push to GitHub; sometimes also git push -u origin branch_name)
  6. View changes on GitHub.com - do you see them?

DO: Rolling back

  • When working with Git, we can always roll back to previous versions of a file
  • Each “file” is tagged with a commit hash (we can view those on GitHub.com or using git log)

  1. Identify the hash of one of your earlier commits
  2. Type git checkout <commithash> filename (will overwrite old file)
  3. Verify this is the old file!
  4. Go back to the latest version: git checkout filename

(Alternatively, use: git show <commithash>:filename > old_filename) to get the old file as a second file.)

Development branches vs. master/main branch

  • The master/main branch holds your working version of your project.
  • All “smaller” features are developed in separate development branches
    • These are not called hannes-123, but named by a feature
    • They are ideally created on GitHub.com and checked out locally
  • When your work on a branch is done, create a pull request (your team members can now integrate your changes)

Reviewing and merging pull requests

It's good practice to assign reviewers to new pull requests.

  1. Identify a pull request (PR) to review
  2. Locally check out the development branch
  3. Test whether the code runs and does not throw any errors, etc.
    • If “content” errors: add comments to the issue/PR, and request changes
    • If merging errors: fix them
    • If everything runs smoothly: say so in the issue as well, and accept the PR
  4. Click “merge” on GitHub.com (or, better, wait for the team meeting and do these things together)

There are many other features worthwhile to explore

  • Working on open source projects (forking & PRs) vs. being a collaborator (direct pushes)
  • Exploring version history
  • Comparing changes between files
  • Using alternative Graphical User Interfaces
    • RStudio (requires a project)
    • Visual Studio Code
    • GitHub desktop

Working with GitHub requires a new mindset

  1. There is one working version of our project – it's in the main branch (vs.: we work in one synchronized folder in the cloud)
  2. All 'development steps' are to be taken in development branches (vs.: I call my file rfile-hannes1.R)
  3. Communication is done exclusively on GitHub – namely, in issues (unless when meeting in person) (vs.: we email, call, WhatsApp, use Zoom, teams, and then forget what we talked about)
  4. We adhere to the Scrum project management system: project board, issues, prioritization
  5. GitHub is for code, not for data (so, do not commit any data files) [more later in this course]

Next steps

  • Start with the recap exercises on the dprep site (self-guided tutorial)
  • Use the cheat sheets throughout
  • Start working on your project (e.g., set up a repository in our dprep organization)

  • Idea: Invest in your GitHub profile (design an about page)

Today's "klaaropdracht"

  • Create an amazing GitHub About page for your profile (important for job market!)

How to Start: