Learning Git and GitHub

Acknowledgement

This slide deck has been very lightly adapted from one written by Dr. Ted Laderas at the Fred Hutch Data Science Lab.

Introduce Yourself

Your Name/Major/Location(?)/Pronouns
What’s one interesting thing you did over the summer?
What do you hope to get out of the BIG Problems course?

Who is this introduction for?

Hopefully everyone here, but especially

Those who want a basic understanding of the version control process
Those who want to understand how open source collaboration works
Those who need to build a mental model of Git

Not for:

People who already use Git on the command line (you already have the mental model)
Impatient people

Learning Objectives

By the end of this discussion, you will be able to:

Explain the reasons we use repositories
Explain how version control tracks changes in your work
Make a branch in a repository
Make contributions to a repository using pull requests

Reminder: Be gentle with yourself and others

We are all learning together
We all learn at different paces
Asking questions is a way of taking care of others

Our focus today is on concepts

You’ll learn more of the details as you work with git/GitHub
…there’s just no substitute for actually using it…

Reproducibility and the Research Lifecycle

Benefits of Storing your code in a Repo

Centralized code
Other people outside your group/lab can use it
The ability to roll back changes that broke your code
Recognition for your work
Supportive community that can help you learn and improve it

It’s Tough being Open

But it’s also rewarding

Version Control

Version control is a systematic approach to record changes made in a file, or set of files, over time. This allows you and your collaborators to track the history, see what changed, and recall specific versions later when needed.

Ways we work with version control

By ourselves (sole developer)
As a member of a GitHub repository
As an external collaborator of a GitHub repository

Version Control Workflow (by ourselves)

Create files - these may contain text, code or both.
Work on these files, by changing, deleting or adding new content.
Create a snapshot of the file status (also known as version) at this time.
Document what was changed in the version history of that file.
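
For anyone curious how these four steps look outside the web interface, here is a minimal sketch using command-line Git. It assumes Git is installed; the file name, the commit messages, and the author identity are made up for illustration, and everything happens in a throwaway directory.

```shell
set -eu

# Work in a throwaway directory so nothing on your machine is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Hypothetical identity, stored only in this practice repository.
git config user.name "Example Author"
git config user.email "author@example.com"

# Create a file.
printf 'Pancakes\n' > recipe.txt

# Take a snapshot and document what changed.
git add recipe.txt
git commit -q -m "Start the pancake recipe"

# Work on the file, then take a second snapshot.
printf '2 eggs\n' >> recipe.txt
git add recipe.txt
git commit -q -m "Add eggs to the ingredients"

# One file, two versions: the history lists both commits, newest first.
git --no-pager log --oneline
```

The commit message is where step four happens: it is the note that explains what changed in that version.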

You probably already do a version of this:

Git is a formal way of tracking changes

Each “save” is called a commit
Basically a snapshot of the file at that moment in time
We have one file, but many versions of that file
Git lets us view the history as changes to the file, rather than as many separate full copies

What’s the diff-erence?

Git shows what changed between commits (called a diff):

Lines of code we add (+)
Lines of code we delete (-)
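
A small sketch of what a diff looks like on the command line, assuming Git is installed. The recipe file and its contents are invented for the example. One ingredient is swapped after the first commit, so the diff should contain one deleted line and one added line.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
# Hypothetical identity, stored only in this practice repository.
git config user.name "Example Author"
git config user.email "author@example.com"

# First version of the file.
printf 'flour\nsugar\n' > recipe.txt
git add recipe.txt
git commit -q -m "Add dry ingredients"

# Change one line without committing yet.
printf 'flour\nhoney\n' > recipe.txt

# Compare the working file against the last commit.
# The removed line is prefixed with "-" and the new line with "+".
git --no-pager diff
```

The unchanged line (`flour`) appears in the output without a prefix, as context around the change.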

Diff Example

We can fix mistakes

What if we made a mistake in code?
We can roll back or revert changes associated with a commit
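
One way to undo a mistaken commit on the command line is `git revert`, sketched below under the assumption that Git is installed. The file, the "mistake", and the identity are all made up. Reverting does not erase history: it adds a new commit that undoes the earlier one.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
# Hypothetical identity, stored only in this practice repository.
git config user.name "Example Author"
git config user.email "author@example.com"

# A good commit.
printf '2 eggs\n' > recipe.txt
git add recipe.txt
git commit -q -m "List the eggs"

# A commit that contains a mistake.
printf '20 eggs\n' > recipe.txt
git add recipe.txt
git commit -q -m "Update egg count"

# Undo the most recent commit by creating a new commit that reverses it.
git revert --no-edit HEAD

# The file is back to its earlier content, and the history keeps all three commits.
cat recipe.txt
git --no-pager log --oneline
```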

Exercise: Look at a commit history

Exploring GitHub Exercise

More about commits

Commits have a message
Commits can be done for multiple files at once

There is an intermediate step: staging

Exists to bundle multiple changes into a single commit
Is hidden in the web interface
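
On the command line the staging step is visible as `git add`. The sketch below, which assumes Git is installed and uses invented file names, stages two files and then records both in a single commit.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
# Hypothetical identity, stored only in this practice repository.
git config user.name "Example Author"
git config user.email "author@example.com"

# Two separate files are created.
printf 'Pancakes\n' > recipe.txt
printf 'Serve warm\n' > notes.txt

# Staging: choose which changes go into the next commit.
git add recipe.txt notes.txt

# Both files are now listed as staged additions.
git status --short

# One commit, one message, two files.
git commit -q -m "Add pancake recipe and serving notes"
git --no-pager log --oneline
```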

Ways we work in a repository

By ourselves (sole developer)
As a member of a GitHub repository
As an external collaborator of a GitHub repository

Git / GitHub is a way for multiple developers to work together

Everything we’ve done so far we’ve done by ourselves
The key with Git/GitHub is that multiple people can work on a repository at once

What is the difference between Git and GitHub?

Git is the software that does version control
- Use it on our own machines with command line git, GitHub Desktop, RStudio, etc.
GitHub is the website that hosts repositories and it uses Git
- Hosted repository is also called the remote

Interacting with GitHub

graph LR
  A[Our local machine] --git push ---> B
  B["Remote Repo on GitHub"] --git pull--> A

Updating the remote from our local is called a push
Updating the local from the remote is called a pull
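
Push and pull can be tried without a GitHub account. In this sketch a local bare repository stands in for the remote on GitHub, which is an assumption made only so the example runs offline; the directory names `alice` and `bob`, the file, and the identity are hypothetical.

```shell
set -eu

work="$(mktemp -d)"
cd "$work"

# A bare repository plays the role of the remote repo on GitHub.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# First local copy.
git clone -q remote.git alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'Pancakes\n' > recipe.txt
git add recipe.txt
git commit -q -m "Start the pancake recipe"

# Push: update the remote from the local copy.
git push -q origin main

# A second local copy, made after the first push.
cd "$work"
git clone -q remote.git bob

# The first copy adds another commit and pushes again.
cd "$work/alice"
printf '2 eggs\n' >> recipe.txt
git add recipe.txt
git commit -q -m "Add eggs"
git push -q origin main

# Pull: update the second local copy from the remote.
cd "$work/bob"
git pull -q --ff-only origin main
git --no-pager log --oneline
```

With a real GitHub repository the clone address would be the repository's URL instead of a local path; the push and pull steps stay the same.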

How do we do this?

Multiple people can work on their own versions of the code, called branches
Developers can work on different features on the same code
Needs a reconciliation process (pull requests/merging)

Branches are isolated versions of the original repository

Many People Can Work on the Same Code

Working on different branches

When in doubt, branch

Before you make big changes, make a branch
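
A minimal sketch of making a branch on the command line, assuming Git is installed. The branch name `add-waffles`, the files, and the identity are invented. The point to notice is that work committed on the branch does not appear on `main`.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
# Name the starting branch "main" regardless of local Git defaults.
git symbolic-ref HEAD refs/heads/main
# Hypothetical identity, stored only in this practice repository.
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Our cookbook\n' > README.txt
git add README.txt
git commit -q -m "Start the cookbook"

# Make a branch and switch to it before changing anything.
git checkout -q -b add-waffles

# Work on the branch.
printf 'Waffles\n' > waffles.txt
git add waffles.txt
git commit -q -m "Add waffle recipe"

# Switch back: main is untouched and does not contain waffles.txt.
git checkout -q main
ls
git --no-pager log --oneline main
git --no-pager log --oneline add-waffles
```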

Exercise: Add a Recipe to our Cookbook Repository!

Everyone is a member of this repository:

cookbook_BIG_problems

Exercise: Add a Recipe

Make a Pull Request

A pull request is a formal request to merge your code changes into the history
Someone (the owner) needs to merge your changes after a request

Exercise: Make a Pull Request

Now comes the hard part

Integrating the changes from multiple branches

Reconciliation of Branches (merging)

Need to integrate changes in branches together
This is called a merge
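
Here is a sketch of a merge on the command line, assuming Git is installed; branch and file names are made up. Changes are made on both `main` and a branch, and the merge brings them together in one new commit. On GitHub, merging a pull request performs this step for you.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
# Name the starting branch "main" regardless of local Git defaults.
git symbolic-ref HEAD refs/heads/main
# Hypothetical identity, stored only in this practice repository.
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'Our cookbook\n' > README.txt
git add README.txt
git commit -q -m "Start the cookbook"

# One person works on a branch.
git checkout -q -b add-waffles
printf 'Waffles\n' > waffles.txt
git add waffles.txt
git commit -q -m "Add waffle recipe"

# Meanwhile, main also moves forward.
git checkout -q main
printf 'Pancakes\n' > pancakes.txt
git add pancakes.txt
git commit -q -m "Add pancake recipe"

# Merge: integrate the branch's changes into main.
git merge -q --no-edit add-waffles

# main now contains both recipes.
ls
git --no-pager log --oneline
```

The two sides touched different files here, so Git can combine them on its own. When both sides edit the same lines, the merge stops with a conflict that a person has to reconcile.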

Who’s responsible for merging?

Repository Owner
- Could be program manager of a group
- Could be a software engineer
It is a big responsibility
- Need to ensure that merge doesn’t break things
  - Use automated testing to ensure changes don’t break code
- Need to make sure that merges don’t conflict
  - Reconciling changes so that they work together

Branching/Merging works because of communication

You need to communicate with other developers which part of the code you’re working on
Partition out the tasks
Multiple developers working on the same part of the code can be difficult

Merging process

Manual review process
- May start a conversation about the pull request
- May submit reviews
- May submit approvals
- May deny the pull request

Merging Demo

Ways we work in a repository

By ourselves (sole developer)
As a member of a GitHub repository
As an external collaborator of a GitHub repository

GitHub lets you contribute to code, even if you aren’t a member

You can still contribute to code you don’t own
- Open source is built on collaborations
You can do this by making a fork of the code

Forks

Your version of the code is called a fork
- Like an external branch
- The fork belongs to you
You can submit your changes to the original repository through a pull request

Recap of What we did

Hopefully you will now be able to

Explain the reasons we use repositories
Explain how version control tracks changes in your work
Make a branch in a repository
Make contributions to a repository using pull requests
Make a fork of a repository that you don’t own

Resources