Git and GitHub

What is Git?

Version Control Systems

Version control is a system that records changes to a file or a set of files over time, so that ou can recall specific versions later.

There are three main types of version control system:

Local version control systems: maintain a local database, or even make copies of the files with timestamp to know which version is earlier or later.
Centralized version control systems: a single, centralized version database that contain all versioned files. A number of clients check out files from that central place.
Distributed version control systems: each client fully mirror the repository, including its full history. Thus, if any server dies, the systems that were collaborating can restore the content. Git is a type of distributed version control system.

History of Git

In 2002, the Linux kernel project began to use a proprietary DVCS called BitKeeper. However, the relationship between the community and the commercial company that developed BitKeeper broke down, and the tool's free-of-charge status was revoked. Thus, the Linux community developed their own tool based on some of the lessons learned while using BitKeeper.

Goals of Git:

Speed
Simple Design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like Linux kernel efficiently (speed and data size)

Core concepts of Git

Snapshots, not differences

Git thinks of its data as a series of snapshots of a miniature filesystem. Basically, every time we commit, Git takes a picture of what all the files look like at that moment and stores a reference to that snapshot.

Thus, the data of Git is like a stream of snapshots.

Nearly every operation is local

Most operations in Git need only local files and resources to operate. Because we have the entire history of the project on the local disk, most operations seem almost instantaneous.

Git has integrity

Everything in Git is checksummed before it is stored and then referred to by that checksum. That means it is impossible to change the contents of any file or directory without Git knowing about it.

The mechanism that Git uses for checksumming is called SHA-1, which is a 40-character string composed of hexadecimal characters.

Git generally only adds data

Most actions with Git tend to add data to the Git database. Thus, it is hard to get the system to do anything that is not undoable or to make it erase data in any way (but not impossible).

The Three States

git-three-state

Files managed by Git can reside in three states:

Modified: File has been changed, but not commited to the database
Staged: Modified files have been marked in their current version to go into the next commit snapshot.
Committed: Data is safely stored in the local database.

Basic Usage of Git

Configuring Git

Git comes with a tool called git config for getting and setting configuration variables that control all aspects of how Git looks and operates:

[path]/etc/gitconfig file: contain values applied to every use on the system and all of their repositories. Accessed by git config --system
~/.gitconfig or ~/.config/git/config files: contain values specific to the user and apply to all repositories of this user. Accessed by git config --global
config file in the Git repository of the current repository: specific to that single repository.

Setting identity

Setting identity is the first thing that we should do when installing Git. It is because every Git commit uses this information, and its immutably baked into the commits you create.

git config --global user.name "User Name"
git config --global user.email user.name@example.com

Getting a Git Repository

Typically, there are two ways to obtain a Git repository:

Take a local directory that is currently not under version control, and turn it into a Git repository
Clone an existing Git repository from elsewhere.

Initializing a repository

cd <director>
git init

This commands create a new subdirectory named .git that contains all necessary repository files.

Cloning an existing repository

git clone <url> <directory-name>

Git receives a full copy of nearly all data that the server has. Every version of every file for the history of the project is pulled down by default when running the command.

Recording changes to the repository

Lifecycle of the status of files in a repository

git-file-status-lifecycle

Tracked files are the ones that Git knows about, because they were in the last snapshot. They can be modified, unmodified, or staged.

Untracked files are anything else.

Checking status of the files

git status

Tracking new files

git add <file>

Staging Modified Files

The git add is also used to add modified files to the staging for the next commit.

Think of git add as adding a particular snapshot of a file to the commit history instead of adding a file.

Ignoring Files

Use the .gitignore file to store class of files that we don't want Git to automatically add or even show as being untracked.

Rules for the pattern of .gitignore:

Blank lines of lines starting with # are ignored
Standard glob pattern works, and will be applied recursively throughout the entire working tree
You can start the patterns with a forward slash (/) to avoid recursivity
You can end the patterns with a forward slash (/) to specify a directory
You can negate a pattern by starting it with an exclamation point (!)

Viewing staged and unstaged changes

git diff # Show changes that have not been staged
git diff --staged # Show changes between the staged and the previous head

Committing your changes

git commit -m "message"
git commit -a -m "message" # Automatically add all tracked files and commit

Removing files

To remove a file from Git, you have to remove it from your tracked files and the commit.

git rm <file> # untrack and also removes the file from the working directory so that you don't see it as an untracked file the next time.

Viewing the commit history

Viewing the history

git log # Lists the commits made in that repository in reverse chronological order
git log -p -2 # Show the difference introduced in each commit for the last two commits
git log --stat # Show some abbreviated stats for each commit

Undoing things

When you mess up the last commit

For example, you commit too early and forget to add some files or mess up the commit message. Use git commit --amend for this purpose. In the end, there will be only one commit in the history.

git commit -m "initial commit"
git add forgotten_files
git commit --amend

Unstaging a staged file

git reset HEAD <file>

Unmodifying a modified file

Unmodifying a file means reverting it back to what it looked like in the last commit.

git checkout -- <file>

In general, git checkout completely replace the local file, and thus is a dangeous command. Try to use branches and stashes instead.

Working with Remotes

Remote repositories are versions of your projects that are hosted on the Internet of network somewhere. They can be read-only, or read/write for you.

Managing remote repositories includes knowing how to add remote repositories, remove remotes that are no longer valid, manage various remote branches, and define them as being tracked or not, and more.

Showing remotes

git remote # Show shortnames of each remote handle specified, such as origin
git remote -v # Show URL that Git has stored for the shortname to be used when reading and writing t the remote
git remote show <remote> # Show details

Adding remote repositories

git remote add <shortname> <url>
git remote rename <old> <new>
git remote remove <shortname>

Fetching and Pulling from remotes

The fetch command pulls down all the data from the remote project that we don't have. After doing this, we will have references to all the branches from the remote. Noted that the git fetch command only downloads the data to local repository. It does not automatically merge with or modify any local work.

git fetch <remote> # Pull all data from the remote that we don't have yet.

The git pull command fetch a remote branch and then merge that remote branch into your current branch. By default, the git clone setup local master branch to track the remote master branch.

Pushing to remotes

git push <remote> <branch>

This command works only if you have write access and nobody has pushed in the meantime. If someone else pushed upstream, and then you push, your push will be rejected. You will have to fetch their work first, incorporate it into yours before you will be allowed to push.

Git Branching

Concepts of branching

Branching means you diverge from the main line of development and continue to do work without messing with that main line.

Because Git branches are incredibly lightweight, Git encourages workflows that branch and merge often, even multiple times in a day.

How Git branching works

Recall that Git stores data as a series of snapshots. Each snapshot has pointers to the commits that directly came before it:

Zero parent for the initial commit
One parent for usual commit
Multiple parents for a commit that results from a merge of two or more branches.

A branch in Git is simply a lightweight moveable pointer to one of these commits.

git-branch-model

When you create a new branch, you create a new pointer that you can move around.

Git keeps a special pointer called HEAD, which points to the local branch that you are.

git-branch-model-2

Using Git branching

Create and switch to new branch

git branch iss53 # Create branch iss53
git checkout iss53 # Switch to the branch iss53
# Short hand: git checkout -b iss53

Stashing and Switching to other branch

If you working directory or staging area has uncommited changes that conflict with the branch you are checking out, Git won't let you switch branches. While it is better to have a clean working state when you switch branches, an approach is relying on stashing and clearning

TBA

Merging

git-merge-model

# Suppose that we want to merge hotfix back to master
git checkout master
git merge hotfix

Merge can be fast-forward when we try to merge one commit with a commit that can breached by following the first commit's history.

If the development history has diverged from some older point, Git does a simple three-way merge, using two snapshots pointed by to by the branch tips and the common ancestor of the two.

git-merge-model-2

Dealing with remote branches

Remote references are pointers in your remote repositories, including branches, tags, and so on.

Remote tracking branches are references to the state of remote branches. They are local references that you can't move. Git moves them for you whenever you do any network communication to make sure they accurately represent the state of the remote repository.

Remote-tracking branch names take the form <remote>/<branch>. For instance, git checkout origin/master to see the master branch on the origin remote since the last time we communicate with it.

git-remote-branch-model

Rebasing

TBA

Netlify CMS

Technology Lists