Source control is an incredible technology. It’s like having an undo/redo stack, but for an entire project. It allows you to sensibly branch/merge code, track changes you’ve made, and refer to old versions of code. It might be the single most important and useful development tool in a developer’s toolbox.
One such tool, git, has become the de facto standard for source control among developers. Other solutions such as svn (Subversion) and hg (Mercurial) are used in the industry, but their market share pales in comparison to the sheer volume of developers using and familiar with git for their projects.
When I was first introduced to git, I was coming from more “traditional” source control solutions such as Perforce and Team Foundation. Learning git was a daunting undertaking, and I can understand why some may hesitate to even attempt it, since the learning curve is steep. In this series, I’m going to introduce git concepts and explain some common git workflows that developers might need in their day-to-day work.
“Traditional” vs. Decentralized Source Control
The first thing we’re going to cover is what makes git special when compared with other source control solutions.
Long ago, when tape storage was how software was persisted long-term, offices working on software products had a master copy of a tape that contained the primary version of software. Checking out this code was the act of borrowing a physical tape, making changes, and returning the tape. Traditional source control systems attempt to emulate this behavior. You get a complete copy of a project’s code on your local system, you check out any files that need edits, and you check them in when you’re done. There is a master copy on a remote server somewhere that contains the latest and greatest version of the software. Any branches are a deep clone of all files.
git isn’t like the aforementioned “traditional” source control solutions, in that it is decentralized. git itself does not require a primary server that operates as a single source of truth for the repository, although teams usually agree on one shared copy by convention. Each copy of the repository can serve as a valid source with which you can exchange commits. Checking files out doesn’t lock them on a remote server; changes made to the same files by different people are reconciled when their work is merged. A repository’s history lives in a folder of metadata (the .git folder) that records every committed version of a set of files. That folder can be copy/pasted anywhere, and all of those copies represent a valid “master” repository.
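To make the “every copy is a valid repository” idea concrete, here is a minimal sketch. The folder names, file name, and the “Example Dev” identity are made up for illustration; the identity is only there so the commit works on a machine with no git configuration.

```shell
set -eu
# Hypothetical identity so commits work without any existing git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# Create a repository with a single commit.
git init -q original
cd original
echo "hello" > README.txt
git add README.txt
git commit -q -m "Add README"
cd ..

# Clone it. No server is involved; the clone is just another full copy.
git clone -q original copy

# Both repositories contain the same commit, identified by the same hash.
original_head=$(git -C original rev-parse HEAD)
copy_head=$(git -C copy rev-parse HEAD)
echo "original: $original_head"
echo "copy:     $copy_head"
```

The two printed hashes should match, because the clone carries the full history rather than a checkout borrowed from a server.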
git also has a huge benefit over the traditional model in that branches are lightweight. A branch is just a named pointer to a commit, and each commit records which commit came before it, so it is very cheap to create a branch and manage divergent changes in source code. This makes managing releases and keeping source history for prior versions trivial, whereas in many traditional source control systems each branch is a complete deep copy of all files and folders being managed.
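A quick way to see how cheap a branch is: create one and check what it points at. This is only a sketch; the branch name “feature”, the file name, and the identity are hypothetical.

```shell
set -eu
# Hypothetical identity so commits work without any existing git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Create a branch. No files are copied.
git branch feature

# The new branch and the current branch resolve to the same commit hash.
head_commit=$(git rev-parse HEAD)
feature_commit=$(git rev-parse feature)
echo "HEAD:    $head_commit"
echo "feature: $feature_commit"

# With git's default "files" ref storage, the branch is a tiny file holding that hash.
if [ -f .git/refs/heads/feature ]; then
  cat .git/refs/heads/feature
fi
```

Nothing in the working folder is duplicated; the branch only starts to differ once you commit to it.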
git Mechanics
One of the things I struggled most with when starting out with git was understanding exactly what was happening when I made commits in git source control. The concepts of sequential commits, lightweight branches, and how a commit impacts a branch’s history take a bit to get used to.
In git, a single commit is a snapshot of the files under source control at a point in time, together with a reference to the commit that came before it. git works out the differences between two versions by comparing their snapshots. Each commit in a chain of commits therefore represents the state of the repository at that point in history. A branch represents an independent line of changes to the repository that might differ from the changes happening on master. When you save your changes, you are taking one of these snapshots - this is what is known as a “commit”.
A series of commits is known as “history”. This history can be branched; the primary history of a repository is a branch known as the “master” or “main” branch.
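Here is a small sketch of two commits forming a history. The file name and commit messages are invented for the example, and the identity is hypothetical.

```shell
set -eu
# Hypothetical identity so commits work without any existing git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

repo=$(mktemp -d)
cd "$repo"
git init -q

# First commit: a snapshot of the folder with one file in it.
echo "first draft" > story.txt
git add story.txt
git commit -q -m "Write first draft"
first_commit=$(git rev-parse HEAD)

# Second commit: a new snapshot that records the first commit as its parent.
echo "second draft" > story.txt
git add story.txt
git commit -q -m "Revise draft"

parent_of_latest=$(git rev-parse HEAD^)
commit_count=$(git rev-list --count HEAD)

# Show the raw commit (its tree, parent, author, and message), then the history.
git cat-file -p HEAD
git log --oneline
```

The `parent` line printed by `git cat-file -p HEAD` is what links one commit to the previous one, and following those links backwards is what `git log` displays as history.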
Tooling
This series will cover using git from the terminal, but it is admittedly much easier to understand git concepts when you use a visualization. The default git gui is pretty bad, but there are a couple of GUIs I have used in the past that make visualizing complex commit history much easier.
- Atlassian SourceTree
- GitKraken
Code editor integration can also provide a nice visualization, especially for merging changes. VS Code has a pretty good overall git integration. Visual Studio’s integration leaves a bit to be desired, but is good enough for basic workflows. JetBrains products also have an integration, but I don’t find it particularly helpful.
Terminology
I’ll end part one by defining some git terms. These terms will be used heavily throughout the series, so feel free to bookmark this list in case you need to refer back here.
- repository :: a set of files and folders containing source code for a project, together with the history of commits made to them.
- commit (noun) :: a snapshot of the files in a repository at a point in time, linked to the commit before it and identified by a hash.
- commit (verb) :: the act of storing a snapshot of your changes in a repository as a new commit.
- hash :: an identifier, computed from a commit’s contents, that uniquely marks a single commit.
- history :: an ordered set of commits in a repository.
- branch :: a named line of commits in a repository that can diverge from other branches.
- pull (verb) :: the act of copying commit history from a remote repository to your local repository and merging it into your current branch.
- remote (or upstream) :: a copy of a repository that exists in a different location than your local repository.
- push (verb) :: the act of copying commit history from a local repository to a remote repository.
- merge (verb) :: the act of reconciling differences from one branch into another branch, and joining their history together.
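To tie several of these terms together, here is a rough sketch of two people sharing work through a remote. Everything named here is hypothetical: “alice” and “bob” are just folders, “shared.git” is a local folder standing in for a remote, and “add-todo” is an arbitrary branch name.

```shell
set -eu
# Hypothetical identity so commits work without any existing git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# Alice starts a repository and makes the first commit.
git init -q alice
cd alice
echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
branch=$(git symbolic-ref --short HEAD)   # "main" or "master", depending on configuration
cd ..

# A bare clone plays the role of the shared remote; Bob clones from it.
git clone -q --bare alice shared.git
git -C alice remote add origin "$workdir/shared.git"
git clone -q shared.git bob

# Bob works on a branch, merges it back, and pushes the result to the remote.
cd bob
git checkout -q -b add-todo
echo "todo" > todo.txt
git add todo.txt
git commit -q -m "Add todo list"
git checkout -q "$branch"
git merge -q add-todo    # nothing else changed, so by default this is a simple fast-forward
git push -q origin "$branch"
cd ..

# Alice pulls Bob's commit from the remote into her local repository.
git -C alice pull -q origin "$branch"

alice_head=$(git -C alice rev-parse HEAD)
bob_head=$(git -C bob rev-parse HEAD)
echo "alice: $alice_head"
echo "bob:   $bob_head"
```

After the pull, both folders should report the same hash. The merge here is the simplest possible case; merges where both branches have changed come up later in the series.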