Idiomdrottning’s homepage

The life-changing magic of git

Repos

Sometimes you have a project or even just a single file in a folder where you want to keep track of different versions. You want to make new versions but be able to rewind time to before you made those versions, or you wanna get suggestions for new versions from other people.

A “repo” (short for “repository”) is the nickname for a folder that git can help you store different versions of. It’s not a good idea to have more than one repo per folder so make separate folders for the separate repos.

Working tree and staging area

The current state of that folder, however chaotic or neat it might be, with all your scrap files and junk that you have in that folder, is called “the working tree”.

Git uses something called a “staging area” a.k.a. the index. Think of git as a sort of camera that can store a version of your files, and the staging area is where you select what’s gonna be there.

It’s like when I’m working on a physical project at my desk, I have all sorts of scrap paper and eraser crumbs, I don’t wanna send that to the vault. Instead, I select the core and blessed parts and just take a snapshot of those documents. The working tree is the entire desk of tea stains and textbooks, the staging area is the part that’s under the camera.
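Here’s a minimal sketch of the desk and the camera at the command line. The file names essay.txt and scratch.txt are made up for the example, and everything happens in a throwaway temp folder.

```shell
set -e
repo=$(mktemp -d)          # throwaway folder, so nothing real gets touched
cd "$repo"
git init -q

echo "The real text" > essay.txt     # the blessed document (made-up name)
echo "eraser crumbs" > scratch.txt   # junk that stays on the desk (made-up name)

git add essay.txt                    # put only the essay under the camera

# Files in the staging area, waiting for the next snapshot
staged=$(git diff --cached --name-only)
# Files that sit in the working tree but that git is not tracking
untracked=$(git ls-files --others)

echo "staged: $staged"
echo "untracked: $untracked"
```

Both files are in the working tree, but only the one you added is in the staging area.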

Of course, you can have several local copies of the repo when that’s convenient.

Commits

Each of these snapshots or versions is called a “commit” in git. A commit contains three things: a version of your files, a message, and a link to the previous commit.

This is why I think git feels like a bowl of paper clips. Paper clips can be used as bookmarks, and they can also be linked together to form chains or even branching chains. You have ‘em in a bowl and some are linked up and some are loose rattling around in there.

You can write whatever you want in the commit message, but git users in the English-speaking world have come together in a sort of compromise on how they want them to be written, with these three rules:

Here is an example:

Apply changes

Turns out this was needed after all. Because of the frobnication,
otherwise things borked royally.

Do not shoot the messenger if you think these rules are dumb. They are, and the machine isn’t enforcing them, but they’ll help you get along with other git users if you’re sending versions to each other.

Here is a page that explains the reasons for these rules.

To make a commit is also called to commit. So you commit a commit. Language is dumb. Committing means copying what’s in the staging area into a specific commit.
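As a rough sketch of committing a commit: this stages one file and then commits it with a subject line and a body. The file names, the message text and the throwaway identity are all made up for the example.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "draft one" > essay.txt
echo "crumbs" > scratch.txt
git add essay.txt                 # only the essay goes in the staging area

# The first -m is the subject line, the second -m becomes the body after a blank line
git commit -q -m "Add first draft" -m "Scrap files stay out of the vault."

in_commit=$(git ls-tree --name-only HEAD)   # files stored in the new commit
subject=$(git log -1 --format=%s)           # first line of the message
body=$(git log -1 --format=%b)              # the rest of the message
echo "$in_commit / $subject / $body"
```

The scrap file never made it into the commit, because it was never in the staging area.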

The names for the commits are autogenerated from what’s in them. For example, the first two commits in the repo for this text got named b7efb1f3aae37b43e7764e8af97302e3853429ea and 4b18c5942c48e252200fab0e2466a94777984993. The data git uses to keep track of them is stored in .git/objects, and these names are used as the filenames there (with the first two characters split off as a subfolder name).

Git knows what you mean if you just use the first few letters in one of these commit names. For the first several years of using git, I didn’t know that, so I would copy and paste the entire long name. But that’s not necessary, git knows what’s up.
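A small sketch of that, assuming a fresh throwaway repo: take the first eight characters of a commit name and ask git what it thinks you mean. The file name and identity are made up.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file"

full=$(git rev-parse HEAD)                    # the entire long name
short=$(printf '%s' "$full" | cut -c1-8)      # just the first eight characters
resolved=$(git rev-parse "$short")            # git expands it back to the full name

echo "$short means $resolved"
```

In a big repo you might need a few more characters before the prefix is unambiguous, and git will tell you if it is not.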

Changes vs versions

I use the word “copy” a lot (copying from the working tree into the staging area, and from the staging area into a specific commit), but don’t worry, git knows how to store these copies efficiently without waste, especially for text files.

Magit and other tools let you view what’s different between the commits, what’s changed, and that’s great, so it’s easy to get the misconception that git commits are just a set of changes (people often refer to a set of changes as a “patch”), but don’t get the wrong idea. A commit is not a patch.

It’s better to think of git as storing specific versions, that a commit contains a version (and a message and a pointer to the previous commit), than to think of it as storing changes. This confused me at first, since I used patch-based systems (like darcs) before git. Those systems had a mental model of storing changes but in git it’s easier: A commit is a version of the files, and a link to a previous commit.
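One way to peek inside a commit, as a sketch: git cat-file -p prints the raw commit, which names a tree (the version of the files), a parent (the previous commit) and the message. The file name and identity are made up for the example.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "one" > file.txt
git add file.txt
git commit -q -m "First version"

echo "two" > file.txt
git add file.txt
git commit -q -m "Second version"

# Print the raw commit: a tree line, a parent line, author, committer, then the message
git cat-file -p HEAD

parent_line=$(git cat-file -p HEAD | grep '^parent ')
first=$(git rev-parse HEAD^)

# The older commit still holds the whole file as it was, not a diff
old=$(git show HEAD^:file.txt)
echo "old version of file.txt: $old"
```

Note that there is no list of changes anywhere in that output. Tools that show you a diff work it out by comparing two versions.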

I keep repeating that “commits contain a link to the previous commit” bit so you can probably tell that that’s important.

It’s not just a linear timeline. There can exist more than one commit succeeding the same predecessor. Just like paperclip chains. Going backwards from a particular commit, that’s linear. It points to its predecessor which points to its predecessor and so on, all the way back to init.

Names for commits

Those unreadable jumble-of-letters-and-numbers names are not random. They’re autogenerated from the contents of the commit. If you wanted to have a similar commit but based on a different predecessor, it’ll get a different letters-and-numbers name. These names are each called a “hash”.
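To see that for yourself, here is a sketch using the low-level git commit-tree command, which makes a commit object without moving any branch. It makes commits with the same files, the same message and the same (pinned, made-up) dates, and varies only the predecessor.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "one" > file.txt
git add file.txt
git commit -q -m "A"
a=$(git rev-parse HEAD)

echo "two" > file.txt
git add file.txt
git commit -q -m "B"
b=$(git rev-parse HEAD)

tree=$(git rev-parse "HEAD^{tree}")   # the version of the files in commit B

# Pin the dates so that the predecessor is the only thing that differs
export GIT_AUTHOR_DATE="2020-01-01 00:00:00 +0000"
export GIT_COMMITTER_DATE="2020-01-01 00:00:00 +0000"

on_a=$(git commit-tree "$tree" -p "$a" -m "Same message")   # predecessor is A
on_b=$(git commit-tree "$tree" -p "$b" -m "Same message")   # predecessor is B
again=$(git commit-tree "$tree" -p "$a" -m "Same message")  # identical inputs to on_a

echo "$on_a"
echo "$on_b"
echo "$again"
```

Same ingredients give the same hash, and a different predecessor gives a different one.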

There are three other ways to refer to a commit.

Tags

If there is a commit that you think is especially important, like “This is version 1 of my awesome project”, you can create a tag to refer to that particular commit.

Use a tag when you want it to point to the same commit forever and ever.
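A quick sketch of a tag staying put while work continues. The tag name v1.0 and the file name are made up for the example.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "awesome" > project.txt
git add project.txt
git commit -q -m "Finish version 1"
first=$(git rev-parse HEAD)

git tag v1.0                      # the tag names the current commit

echo "even more awesome" >> project.txt
git commit -q -a -m "Keep going"

tagged=$(git rev-parse v1.0)      # still the commit that was tagged
head=$(git rev-parse HEAD)        # the newer commit
echo "v1.0 is $tagged, HEAD is $head"
```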

Branches

A branch is a movable name or label. I thought branches were confusing at first until I realized that they were nothing more than names for commits. So when I say “a branch”, I just mean this type of a name.

For example, the repo I’m using for this text has a branch “main”. It is just a name, “main”, that git knows is currently pointing to 4b18 (at the time that I’m writing this sentence). But I can change it to point to another commit if I want to, unlike tags.

Git will even move that name for me automatically if I make a new commit that points to 4b18. I’ll go ahead and do that now. OK, I’m back. I made a new commit, it got named a052, and “main” automatically got moved to point to it.
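Here is roughly what that looks like in a throwaway repo. The branch name “draft” and the file name are made up; “main” is set by hand so the sketch does not depend on how your git is configured.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" whatever the default is

echo "one" > text.txt
git add text.txt
git commit -q -m "First"
before=$(git rev-parse main)            # where "main" points now

echo "two" >> text.txt
git commit -q -a -m "Second"
after=$(git rev-parse main)             # "main" moved along by itself

# A second name, put on the older commit by hand
git branch draft "$before"
draft=$(git rev-parse draft)

echo "main: $before -> $after, draft: $draft"
```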

So if your mind was spinning when you found out that commits can have more than one successor, now you know that branches can be used to keep track of that.

Git uses this metaphor of “checking out” a commit, not like checking out hot guys but more like when you get back your coat after checking it in at a nightclub.

Checking out a commit will overwrite those particular files with the versions of the files that are in that commit, but will leave your untracked scrap files alone.
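A small sketch of that, with made-up file names: one tracked file with two versions, and one untracked scrap file sitting next to it.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "first" > essay.txt
git add essay.txt
git commit -q -m "First"

echo "second" > essay.txt
git commit -q -a -m "Second"

echo "crumbs" > scratch.txt       # never added, so git does not track it

git checkout -q HEAD^             # go back to the older commit

essay=$(cat essay.txt)            # the tracked file is back to the old version
scratch=$(cat scratch.txt)        # the scrap file was left alone
echo "essay: $essay, scratch: $scratch"
```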

The name “HEAD” refers to the current commit. The commit that you most recently checked out or made (whichever happened last). You can also use HEAD^ to refer to the commit before that, or HEAD^^ to mean the commit two steps before HEAD and so on.
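As a sketch, with three made-up commits in a throwaway repo, you can ask git for the message of each step back:

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

for name in First Second Third; do
    echo "$name" >> log.txt
    git add log.txt
    git commit -q -m "$name"
done

current=$(git log -1 --format=%s HEAD)      # the commit made last
one_back=$(git log -1 --format=%s 'HEAD^')  # the one before that
two_back=$(git log -1 --format=%s 'HEAD^^') # two steps before HEAD

echo "$current, $one_back, $two_back"
```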

Checking out

So checking out just means “go to a particular commit”.

Using a branch name when checking out makes that branch keep tracking (i.e. getting updated to point to) new commits as you make them.

Checking out a commit via the hash (the jumbled-letters commit name) can lead to what’s called a “detached head”.

That’s not dangerous, it just means that the head commit doesn’t have a branch name attached to it.

If you hate the head commit and want to throw those changes away and go back to a branch, just check out that branch. For example, if your branch is “main”, then type git checkout main.

If you love the head commit, or commits plural, maybe you’ve made several commits but then you realize that they haven’t been tracked by the branch you want, you can set that branch to point to the head commit by typing git checkout -B main HEAD at the command line.
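Here is a sketch of the “love it” case: check out a commit by its hash, make a commit while detached, then put “main” on it. The file name is made up, and “main” is set by hand so the sketch does not depend on your git configuration.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" whatever the default is

echo "one" > text.txt
git add text.txt
git commit -q -m "One"
one=$(git rev-parse HEAD)

git checkout -q "$one"                  # checking out via the hash detaches the head

# git symbolic-ref fails when HEAD is not attached to a branch name
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi

echo "two" >> text.txt
git commit -q -a -m "Two"               # a commit that no branch is tracking
two=$(git rev-parse HEAD)
main_before=$(git rev-parse main)       # "main" is still on the first commit

git checkout -q -B main HEAD            # move "main" here and attach to it again
branch=$(git symbolic-ref --short HEAD)
main_after=$(git rev-parse main)

echo "$state, then on $branch"
```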

Loose commits

If, through shenanigans and bad choices, some of those paperclips in the bowl have become loose, you can find them by typing git reflog at the command line. Then you can check them out or cherry pick from them.
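A sketch of fishing a loose paperclip back out. The “shenanigans” here is a hard reset done on purpose in a throwaway repo, with made-up file and commit names.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "one" > text.txt
git add text.txt
git commit -q -m "One"

echo "two" >> text.txt
git commit -q -a -m "Two"
lost=$(git rev-parse HEAD)

git reset -q --hard 'HEAD^'       # the bad choice: no branch reaches "Two" anymore

git reflog                        # lists where HEAD has been, newest first

found=$(git rev-parse 'HEAD@{1}') # where HEAD was one move ago
subject=$(git log -1 --format=%s "$found")
echo "found $found ($subject)"
```

From there, git checkout or git cherry-pick with that hash works like with any other commit.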

Forks

A whole separate copy of the repo in another folder, that’s called a “fork”. On your own computer you can make forks by just copying the repo folder in a file manager. Each fork (copy) is its own repo.
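The same thing from the command line instead of a file manager, as a sketch. The folder names “original” and “fork” are made up.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)
git init -q "$work/original"
cd "$work/original"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Start notes"

cp -R "$work/original" "$work/fork"   # copy the whole folder, .git and all

cd "$work/fork"
echo "more" >> notes.txt
git commit -q -a -m "Add more in the fork"
fork_count=$(git rev-list --count HEAD)       # commits in the fork

cd "$work/original"
original_count=$(git rev-list --count HEAD)   # the original did not change

echo "fork: $fork_count commits, original: $original_count"
```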

History rewriting

Git has a lot of tools to rewrite the commit history: to reorder commits, to change the wording of messages, or even to change the files retroactively. Say you’ve made a typo early on, and then made other commits. You can retroactively make it so that that typo never happened. Git, the ultimate time machine.

However, you’re not supposed to rewrite history that you’ve published, that someone else has accessed and forked or fetched from. That can lead to trouble. Even a simple amend (as seen below) counts as troublesome rewriting if done on a published repo. But on your own private repos, especially repos that you are gonna publish or send to other people later, history rewriting can be awesome because what you send will be neater.

Remember that the hash names are autogenerated from the commit, including the contents of the version, the previous commit, and the message, so if any of those change, so will the hash.

Rewriting the most recent commit is the easiest. It’s as simple as adding the --amend flag to git commit from the command line, or from the magit-status view, where you’d normally hit cc for a vanilla commit, you use ca.
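From the command line, a sketch of amending looks like this. The typo in the message is deliberate and everything here is made up for the example.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "text" > essay.txt
git add essay.txt
git commit -q -m "Add essya"               # oops, typo in the message
before=$(git rev-parse HEAD)

git commit -q --amend -m "Add essay"       # replace the most recent commit
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)         # still only one commit in the history
subject=$(git log -1 --format=%s)
echo "$before became $after: $subject"
```

The message changed, so the hash changed too, which is why this counts as rewriting.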

There are two ways to rewrite further back. One is to just rewind your commits with git reset --soft or git reset --mixed and then just make new commits with the code that’s in your working tree. Resetting --hard is not good for this particular purpose (it’s actually super dangerous and you can lose work) since that would throw away the changes in the working tree, and checking out an older version would have the same problem.
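A sketch of the rewind-and-recommit way, using --soft. The commit messages and the file name are made up.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "start" > essay.txt
git add essay.txt
git commit -q -m "Start"

echo "messy attempt" >> essay.txt
git commit -q -a -m "wip 1"

echo "better attempt" >> essay.txt
git commit -q -a -m "wip 2"

git reset -q --soft 'HEAD^^'      # rewind two commits, keep the files and the staging area

git commit -q -m "Write the essay"   # one neat commit instead of two messy ones

count=$(git rev-list --count HEAD)
last_line=$(tail -n 1 essay.txt)     # the work itself is all still there
echo "$count commits, last line: $last_line"
```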

The other way is to use rebase. If you’re using the command line tools, start by typing git rebase -i followed by the newest commit you want to leave in place (that is, the one just before the oldest commit involved), which can be a hash or something like HEAD^^.

That’ll open a list in your editor. It’s a magical text file, even if you’re using vanilla Notepad, because here, you can reorder those commits just by changing the order of the lines. Unlike git log, this list goes oldest on top, newest at the bottom.

Removing a line means that the commit will be lost (pretty dangerous).

Also, you see that there’s the word “pick” by each commit. Pick means include it. You can change “pick” to other commands: edit means include but git will give you a chance to make changes to that commit and squash means to join that commit into the previous commit (and you can have several squash commits in a row if you want to join a bunch at once).

So again, just to be super clear: if you’ve made a bunch of changes and you’re happy with what you’ve got, you just wish your history looked a little neater and you decided to do a rebase to join some of those commits together, use squash instead of deleting the lines.

As an example of that, if you have commits A (oldest) then B then C (newest), but you think B is a little bit embarrassing because C got you where you wanted: then mark C as squash (to join it and B together) instead of just removing B, which might bork up C.

Once you save and close that file, follow the instructions git is giving you on the command line.
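Normally you would edit that list by hand, but to get a sketch that runs on its own, this one lets sed play the part of the editor. GIT_SEQUENCE_EDITOR is what git uses to open the list and GIT_EDITOR is what it uses for the combined message. The commits A, B and C and the file name are made up.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

for name in A B C; do
    echo "$name" >> notes.txt
    git add notes.txt
    git commit -q -m "$name"
done

# The list will hold B on line 1 and C on line 2 (oldest on top).
# sed changes "pick" to "squash" on line 2, which joins C into B.
# "true" accepts the suggested combined message without changing it.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/squash/'" GIT_EDITOR=true \
    git rebase -q -i 'HEAD^^'

count=$(git rev-list --count HEAD)     # A plus the joined B and C
subject=$(git log -1 --format=%s)
contents=$(cat notes.txt)              # nothing was lost from the files
echo "$count commits, newest is: $subject"
```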

You can start a similar -i style rebase from the magit-status view by pressing ri down on the oldest commit that you do want to include (unlike when you call git rebase -i from the command line, where you say the oldest commit you want to leave in place). Magit also has a lot of other, more convenient history rewriting tools. For example, you can commit new changes in your working tree straight into an older commit by committing with cF instead of cc.

So, just as a li’l reminder: do not rewrite repos you’ve already published, and be aware that the hashes are gonna change.

Getting new versions from other people

This is pretty clutch when collaborating.

There are three ways you can get changes from other people. One is if they send you patches and you just apply the patches. Git even has a tool that helps with sending patches this way.

This is one time you’re actually working with change sets (a.k.a. patches) and not just working with different versions.
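A sketch of the patch way, assuming two made-up folders “theirs” and “mine” on the same disk, with a folder standing in for however the patch file actually gets sent. git format-patch writes the patch and git am applies it.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)

git init -q "$work/theirs"
cd "$work/theirs"
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Start notes"

git clone -q "$work/theirs" "$work/mine"     # my copy, made before their new work

# They make a change and write it out as a patch file
echo "second line" >> notes.txt
git commit -q -a -m "Add second line"
git format-patch -q -1 -o "$work/outbox"     # one file for the most recent commit

# I apply the patch file to my copy
cd "$work/mine"
git am -q "$work"/outbox/*.patch

subject=$(git log -1 --format=%s)
contents=$(cat notes.txt)
echo "applied: $subject"
```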

The other two ways both involve using git itself to fetch from another fork.

Those two ways are called merging and rebasing.

I’ve worked on one project where the policy was to always use rebasing, and I’ve worked on other projects where the policy was to always use merging; while I think merging is more fun, ultimately you’ve got to get along with your fellow devs so I’ll try to explain both.

git fetch just grabs the other commits and puts them in your “paperclip bowl”. See more about fetching in the Pushing and fetching section below.

git pull is a shorthand for git fetch; git merge, while git pull --rebase is shorthand for git fetch; git rebase.

If you haven’t made any local changes, you’re just behind, and you just want to catch up, then merging and rebasing are the same (as long as merging has the “fast forward” option turned on, which is the default).

Merging and rebasing are different when you have made some changes and your friend has made other changes and you now wanna integrate them. You might have to resolve some differences by hand by editing the files; that problem is the same whether you use rebase or merge.

Merging creates a new commit that actually has two ancestors. Fun fun fun.♥︎

Rebasing rewrites your history as if you had made your commits after the other person’s commits, or the other way around. As with all history rewriting (see above), this changes the hashes, and also you’re not supposed to do any rewriting on publicly accessible repos, repos that you’ve published or pushed.

Rebasing is popular because it makes your changes all sit on top of theirs, which then makes it easier for them to add them back.
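To compare the two side by side, here is a sketch where a local branch called “friend” (a made-up stand-in for commits you have fetched from someone else) and “main” have each gotten one new commit. The same starting point is copied twice, and then one copy is merged and the other is rebased.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)
git init -q "$work/base"
cd "$work/base"
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" whatever the default is

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git branch friend                       # stand-in for the other person's work

echo "mine" > mine.txt
git add mine.txt
git commit -q -m "My change"

git checkout -q friend
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "Their change"
git checkout -q main

# Two copies of the same situation
cp -R "$work/base" "$work/merged"
cp -R "$work/base" "$work/rebased"

cd "$work/merged"
git merge -q --no-edit friend           # a new commit that joins the two lines
merge_parents=$(git log -1 --format=%P | wc -w | tr -d ' ')

cd "$work/rebased"
git rebase -q friend                    # my commit gets redone on top of theirs
rebase_parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
under_mine=$(git log -1 --format=%s 'HEAD^')

echo "merge commit has $merge_parents parents"
echo "rebased commit has $rebase_parents parent, sitting on: $under_mine"
```

Both copies end up with the same files. Only the shape of the history differs.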

For your own local copy that you’re developing on, if you’re contributing to another project, rebasing is the best. It just is. That’ll make sure the commit history that you’re gonna send (whether as patches or as pull requests) is gonna be as clean as it can be, and it’s easy for you to keep up with upstream changes.

Then on the main “central” repo (not that git has a “central” repo technically, just that many projects have one, culturally), that’s where merging can be a fun option for the maintainers so they can preserve history. Don’t rebase the central repo on top of the submitted changes, that’s backwards, the central repo’s history should not be changed.

Publishing a repo

Your friends just need access to the folder somehow. If you have access to web hosting, I’ve already written a separate post on how to serve up git repos there.

Pushing and fetching

Both fetching and pushing involve remotes. A remote is just a name for a fork that git knows about. A remote is set automatically when you clone a remote repo, but you can add more remotes by using git remote add from the command line, or by calling magit-remote from Emacs.

Remotes don’t have to be over the internet, they can be on your own hard drive too.

When working with remotes, I sometimes struggle to remember to prefix the remote branches with the remote name. For example, if I have a remote named origin, and I want to refer to that remote’s “main” branch, that branch is origin/main while my local main is just main.
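A sketch with a remote that lives on the same disk. Cloning sets up the remote named origin, fetching updates origin/main, and the local main only moves once you merge. The folder names “theirs” and “mine” are made up.

```shell
set -e
# Throwaway identity so committing works even where git isn't configured (made-up values)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d)

git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" whatever the default is
echo "one" > notes.txt
git add notes.txt
git commit -q -m "First"

git clone -q "$work/theirs" "$work/mine"   # "origin" gets set automatically

echo "two" >> notes.txt
git commit -q -a -m "Second"               # they keep working after I cloned

cd "$work/mine"
remotes=$(git remote)                      # names of the remotes git knows about
git fetch -q origin                        # grab their new commit

local_before=$(git rev-parse main)         # my main has not moved
remote_tip=$(git rev-parse origin/main)    # their main, as of the fetch

git merge -q --ff-only origin/main         # I was just behind, so this only catches up
local_after=$(git rev-parse main)

echo "remote: $remotes"
echo "main went from $local_before to $local_after"
```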

You can only push to remotes that you have write access to. If you don’t, you need to ask the other person to pull from you or you can send patches to them.