Git
I did a Q&A with ChatGPT about Git’s internals. Based on the conversation, I wrote down statements and let ChatGPT comment on them. I processed these comments and this is the result:
- The basic elements are blob, tree, commit and tag.
- A tree file references other tree files and blob files.
- A commit file references a tree file that is a root. A root tree has no parent tree.
- An annotated tag file references a tag file in .git/objects. This object references a commit file.
- A lightweight tag file directly references a commit file.
- Tag files live in refs/tags. For annotated tags, the file in refs/tags is just a reference: it holds the hash of the tag object in .git/objects.
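
To make the object statements above more concrete, here is a minimal Python sketch of how an object gets its hash and where it would be stored. It assumes the default SHA-1 object format; the function names (`git_object_id`, `loose_object_path`, `tree_entry`) are my own and not part of Git. The tree part only handles a single entry, so it skips the sorting that real trees with several entries need.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Hash '<type> <size>\\0<payload>', the way Git names an object (SHA-1 assumed)."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex digits form the directory, the rest the file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def tree_entry(mode: str, name: str, object_id: str) -> bytes:
    """One tree entry: '<mode> <name>\\0' followed by the raw 20-byte hash."""
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(object_id)


# A blob holds only file content; the file name lives in the tree.
blob_id = git_object_id("blob", b"hello world\n")
print(blob_id)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(blob_id))

# A tree that references this blob under the name hello.txt.
tree_id = git_object_id("tree", tree_entry("100644", "hello.txt", blob_id))
print(tree_id)
```

The blob hash should match what `git hash-object` reports for a file containing `hello world` plus a newline. The file on disk is additionally zlib-compressed, which the sketch leaves out.
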
- Any file in refs/heads references a commit file.
- Any file in refs/remotes/.. references a commit file.
- HEAD references a file in refs/heads, thereby pointing to a branch.
- A branch is a series of one or more commit files of which the last one is referenced by a file in refs/heads.
- Strictly speaking, a branch is just a reference to a ‘last’ commit.
- The refs/remotes/ directory can contain multiple directories, one for each ‘remote’.
- Each directory in refs/remotes/ can contain nested directories. Each file at the end of such a path holds a reference to a commit file.
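
The layout of the refs directory described above can be explored with a short Python sketch. It builds a throwaway directory that mimics a `.git` folder and walks it. The helper names (`write_ref`, `list_refs`) and the fake hashes are made up for illustration, and the sketch assumes every ref is a loose file, ignoring that Git can also store refs in a `packed-refs` file.

```python
import os
import tempfile


def write_ref(git_dir: str, name: str, commit_id: str) -> None:
    """Create a loose ref file such as refs/heads/main holding one hash."""
    path = os.path.join(git_dir, *name.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="ascii") as handle:
        handle.write(commit_id + "\n")


def list_refs(git_dir: str, prefix: str = "refs") -> dict:
    """Map every ref name below `prefix` to the hash stored in its file."""
    refs = {}
    for dirpath, _dirnames, filenames in os.walk(os.path.join(git_dir, prefix)):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            name = os.path.relpath(full_path, git_dir).replace(os.sep, "/")
            with open(full_path, encoding="ascii") as handle:
                refs[name] = handle.read().strip()
    return refs


with tempfile.TemporaryDirectory() as demo_git_dir:
    # Fake 40-character hashes, only here to make the files non-empty.
    write_ref(demo_git_dir, "refs/heads/main", "a" * 40)
    write_ref(demo_git_dir, "refs/remotes/origin/main", "b" * 40)
    write_ref(demo_git_dir, "refs/remotes/origin/feature/login", "c" * 40)
    demo_refs = list_refs(demo_git_dir)

for ref_name in sorted(demo_refs):
    print(ref_name, demo_refs[ref_name])
```

The nested `feature/login` entry shows why a remote's directory can contain further directories: a slash in a branch name becomes a directory level.
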
- The index file contains the paths of all tracked files, including their last modified timestamp.
- Upon staging using ‘add --all’, each source file whose timestamp/filesize does not match the ‘index’ timestamp/filesize is ‘blobbed’ and its line in index is updated with new hash, new filesize and new timestamp. This is also done for completely new files.
- Only if the combination of timestamp and filesize cannot tell with certainty whether the file has changed is its hash calculated. The hash leaves no doubt.
- Upon committing, Git builds the required new tree objects and a new commit object, based on the contents of index. The new commit file references the new root tree.
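
The cheap timestamp/filesize check can be modeled roughly as follows. This is a simplified in-memory model, not Git's real binary index format: `IndexEntry`, `stage` and the dictionary used as index are hypothetical names, and the real index stores more fields per file.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass
class IndexEntry:
    mtime_ns: int
    size: int
    blob_id: str


def blob_id_of(data: bytes) -> str:
    """Blob hash as Git computes it, assuming the SHA-1 object format."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def stage(index: dict, path: str) -> bool:
    """Refresh the entry for `path`; return True if the content had to be hashed."""
    info = os.stat(path)
    entry = index.get(path)
    if entry and entry.mtime_ns == info.st_mtime_ns and entry.size == info.st_size:
        return False  # timestamp and size match: treat the file as unchanged
    with open(path, "rb") as handle:
        data = handle.read()
    index[path] = IndexEntry(info.st_mtime_ns, info.st_size, blob_id_of(data))
    return True


with tempfile.TemporaryDirectory() as work_dir:
    source = os.path.join(work_dir, "hello.txt")
    with open(source, "wb") as handle:
        handle.write(b"hello world\n")
    demo_index = {}
    first = stage(demo_index, source)    # new file: hashed
    first_id = demo_index[source].blob_id
    second = stage(demo_index, source)   # untouched: skipped
    with open(source, "ab") as handle:
        handle.write(b"more\n")          # size changes, so the check fails
    third = stage(demo_index, source)    # modified: hashed again

print(first, second, third)  # True False True
```

The point of the design is that comparing two numbers is much cheaper than reading and hashing every tracked file on each run.
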
- Going back in time means creating a detached HEAD, meaning HEAD does not point to a branch in refs/heads anymore.
- Instead, HEAD will now directly reference a historic commit.
- If you go back in time, it is best to create a branch from that point and switch to it. Being on a branch again ends the detached HEAD state. If you don’t create the branch, new commits will not result in a file in refs/heads and might become inaccessible.
- A detached HEAD is okay if you just want to look at old code, not if a previous commit is a new starting point for development.
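
The difference between a normal and a detached HEAD comes down to what the HEAD file contains. The sketch below, again on a throwaway directory with fake hashes, reads HEAD and reports whether it points to a branch. `read_head` and `write_text` are hypothetical helper names; the sketch assumes loose ref files and does not handle a brand-new repository where the branch file does not exist yet.

```python
import os
import tempfile


def write_text(path: str, text: str) -> None:
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="ascii") as handle:
        handle.write(text)


def read_head(git_dir: str):
    """Return (branch_ref, commit_id); branch_ref is None when HEAD is detached."""
    with open(os.path.join(git_dir, "HEAD"), encoding="ascii") as handle:
        content = handle.read().strip()
    if content.startswith("ref: "):
        # Normal case: HEAD names a file in refs/heads, which holds the hash.
        branch_ref = content[len("ref: "):]
        ref_path = os.path.join(git_dir, *branch_ref.split("/"))
        with open(ref_path, encoding="ascii") as handle:
            return branch_ref, handle.read().strip()
    # Detached case: HEAD holds a commit hash directly.
    return None, content


with tempfile.TemporaryDirectory() as demo_git_dir:
    write_text(os.path.join(demo_git_dir, "refs", "heads", "main"), "a" * 40 + "\n")
    write_text(os.path.join(demo_git_dir, "HEAD"), "ref: refs/heads/main\n")
    attached = read_head(demo_git_dir)

    # Going back in time: HEAD now stores a hash instead of a branch name.
    write_text(os.path.join(demo_git_dir, "HEAD"), "b" * 40 + "\n")
    detached = read_head(demo_git_dir)

print(attached)
print(detached)
```

In the detached case no file in refs/heads changes when a commit is added, which is why such commits can become hard to find later.
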
- Upon merge, Git creates a commit with two parents instead of one, unless it is a fast-forward merge.
- A fast-forward merge arises when the main branch has not gotten new commits after the creation of the new branch. The merge then simply means adding the new branch commits to the main branch.
- Upon merge, the merged branch will not be deleted automatically. You can decide to do it yourself.
- Upon fetch, Git downloads all missing objects and updates refs/remotes/origin/main, which might point to a newer commit.
- The ‘pull’ command is actually the sequence of ‘fetch’ followed by either ‘merge’ or ‘rebase’.
- If upon pull the histories of ‘main’ and ‘origin/main’ have not diverged (which happens when only one of them has progressed), there will be a ‘fast-forward merge’.
- In a fast-forward merge, the new commits of the one version are added to the other version; no new commit is created. Git simply moves the branch reference (refs/heads/main) forward to the newer commit.
- If the two histories have diverged (you and someone else both worked on the same branch, effectively splitting it) a merge commit is required.
- A merge commit is a new commit. Conflicts must be resolved before Git can create it.
- A merge commit has two parents (or, in rare cases, more). That is the only thing that makes it special.
- If your local Git has created such a merge commit and you push it, the merge commit becomes part of the history on that remote, and of every clone that fetches from it.
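
Whether a merge can be a fast-forward or needs a merge commit depends only on how the two tips relate in the commit graph. Here is a small sketch of that decision, using a plain dictionary that maps each commit to its parents. The commit names A to D and the function names are invented for the example; this is not how Git implements it internally.

```python
def is_ancestor(parents: dict, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parent links."""
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def merge_kind(parents: dict, ours: str, theirs: str) -> str:
    """Classify what merging `theirs` into `ours` would require."""
    if is_ancestor(parents, theirs, ours):
        return "already up to date"
    if is_ancestor(parents, ours, theirs):
        return "fast-forward"  # only `theirs` has progressed
    return "merge commit"      # histories have diverged


# A <- B <- C is the remote history; D is a local commit made on top of B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

print(merge_kind(history, "B", "C"))  # fast-forward
print(merge_kind(history, "D", "C"))  # merge commit
print(merge_kind(history, "C", "B"))  # already up to date
```

In the fast-forward case the only change would be that the local branch file gets the hash of C. In the diverged case a new commit with parents D and C is needed.
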
- A merge commit preserves both lines of history, while a ‘rebase’ takes the commits of one branch and replays them after the other.
- To do so, it creates entirely new commits based on the commits of the ‘rebased’ branch.
- These new commits necessarily point to another parent, and they might need a new root tree.
- Because of the altered metadata they have different content, and therefore must have a different hash.
- What is preserved in these new commits is the effect of each commit on the final code result.
- Rebasing results in a linear commit history, which is often simpler to read and therefore an expedient outcome.
- Rule of thumb: Never rebase commits that other people have already based work on.
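
Why a rebased commit must get a new hash can be shown by hashing a commit object by hand. The sketch assumes the SHA-1 object format and uses a made-up author line, fake parent hashes and the well-known hash of the empty tree as a stand-in. `commit_id` is a hypothetical helper. To isolate the effect of the parent, the tree, message and timestamps are kept identical, which a real rebase usually would not do.

```python
import hashlib

# Hash of the empty tree, used here only as a placeholder root tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Invented identity and timestamp for the example.
WHO = "Example Author <author@example.com> 1700000000 +0000"


def commit_id(tree_id: str, parent_ids: list, message: str) -> str:
    """Hash a commit object built from its tree, parents, identity and message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]
    lines += [f"author {WHO}", f"committer {WHO}", "", message]
    payload = ("\n".join(lines) + "\n").encode()
    header = f"commit {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


old_base = "1" * 40  # fake hash of the commit the branch started from
new_base = "2" * 40  # fake hash of the tip the branch is rebased onto

original = commit_id(EMPTY_TREE, [old_base], "Add feature")
rebased = commit_id(EMPTY_TREE, [new_base], "Add feature")
merged = commit_id(EMPTY_TREE, [old_base, new_base], "Merge")  # two parent lines

print(original)
print(rebased)
print(original != rebased)  # True
```

Because the parent hash is part of the hashed content, changing the parent changes the commit's own hash, and with it the hash of every commit that follows.
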