Git submodules allow you to include one repository inside another. This is useful when you want fine-grained control over your dependencies, or in situations where a dependency manager is not suitable. Submodules are powerful tools, and it’s worth understanding them properly before using them.
In this article, we’ll cover:
- What Git submodules are
- Common workflows with submodules
- What they are useful for
- When you shouldn’t use them
At the end of the article, you’ll also find links to further resources.
Imagine you are working on a text editor. You’ve implemented the basic features of viewing and editing files, and now you want to add syntax highlighting. There’s a cool library on GitHub that does exactly what you want, but it hasn’t been published to the dependency manager you use. How can you use it?
This is a situation where Git submodules might come in handy. Submodules are a feature of Git that lets you include one repository inside another. This means that you can include the syntax highlighting library in your text editor’s repo while keeping a link to the original repository so that you can receive upstream changes.
The above diagram shows what your repository structure might look like when you use submodules. The src and test directories contain your own files, but lib contains the syntax highlighter library as a submodule.
Submodules are entire Git repositories that are pinned to a specific commit. Your local copy of a repository containing a submodule will contain all of the files from the submodule, which means that you can treat it as if it were your own code. Submodules let you view, edit, and reference all of the files in the contained repository.
    text-editor
    ├── lib
    │   └── syntax-highlighter
    │       ├── README.md
    │       ├── docs
    │       │   └── very-good-docs.md
    │       └── ...
    ├── src
    │   ├── editor.py
    │   └── ...
    └── test
        └── ...
Above is the file structure of our text editor repository after adding the syntax highlighting library as a submodule. All of the files from the submodule are on our filesystem and ready for us to edit them.
Now that you’ve seen what submodules can do, the following section will take you through how to use them.
Adding a submodule to your repository
Following on from the earlier example of a syntax highlighting library, imagine that the library you want to add is at https://www.github.com/username/syntax-highlighter.
You can add this library as a submodule to your repository by using the command:
git submodule add https://www.github.com/username/syntax-highlighter lib/syntax-highlighter
This will add two new files to your repository, .gitmodules and lib/syntax-highlighter. You can see these files using git status:
    ~/src/text-editor$ git status
    On branch main

    No commits yet

    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
            new file:   .gitmodules
            new file:   lib/syntax-highlighter
.gitmodules is a simple text file that lists the submodules in your repository. You should commit this file so that other people working on your repository can also use the submodule.
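To see what ends up in .gitmodules without touching a real project, here is a minimal sketch that builds everything in a temporary directory. The local "upstream" repository, the g helper, and the Example identity are assumptions made for illustration; with a hosted URL you would not need the protocol.file.allow setting, which is only there because recent Git versions restrict local-path submodules.

```shell
set -euo pipefail
work="$(mktemp -d)"

# Hypothetical helper: git with a throwaway identity. protocol.file.allow is
# only needed because this sketch uses a local path instead of a hosted URL.
g() { git -c user.name=Example -c user.email=example@example.com -c protocol.file.allow=always "$@"; }

# Stand-in for the upstream syntax-highlighter library
g init -q "$work/syntax-highlighter"
echo "# syntax-highlighter" > "$work/syntax-highlighter/README.md"
g -C "$work/syntax-highlighter" add README.md
g -C "$work/syntax-highlighter" commit -q -m "Initial commit"

# The parent repository that will contain the submodule
g init -q "$work/text-editor"
cd "$work/text-editor"
g submodule --quiet add "$work/syntax-highlighter" lib/syntax-highlighter

# .gitmodules uses the same syntax as other Git config files
cat .gitmodules

# Individual fields can be read with git config
git config -f .gitmodules --get submodule.lib/syntax-highlighter.path
git config -f .gitmodules --get submodule.lib/syntax-highlighter.url
```

Because .gitmodules is ordinary Git config syntax, you can inspect or script against it with git config -f rather than parsing it by hand.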
lib/syntax-highlighter is a bit more complicated. Git sees this path as a file, but your filesystem sees the path as a directory. You can output what Git sees by running git diff --cached lib/syntax-highlighter:
    ~/src/text-editor$ git diff --cached lib/syntax-highlighter
    diff --git a/lib/syntax-highlighter b/lib/syntax-highlighter
    new file mode 160000
    index 0000000..ac8e080
    --- /dev/null
    +++ b/lib/syntax-highlighter
    @@ -0,0 +1 @@
    +Subproject commit ac8e080ae2ba4c582eb5842139ab7e5082b4cff0
As shown in the diff above, Git sees the submodule as a file containing the commit ID currently tracked by the submodule. By default, this will be the latest commit to the default branch, which is usually main on newer Git repositories and master on older ones.
However, if you look at the submodule on the filesystem using something like ls, you’ll see that it’s a directory:

    ~/src/text-editor$ ls lib/syntax-highlighter
    README.md docs src test
What’s more, this directory is actually a Git repository in its own right! You can run things like git status and even edit the code in it.
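If you want to compare the two views side by side, the sketch below sets up a throwaway parent repository and library (both hypothetical, created under a temporary directory) and then asks Git and the filesystem about the same path. The g helper and the local upstream path are assumptions for illustration only.

```shell
set -euo pipefail
work="$(mktemp -d)"

# Hypothetical helper: throwaway identity, local-path submodules allowed
g() { git -c user.name=Example -c user.email=example@example.com -c protocol.file.allow=always "$@"; }

# Stand-in upstream library and parent repository
g init -q "$work/syntax-highlighter"
echo "# syntax-highlighter" > "$work/syntax-highlighter/README.md"
g -C "$work/syntax-highlighter" add README.md
g -C "$work/syntax-highlighter" commit -q -m "Initial commit"
g init -q "$work/text-editor"
cd "$work/text-editor"
g submodule --quiet add "$work/syntax-highlighter" lib/syntax-highlighter

# Git's view: a single index entry with mode 160000 and a commit ID
git ls-files --stage lib/syntax-highlighter

# The filesystem's view: a directory containing the library's files
ls lib/syntax-highlighter

# That directory is a repository of its own, checked out at the same commit
git -C lib/syntax-highlighter rev-parse HEAD
```

The commit ID in the ls-files output should match the one printed by rev-parse inside the submodule, since that ID is all the parent repository stores.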
Cloning a repository that contains a submodule
Git stores each submodule as an entry in .gitmodules and a file in the repo that describes what commit the submodule points to. As a result, when you clone a repo, you need to do a little extra work to download the code for the submodule into your local copy.
Let’s say you’ve cloned the text-editor repo from earlier:
git clone https://github.com/username/text-editor
If you were then to examine lib/syntax-highlighter, you’d find that it’s just an empty directory.

    ~/src/text-editor$ less lib/syntax-highlighter/
    lib/syntax-highlighter/ is a directory
To populate lib/syntax-highlighter with the submodule’s code, you need to run git submodule update --init --recursive.
    ~/src/text-editor$ git submodule update --init --recursive
    Submodule 'lib/syntax-highlighter' (https://github.com/username/syntax-highlighter.git) registered for path 'lib/syntax-highlighter'
    Cloning into '/home/username/src/text-editor/lib/syntax-highlighter'...
    Submodule path 'lib/syntax-highlighter': checked out '55086f1cb2ee8294d3354805be941171c287557d'
This is a convenient shorthand for git submodule init followed by git submodule update. init records each submodule’s URL from .gitmodules in your local Git configuration, and update clones the submodule and checks out the commit that the parent repository points to. If your submodules have submodules of their own, the --recursive flag initializes and updates those as well.
An alternative workflow is to use git clone --recurse-submodules. This is an even shorter shorthand that is equivalent to a git clone followed by git submodule update --init --recursive.
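The two cloning workflows can be tried side by side with the following sketch. It builds a hypothetical library and parent repository in a temporary directory and then clones the parent twice; the directory names two-step and one-step, the g helper, and the local paths are all assumptions for illustration.

```shell
set -euo pipefail
work="$(mktemp -d)"

# Hypothetical helper: throwaway identity, local-path submodules allowed
g() { git -c user.name=Example -c user.email=example@example.com -c protocol.file.allow=always "$@"; }

# Stand-in upstream library
g init -q "$work/syntax-highlighter"
echo "# syntax-highlighter" > "$work/syntax-highlighter/README.md"
g -C "$work/syntax-highlighter" add README.md
g -C "$work/syntax-highlighter" commit -q -m "Initial commit"

# Parent repository with the submodule committed
g init -q "$work/text-editor"
g -C "$work/text-editor" submodule --quiet add "$work/syntax-highlighter" lib/syntax-highlighter
g -C "$work/text-editor" commit -q -m "Add syntax-highlighter submodule"

# Workflow 1: plain clone, then fetch the submodule contents separately
g clone -q "$work/text-editor" "$work/two-step"
before="$(ls -A "$work/two-step/lib/syntax-highlighter")"   # nothing checked out yet
g -C "$work/two-step" submodule --quiet update --init --recursive
ls "$work/two-step/lib/syntax-highlighter"

# Workflow 2: clone and populate submodules in one command
g clone -q --recurse-submodules "$work/text-editor" "$work/one-step"
ls "$work/one-step/lib/syntax-highlighter"
```

Both clones should end up with the library's files in lib/syntax-highlighter; the only difference is whether you ask for them at clone time or afterwards.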
Editing a submodule’s code
Submodules are complete Git repositories in their own right. This means that you can use them exactly as you would any other Git repository. To illustrate this point, let’s walk through making a change to a submodule in our repository.
Imagine that you want to add a line to the syntax-highlighter library to let it support Python. We can make that change in our favorite text editor (possibly the one we’re building!) and then see the change with a command such as git diff.
Note the path you run this from: ~/src/text-editor/lib/syntax-highlighter. We are making this change inside the submodule, not inside the original syntax-highlighter repository.
After making the change, we can do our usual git commit, and voilà! We have edited our submodule. You can see this change in the text-editor repository by running git diff lib/syntax-highlighter:
    ~/src/text-editor$ git diff lib/syntax-highlighter/
    diff --git a/lib/syntax-highlighter b/lib/syntax-highlighter
    index 8b6e157..55086f1 160000
    --- a/lib/syntax-highlighter
    +++ b/lib/syntax-highlighter
    @@ -1 +1 @@
    -Subproject commit 8b6e157f0fb785c619b99373bb474e03b1b72f54
    +Subproject commit 55086f1cb2ee8294d3354805be941171c287557d
Note that this diff just updates the commit ID that the submodule refers to. The actual changes to the submodule are not recorded in the parent repository. This leads to a really important point: to make changes to a submodule, you need push access to the original repository. Otherwise, the changes would be reflected in your local copy of the submodule, but nowhere else.
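Put together, the edit-and-record cycle might look like the sketch below. Everything lives in a temporary directory: the upstream library is a local repository standing in for one you have push access to, and the file contents, the g helper, and the commit messages are hypothetical. The add-python branch name follows the example in this article.

```shell
set -euo pipefail
work="$(mktemp -d)"

# Hypothetical helper: throwaway identity, local-path submodules allowed
g() { git -c user.name=Example -c user.email=example@example.com -c protocol.file.allow=always "$@"; }

# Stand-in upstream library that we are allowed to push to
g init -q "$work/syntax-highlighter"
mkdir "$work/syntax-highlighter/src"
echo "javascript" > "$work/syntax-highlighter/src/supported-languages.txt"
g -C "$work/syntax-highlighter" add src
g -C "$work/syntax-highlighter" commit -q -m "Initial commit"

# Parent repository with the library pinned as a submodule
g init -q "$work/text-editor"
cd "$work/text-editor"
g submodule --quiet add "$work/syntax-highlighter" lib/syntax-highlighter
g commit -q -m "Add syntax-highlighter submodule"

# 1. Commit inside the submodule on our own branch, then push it upstream
cd lib/syntax-highlighter
g checkout -q -b add-python
echo "python" >> src/supported-languages.txt
g commit -q -am "Add Python support"
g push -q origin add-python

# 2. The parent now sees a different commit ID for the submodule path;
#    staging and committing that path records the new pointer
cd "$work/text-editor"
git status --short
g add lib/syntax-highlighter
g commit -q -m "Update syntax-highlighter to add Python support"
git ls-files --stage lib/syntax-highlighter
```

The push in step 1 is what makes the new pointer usable by anyone else: the parent only stores a commit ID, so that commit has to exist in a repository other people can fetch from.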
If you didn’t create the submodule, and therefore don’t have push access, that’s ok! You just need to fork the original repository and then use your fork as the submodule’s URL.
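One way to point an existing submodule at a fork is to change its URL in .gitmodules and then run git submodule sync. The sketch below tries this in a temporary directory; the "fork" is just a local bare clone named my-fork.git, and that name, the g helper, and the paths are assumptions for illustration.

```shell
set -euo pipefail
work="$(mktemp -d)"

# Hypothetical helper: throwaway identity, local-path submodules allowed
g() { git -c user.name=Example -c user.email=example@example.com -c protocol.file.allow=always "$@"; }

# Stand-in upstream library and parent repository
g init -q "$work/syntax-highlighter"
echo "# syntax-highlighter" > "$work/syntax-highlighter/README.md"
g -C "$work/syntax-highlighter" add README.md
g -C "$work/syntax-highlighter" commit -q -m "Initial commit"
g init -q "$work/text-editor"
cd "$work/text-editor"
g submodule --quiet add "$work/syntax-highlighter" lib/syntax-highlighter
g commit -q -m "Add syntax-highlighter submodule"

# Hypothetical fork: a bare clone of the upstream library
g clone -q --bare "$work/syntax-highlighter" "$work/my-fork.git"

# Point the submodule at the fork, then propagate the new URL to the
# local configuration and to the submodule's origin remote
git config -f .gitmodules submodule.lib/syntax-highlighter.url "$work/my-fork.git"
g submodule --quiet sync
git -C lib/syntax-highlighter remote get-url origin

# Commit the .gitmodules change so collaborators pick up the fork too
g add .gitmodules
g commit -q -m "Use our fork of syntax-highlighter"
```

After this, pushes from inside lib/syntax-highlighter go to the fork, which is the repository you have push access to.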
Pulling upstream changes into a submodule
Submodules maintain a link to the upstream repository that they originate from. You can use this link to pull upstream changes.
Imagine that after you added Python support to the syntax highlighting library, you hear that the maintainers have added TypeScript support. This sounds like a useful feature to include in your text editor and so you want to pull their changes. The first step is to cd into the submodule and fetch the changes:
    ~/src/text-editor$ cd lib/syntax-highlighter/
    ~/src/text-editor/lib/syntax-highlighter$ git fetch
    remote: Enumerating objects: 7, done.
    remote: Counting objects: 100% (7/7), done.
    remote: Compressing objects: 100% (3/3), done.
    remote: Total 4 (delta 0), reused 4 (delta 0), pack-reused 0
    Unpacking objects: 100% (4/4), 419 bytes | 419.00 KiB/s, done.
    From github.com/username/syntax-highlighter
     + 49301eb...54f7bbb main       -> origin/main
It’s important to cd to the submodule directory first because, otherwise, you will fetch changes for your parent repository. The git fetch output shows that main has been updated on the remote repository.
The changes that we want to pull in are on the main branch, so we’ll need to merge them into our own branch. We can use git merge for this:
    ~/src/text-editor/lib/syntax-highlighter$ git merge origin/main
    Auto-merging src/supported-languages.txt
    CONFLICT (content): Merge conflict in src/supported-languages.txt
    Automatic merge failed; fix conflicts and then commit the result.
Oh no! There’s a merge conflict with our branch. Thankfully, it’s quite small and limited to src/supported-languages.txt.
We want to keep both changes, so we can just delete the merge conflict markers. We can now add, commit, and push this change to our remote.
    ~/src/text-editor/lib/syntax-highlighter$ git add src/
    ~/src/text-editor/lib/syntax-highlighter$ git commit
    [add-python 98d5210] Merge remote-tracking branch 'origin/main' into add-python
    ~/src/text-editor/lib/syntax-highlighter$ git push
    Enumerating objects: 10, done.
    Counting objects: 100% (10/10), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (4/4), 500 bytes | 500.00 KiB/s, done.
    Total 4 (delta 0), reused 0 (delta 0)
    To github.com/username/syntax-highlighter.git
       55086f1..98d5210  add-python -> add-python
The ability to edit the code of submodules is both their most powerful feature and their most dangerous. Maintaining your own branch alongside the main branch of a library is incredibly useful, but be prepared to face merge conflicts.
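The whole fetch, merge, and resolve cycle can be rehearsed safely with the sketch below. It recreates a similar situation in a temporary directory: the file contents (javascript, python, typescript), the g helper, and the local upstream repository are hypothetical stand-ins, not the actual contents of the library in this article. The script looks up the upstream branch name rather than assuming main.

```shell
set -euo pipefail
work="$(mktemp -d)"

# Hypothetical helper: throwaway identity, local-path submodules allowed
g() { git -c user.name=Example -c user.email=example@example.com -c protocol.file.allow=always "$@"; }

# Stand-in upstream library with a list of supported languages
g init -q "$work/syntax-highlighter"
cd "$work/syntax-highlighter"
mkdir src
printf 'javascript\n' > src/supported-languages.txt
g add src
g commit -q -m "Initial commit"
upstream_branch="$(git symbolic-ref --short HEAD)"   # main or master, depending on your Git

# Parent repository with the library pinned as a submodule
g init -q "$work/text-editor"
cd "$work/text-editor"
g submodule --quiet add "$work/syntax-highlighter" lib/syntax-highlighter
g commit -q -m "Add syntax-highlighter submodule"

# Our change, on our own branch inside the submodule
cd lib/syntax-highlighter
g checkout -q -b add-python
printf 'javascript\npython\n' > src/supported-languages.txt
g commit -q -am "Add Python support"

# Meanwhile, upstream appends a different language at the same place
cd "$work/syntax-highlighter"
printf 'javascript\ntypescript\n' > src/supported-languages.txt
g commit -q -am "Add TypeScript support"

# Fetch and merge from inside the submodule
cd "$work/text-editor/lib/syntax-highlighter"
g fetch -q origin
if ! g merge -q --no-edit "origin/$upstream_branch" >/dev/null; then
  # Conflict: keep both sides by writing the file without conflict markers
  printf 'javascript\npython\ntypescript\n' > src/supported-languages.txt
  g add src/supported-languages.txt
  g commit -q --no-edit
fi

# Record the merged commit in the parent repository
cd "$work/text-editor"
g add lib/syntax-highlighter
g commit -q -m "Update syntax-highlighter with upstream changes"
cat lib/syntax-highlighter/src/supported-languages.txt
```

The last step is easy to forget: until the parent repository commits the new submodule pointer, other clones of text-editor will keep checking out the old commit.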
What are submodules useful for?
Below are a couple of examples of when you might want to use submodules over a dependency manager or other solution.
Libraries not in your dependency manager
Not every library is available through a dependency manager, and no dependency manager has every library. If your dependency manager doesn’t support a certain library, then submodules can help you include it in your project.
In this case, you should weigh up the work of maintaining a submodule against the work of adding that library to your dependency management system. Remember that submodules need to be manually updated.
Editable libraries that track upstream
Dependency managers, for the most part, aren’t designed for you to modify the dependencies that they manage. If you want to make changes to a library that you depend on, then submodules might be a good solution.
Submodules keep a link to the upstream code. This means that you can still pull in the latest security and bug-fix updates from the library you depend on. If you were to just copy and paste the code into your repository, getting updates from upstream would become a lot harder.
An alternative in this situation is to try and merge your changes to the upstream repository. However, this isn’t always practical. There might be license issues, your changes might not be accepted, and even if they are you will likely have to wait a while before they get merged.
It’s not always OK to publish libraries that are developed within your organization externally due to intellectual property and copyright concerns. Internal package mirrors are one solution to this problem, as they allow you to publish packages within your organization. Submodules can be a lot simpler to manage, however, and you should weigh up the cost of keeping a submodule up-to-date against the cost of maintaining a package mirror.
When not to use submodules
Submodules are powerful, but they come with some caveats. For starters, submodules don’t have automatic update mechanisms like dependency managers do.
If you add a submodule to your project, then you become responsible for keeping it up to date, whereas if you install a dependency with a dependency manager, the dependency manager can automatically keep the package on the latest version.
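When you have made no local changes to a submodule, a manual update can be as small as the sketch below, which uses git submodule update --remote to move the submodule to the tip of a chosen upstream branch. The repositories, file contents, and g helper are hypothetical and live in a temporary directory; setting submodule.lib/syntax-highlighter.branch explicitly is a choice made here so that the sketch does not depend on Git's default branch detection.

```shell
set -euo pipefail
work="$(mktemp -d)"

# Hypothetical helper: throwaway identity, local-path submodules allowed
g() { git -c user.name=Example -c user.email=example@example.com -c protocol.file.allow=always "$@"; }

# Stand-in upstream library and parent repository
g init -q "$work/syntax-highlighter"
echo "javascript" > "$work/syntax-highlighter/supported-languages.txt"
g -C "$work/syntax-highlighter" add supported-languages.txt
g -C "$work/syntax-highlighter" commit -q -m "Initial commit"
upstream_branch="$(git -C "$work/syntax-highlighter" symbolic-ref --short HEAD)"
g init -q "$work/text-editor"
cd "$work/text-editor"
g submodule --quiet add "$work/syntax-highlighter" lib/syntax-highlighter
g commit -q -m "Add syntax-highlighter submodule"

# Upstream publishes a new commit; nothing changes in text-editor by itself
echo "typescript" >> "$work/syntax-highlighter/supported-languages.txt"
g -C "$work/syntax-highlighter" commit -q -am "Add TypeScript support"

# Manual update: track a branch, move the submodule to its tip, commit the bump
git config -f .gitmodules submodule.lib/syntax-highlighter.branch "$upstream_branch"
g submodule --quiet update --remote lib/syntax-highlighter
g add .gitmodules lib/syntax-highlighter
g commit -q -m "Bump syntax-highlighter to latest upstream"
cat lib/syntax-highlighter/supported-languages.txt
```

Someone still has to run this and commit the result, which is the maintenance cost described above.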
Git doesn’t download the contents of a submodule by default. This is not obvious to developers who haven’t worked with submodules before and can become a trip hazard for your project. If you use submodules in your project, then it’s worth thoroughly documenting development workflows in contributing.md or similar.
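Alongside documentation, a small check can catch the problem early. git submodule status prefixes a submodule's line with "-" when it has not been initialized, so a setup script could look for that prefix. The sketch below demonstrates the idea on a throwaway clone; the repositories, the fresh-clone directory name, the g helper, and the wording of the message are assumptions for illustration.

```shell
set -euo pipefail
work="$(mktemp -d)"

# Hypothetical helper: throwaway identity, local-path submodules allowed
g() { git -c user.name=Example -c user.email=example@example.com -c protocol.file.allow=always "$@"; }

# Stand-in upstream library and parent repository
g init -q "$work/syntax-highlighter"
echo "# syntax-highlighter" > "$work/syntax-highlighter/README.md"
g -C "$work/syntax-highlighter" add README.md
g -C "$work/syntax-highlighter" commit -q -m "Initial commit"
g init -q "$work/text-editor"
g -C "$work/text-editor" submodule --quiet add "$work/syntax-highlighter" lib/syntax-highlighter
g -C "$work/text-editor" commit -q -m "Add syntax-highlighter submodule"

# A new contributor clones without submodules
g clone -q "$work/text-editor" "$work/fresh-clone"
cd "$work/fresh-clone"

# Count submodules whose status line starts with "-" (not initialized)
missing="$(git submodule status | grep -c '^-' || true)"
if [ "$missing" -gt 0 ]; then
  echo "Found $missing uninitialized submodule(s); initializing them now"
  g submodule --quiet update --init --recursive
fi

# After initialization, no status line should start with "-"
git submodule status
```

A check like this could live in whatever setup or build script your project already asks contributors to run.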
Submodules bring increased complexity to your development workflow, so it’s only worth using them if you need to. If a dependency manager will satisfy your use case, then consider using it over submodules.