Maintaining projects across multiple Git repositories can become quite troublesome. While developing Shopsys Framework, we created a few new repositories for individual components of the framework and soon realized there had to be a better way to handle so many projects. This article describes our problem with multiple Git repositories and presents the solution we decided to adopt.
Introduction
In the initial stages of the development of Shopsys Framework, all source code was stored in a single Git repository, as the project was originally planned to be a monolithic platform. When we got around to creating the framework and started to decouple parts of the original platform into new standalone packages, we decided to create a new Git repository for each package.
This approach is commonly called manyrepos. We separated components for HTTP smoke testing, tools for automatic coding standards corrections, and created several plugins for product XML feeds. If you are interested, you can read our article about components and plugins.
Problems with maintaining more than one repository
Suddenly, there were several packages on our GitHub that needed to be managed somehow. It became much more difficult to develop features that spanned more than one repository.
For example, when we were developing a new feature (adding demo data) in a Heureka product feed, there were many steps to go through. All the changes had to be done in separate Git branches first, and then later merged and released after a code review.
The general workflow looked something like this:
- Modify the Plugin interface (stored in a separate repository), i.e. add a new interface responsible for loading demo data (see the commit).
- In the Heureka product feed (stored in another separate repository), update the composer.json dependency on the Plugin interface to the dev-* version. Note: we didn’t want to release a new version of the Plugin interface until it was validated to be useful in the implementation.
- Modify the Heureka product feed, i.e. implement the interface for loading demo data.
- In the Shopsys Framework core (also in a separate repository, of course), update the dependencies on the Plugin interface and on the Heureka product feed as well, and implement the logic of loading demo data.
- Do a code review, make sure everything works and is designed properly.
- Merge the Git branch with the new Plugin interface (in its own repository) and release a new version.
- Merge the Git branch with demo data in the Heureka product feed (in its own repository) and release a new version.
- In all packages, update dependencies in composer.json from dev-* to the new releases.
That was just one plugin modification, but the same process had to be applied to all the other feed plugins (Zbozi product feed, Google product feed, …). We felt that this was probably not the smoothest workflow. The code review process was difficult as well because the reviewer had to check modifications across multiple repositories. The changes in one repository often didn’t make sense on their own, because the reviewer saw them in isolation from the related changes made in the other repositories.
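To make the dependency juggling in that workflow concrete, here is a minimal sketch of the two composer.json edits each dependent package went through: first pointing at an unreleased branch, then at the new release. The package names, branch name, and version numbers are hypothetical and only serve the illustration.

```python
import json


def set_constraint(composer: dict, package: str, constraint: str) -> dict:
    """Return a copy of a composer.json dict with one `require` constraint replaced."""
    require = dict(composer.get("require", {}))
    if package not in require:
        raise KeyError(f"{package} is not listed under 'require'")
    require[package] = constraint
    return {**composer, "require": require}


# Hypothetical composer.json of a feed plugin (names and versions are assumptions).
feed = {
    "name": "example/product-feed-heureka",
    "require": {"example/plugin-interface": "^0.1.0"},
}

# During development and review: depend on the unreleased feature branch.
in_review = set_constraint(feed, "example/plugin-interface", "dev-demo-data")

# After the interface is merged and released: switch back to a tagged version.
released = set_constraint(in_review, "example/plugin-interface", "^0.2.0")

print(json.dumps(released, indent=4))
```

With manyrepos, both edits had to be repeated by hand in every package that depended on the changed interface.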
Two Possible Ways to Go
We agreed that maintaining a growing number of repositories this way would be difficult and inefficient. We did more research and came up with two possible improvements: either using a monolithic repository (also known as a monorepo) for all our packages, or using Git submodules for the same purpose. Let’s take a closer look at each option.
1. Monolithic repository
A monorepo is a single Git repository that can contain multiple more or less independent packages. That means even when you develop a feature spanning multiple packages, you commit all changes to a single repository.
A lot of big companies (e.g. Facebook, Google, SensioLabs) use the monorepo approach for maintaining their packages. For example, the Symfony framework, along with all of its components, is part of one monorepo. There you can see that all components in the src/Symfony/Component/ folder are standalone packages with their own license, readme, composer.json files, etc. To make it possible to use only specific parts of the monorepo, the components are split into separate read-only repositories using a special tool called Splitsh.
Pros
- Much easier management of dependencies across your packages (especially when it is used along with composer local packages).
- More straightforward development and easier code review of features that span more than one package.
- Easy integration testing of all your packages.
- Successfully used by PHP giants like Symfony.
Cons
- All included packages are released together, i.e. none of the packages in the monorepo can have a different version from the others.
- There is no easy way to make pull requests to a sub-repository of a monorepo. All contributions must be done to the monorepo.
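The "composer local packages" mentioned among the pros rely on Composer's `path` repository type, which lets the root project resolve packages from directories inside the same repository. The sketch below adds such an entry to a composer.json dict. The `packages/*` layout and the package names are assumptions for illustration, not the layout of any particular project.

```python
import json


def with_local_packages(composer: dict, url: str = "packages/*") -> dict:
    """Return a copy of a composer.json dict with a `path` repository added."""
    repositories = list(composer.get("repositories", []))
    entry = {"type": "path", "url": url}
    if entry not in repositories:  # keep the operation idempotent
        repositories.append(entry)
    return {**composer, "repositories": repositories}


# Hypothetical root composer.json of a monorepo (names are assumptions).
root = {
    "name": "example/monorepo",
    "require": {"example/plugin-interface": "@dev"},
}

configured = with_local_packages(root)
print(json.dumps(configured, indent=4))
```

Once the entry is present, Composer can resolve the required package from its local directory instead of downloading a release.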
Sources:
- Fabien Potencier talk (video) + slides
- Gregory Szorc — On monolithic repositories
- Tomáš Votruba — How monolithic repository in open-source saved my laziness
- Tomáš Votruba — How to Decouple Monolith like a Boss with Composer Local Packages
2. Git submodules
Submodules are a native Git technique that allows you to keep a Git repository as a subdirectory of another Git repository. The fact that the submodules are still separate repositories can be an advantage (it allows independent versioning) or a disadvantage (the overhead of managing multiple repositories), depending on the approach the company takes.
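As a rough illustration, the following sketch builds the two Git commands a superproject typically needs: one to register a package as a submodule and one to check out all registered submodules after cloning. The repository URL and target path are hypothetical, and the commands are only assembled here, not executed.

```python
import subprocess
from typing import List


def submodule_add_cmd(url: str, path: str) -> List[str]:
    """Command that registers the repository at `url` as a submodule under `path`."""
    return ["git", "submodule", "add", url, path]


def submodule_sync_cmd() -> List[str]:
    """Command that initializes and checks out all registered submodules."""
    return ["git", "submodule", "update", "--init", "--recursive"]


def run(cmd: List[str], cwd: str) -> None:
    """Run one of the commands inside an existing superproject checkout."""
    subprocess.run(cmd, cwd=cwd, check=True)


# Hypothetical package URL and location inside the superproject.
add_cmd = submodule_add_cmd(
    "https://example.com/product-feed-heureka.git",
    "packages/product-feed-heureka",
)
sync_cmd = submodule_sync_cmd()
print(" ".join(add_cmd))
print(" ".join(sync_cmd))
```

Each submodule stays pinned to a specific commit of its own repository, which is why changes still have to be committed and pushed per repository.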
Pros
- Management of dependencies among your packages is easier than it is with the manyrepos approach.
- All submodules are separate repositories, i.e. they can be released independently from each other.
Cons
- You still need to manage packages across multiple repositories.
Sources:
- Git official documentation
- Andrey Nering — Git: submodules vs. subtrees
- Martin Zlámal — Vy ještě nemáte svůj superprojekt? (in Czech)
Conclusion
Ultimately, Git submodules wouldn’t be able to solve our problems with maintaining multiple repositories. We decided to use a monolithic repository to speed up our development and code review processes.
How to merge packages’ commits to Monorepo?
An important factor in considering which approach to use was that we didn’t want to lose the releases and the commit history of the existing packages, which were already in separate repositories. To achieve this, we decided to mark the current repositories as abandoned and recreate them in the new monorepo using a tool called “tomono”. The tool allows us to preserve the Git history of the original packages.
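To show what preserving history means in practice, here is a sketch of the generic Git subtree merge technique for importing one repository into a subdirectory of another. This is not tomono's actual implementation, only a stand-in that illustrates the idea. The package name, URL, branch, and `packages/` prefix are assumptions, and the commands are assembled rather than executed.

```python
from typing import List


def import_repo_cmds(name: str, url: str, branch: str = "master",
                     prefix: str = "packages") -> List[List[str]]:
    """Commands that merge the history of `url` into `<prefix>/<name>/`.

    Assumes they run inside a monorepo that already has at least one commit.
    """
    remote_ref = f"{name}/{branch}"
    target = f"{prefix}/{name}/"
    return [
        # Add the original repository as a remote and fetch its history.
        ["git", "remote", "add", "-f", name, url],
        # Record a merge with the fetched history without touching any files yet.
        ["git", "merge", "-s", "ours", "--no-commit",
         "--allow-unrelated-histories", remote_ref],
        # Place the package's files under the target subdirectory.
        ["git", "read-tree", f"--prefix={target}", "-u", remote_ref],
        # Conclude the merge; the package's commits are now ancestors of HEAD.
        ["git", "commit", "-m", f"Import {name} into {target}"],
    ]


# Hypothetical package to import.
cmds = import_repo_cmds(
    "product-feed-heureka",
    "https://example.com/product-feed-heureka.git",
)
for cmd in cmds:
    print(" ".join(cmd))
```

Repeating these steps for every package produces a single repository whose log still contains each package's original commits.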
Unified Versioning
We also wanted to keep independent versioning for some specific packages, but this would have caused more trouble than it was worth, so we decided to unify the versioning of all packages in the monorepo. The main benefit of unified versioning is that compatibility between the packages becomes much clearer.
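One way to picture the compatibility benefit: with a single shared version, every package can simply require its sibling packages at the release version. The sketch below is only an assumption about how such a release step could look, not a description of our tooling. The package names and versions are hypothetical.

```python
from typing import Dict


def pin_siblings(packages: Dict[str, dict], version: str) -> Dict[str, dict]:
    """Return copies of the composer.json dicts with sibling constraints set to `version`.

    `packages` maps package names to composer.json dicts. Dependencies that
    are not part of the monorepo keep their original constraints.
    """
    names = set(packages)
    pinned = {}
    for name, composer in packages.items():
        require = {
            dep: (version if dep in names else constraint)
            for dep, constraint in composer.get("require", {}).items()
        }
        pinned[name] = {**composer, "require": require}
    return pinned


# Hypothetical monorepo content (names and versions are assumptions).
packages = {
    "example/plugin-interface": {"require": {"php": "^7.1"}},
    "example/product-feed-heureka": {
        "require": {"php": "^7.1", "example/plugin-interface": "^0.1.0"},
    },
}

pinned = pin_siblings(packages, "1.0.0")
print(pinned["example/product-feed-heureka"]["require"])
```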
Split with Splitsh
To allow other developers to use our packages independently from the monorepo, we will apply an approach similar to the one Symfony uses to manage its components. The sub-packages in the monorepo will be split into separate read-only repositories using the tool “splitsh”, which is developed and used for that purpose by SensioLabs.
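The splitting step can be sketched with `git subtree split`, which extracts the history of one subdirectory into its own branch. It is used here as a plain-Git stand-in for splitsh, not as a description of how splitsh works internally. The directory, branch name, and read-only repository URL are hypothetical, and the commands are assembled rather than executed.

```python
from typing import List


def split_cmds(prefix: str, branch: str, remote_url: str) -> List[List[str]]:
    """Commands that publish the history of one subdirectory to its own repository."""
    return [
        # Extract the commits that touch `prefix` into a standalone branch.
        ["git", "subtree", "split", f"--prefix={prefix}", "-b", branch],
        # Push that branch as `master` of the read-only package repository.
        ["git", "push", remote_url, f"{branch}:refs/heads/master"],
    ]


# Hypothetical package directory and read-only target repository.
publish = split_cmds(
    "packages/product-feed-heureka",
    "split/product-feed-heureka",
    "https://example.com/product-feed-heureka.git",
)
for cmd in publish:
    print(" ".join(cmd))
```

All development still happens in the monorepo; the split repositories only receive the pushed history.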
Stay tuned, we will let you know the results of our transition from manyrepos to monorepo as soon as possible.