Company: Acme Computers
Version control platform(s): Many GitHub Enterprise instances installed throughout the company by different teams. Acme Computers is trying to standardize on GitHub Enterprise and consolidate their GitHub usage onto a single instance. The company has many instances of other Git hosting solutions installed as well. Some are fully supported applications. Other instances are on machines under people's desks.
Customer requests:
- Shrink large repository: Acme wants GitHub to help them shrink the large repository to a more manageable size that is performant for common Git operations. The large repo is a project that is high visibility with an aggressive roadmap. They request that we help them within the month. It's a large, monolithic repository.
- Consolidate instances: Acme wants you to tell them the best way to move all the other teams, using GitHub Enterprise or other Git solutions, onto their consolidated GitHub Enterprise instance. They have asked you to give them five or six bullet points about how you would approach that initiative, both technically and culturally.
- Migrate an SVN repo: The customer has one SVN repository that hasn't been migrated to a Git solution. They would like help moving this one large repository over. The team uses a trunk-based development pattern with this repository and is unfamiliar with Git.
There are quite a few things that can lead to an increase in repository size, so in order to make the most impactful recommendation, it would be very helpful to be able to explore the repo and take a look. In lieu of being able to explore the repo personally, the sections below contain a few ideas that might help get the repo's size under control.
If you require large files to be tracked in your repo, consider using git-lfs:
Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
Git LFS provides a wide range of benefits, so it's definitely worth considering if the repo size is partly due to large files being tracked. If you'd like to learn more, see the instructions for installing Git LFS and the instructions for configuring Git LFS on GitHub Enterprise Server. Note that tracking a file type with LFS only affects new commits; shrinking the existing repo would also likely require rewriting your repo's Git history so that large files already committed are replaced with pointers.
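To judge whether Git LFS is worth the effort, it can help to see which file types account for the large files in a checkout. The following is a minimal sketch, not a definitive tool: the function names and the 50 MiB default threshold are assumptions made for illustration, and it only inspects the working tree, not the repo's history.

```python
from collections import defaultdict
from pathlib import Path


def lfs_candidates(root, min_bytes=50 * 1024 * 1024):
    """Sum the size of working-tree files of at least min_bytes, per extension.

    The 50 MiB default is an assumed threshold; tune it to your repo.
    """
    totals = defaultdict(int)
    for path in Path(root).rglob("*"):
        # Skip Git's own object store; only checked-out files are of interest.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size >= min_bytes:
            # Files without an extension are keyed by their full name.
            totals[path.suffix or path.name] += size
    return dict(totals)


def track_commands(totals):
    """Turn the per-extension totals into `git lfs track` command lines."""
    commands = []
    for key in sorted(totals, key=totals.get, reverse=True):
        pattern = f"*{key}" if key.startswith(".") else key
        commands.append(f'git lfs track "{pattern}"')
    return commands
```

Calling `track_commands(lfs_candidates("."))` from the repo root prints nothing by itself; it returns the suggested commands, largest file type first, so they can be reviewed before anything is run.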
Git features a command, git gc, which will run "housekeeping" tasks against your repo that might slim down the overall size. A nice writeup about the potential value and potential pitfalls of git gc can be found on Stack Overflow.
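One way to check whether git gc actually helps is to measure the object store before and after running it. The sketch below assumes `git` is on the PATH and uses `git count-objects -v`, which reports loose (`size`) and packed (`size-pack`) object sizes in KiB; the helper names are hypothetical.

```python
import subprocess


def parse_count_objects(output):
    """Parse `git count-objects -v` output into a dict of integers."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def repo_size_kib(repo_path="."):
    """Return loose plus packed object size, in KiB, for the repo at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    )
    stats = parse_count_objects(result.stdout)
    return stats.get("size", 0) + stats.get("size-pack", 0)


def gc_savings_kib(repo_path="."):
    """Run `git gc` and return how many KiB it reclaimed (may be zero)."""
    before = repo_size_kib(repo_path)
    subprocess.run(["git", "-C", repo_path, "gc"], check=True)
    return before - repo_size_kib(repo_path)
```

Trying this on a fresh clone first is the safer choice, since it leaves the team's working copies untouched while the effect is being evaluated.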
Make sure your repo does not have a large number of unused branches. The Branches tab in your repo shows a high-level view of your current branches, including the last activity for each and whether they have any related pull requests. You can use this view to quickly remove branches from the repo on GitHub, but teams will still need to clean up stale branches in their local clones, which the server does not track; this post from Stack Overflow provides a potential solution, but please read the instructions carefully before using the suggested commands.
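For repos with too many branches to review by hand, the branch list can be filtered by last commit date. This is a rough sketch under stated assumptions: the 180-day cutoff, the protected branch names, and the function name are all illustrative, and the input is the text produced by the `git for-each-ref` command shown in `LIST_BRANCHES`.

```python
import time

# Prints one "<branch> <unix timestamp>" line per remote-tracking branch.
LIST_BRANCHES = [
    "git", "for-each-ref",
    "--format=%(refname:short) %(committerdate:unix)",
    "refs/remotes/origin",
]


def stale_branches(listing, max_age_days=180, now=None, keep=("main", "master")):
    """Return branch names whose last commit is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for line in listing.splitlines():
        name, _, stamp = line.rpartition(" ")
        if not name or not stamp.isdigit():
            continue
        branch = name[len("origin/"):] if name.startswith("origin/") else name
        # Never propose the protected branches or the symbolic HEAD entry.
        if branch in keep or branch in ("HEAD", "origin"):
            continue
        if int(stamp) < cutoff:
            stale.append(branch)
    return stale
```

The result is only a list of candidates for review. Once branches have been deleted on the server, each developer can drop the matching remote-tracking references from a local clone with `git fetch --prune`.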
Although I'm not recommending this approach, it's worth noting that some teams have had success migrating a monolithic application into modules and breaking those modules out into their own repos. This process is usually time-intensive, and given the one-month timeframe, it is probably not the right approach to take.
There are several steps your team can take in order to reduce the future likelihood of other repositories becoming too large to manage.
Although not currently available on GitHub Enterprise Server, GitHub Packages provides development teams with the option of storing dependencies outside of your repo, which can reduce the repo size and potentially decrease build time in a CI environment. If a migration to GitHub Enterprise Cloud is a potential option for the future, it might be worth considering how your teams could leverage GitHub Packages.
Within your GHE instance, you can modify the settings for the maximum file size to prevent files exceeding certain limits from being pushed. A modification to this setting might help keep future repos to a more manageable size.
Another option to help streamline your commit history while also reducing size is to utilize the Squash and merge commit strategy within your repos. Not only will this help reduce the number of commits in your repo's history, but it will also prevent temporarily added files from being permanently included in your repo.
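If Squash and merge should be the only strategy offered on a repo, the setting can be applied through the REST API rather than by hand. The sketch below only builds the request and does not send it; the host name, owner, repo, and token in the usage note are placeholders, and the function name is hypothetical. It relies on the `allow_squash_merge`, `allow_merge_commit`, and `allow_rebase_merge` fields of the repository update endpoint.

```python
import json
import urllib.request


def squash_only_request(api_base, owner, repo, token):
    """Build (without sending) a PATCH that leaves only squash merging enabled."""
    payload = {
        "allow_squash_merge": True,
        "allow_merge_commit": False,
        "allow_rebase_merge": False,
    }
    return urllib.request.Request(
        url=f"{api_base.rstrip('/')}/repos/{owner}/{repo}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

For example, `squash_only_request("https://ghe.example.com/api/v3", "acme", "monolith", token)` targets a GitHub Enterprise Server instance, where the API lives under `/api/v3`; passing the result to `urllib.request.urlopen` would apply the change.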
Depending on the source of your data, the migration to a consolidated GHE instance might take a different approach:
- Migration from other GHE Server instances - GitHub provides some excellent documentation on how to migrate content from one GitHub Enterprise Server instance to another: Migrating user, organization, and repository data. This process allows for the migration of repos and other related data such as Teams:
In a migration, everything revolves around a repository. Most data associated with a repository can be migrated. For example, a repository within an organization will migrate the repository and the organization, as well as any users, teams, issues, and pull requests associated with the repository.
- Migration from non-GHE Sources - Depending on the hosting solution for the non-GHE repos there may be a way to automate the process of migrating repos via a script that utilizes the GHE API. The API allows for the creation of Org Repos, which could be utilized by iterating over all existing repos, setting a remote for those repos, and pushing the repo to its new destination.
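The scripted approach for non-GHE sources could look roughly like the sketch below: create the destination repo through the API, then mirror the Git data across. The function names, host names, and the choice to create repos as private are assumptions for illustration. Nothing is sent or executed here; the functions only build the request and the command lists.

```python
import json
import urllib.request


def create_org_repo_request(api_base, org, name, token):
    """Build (without sending) POST /orgs/{org}/repos for the destination repo."""
    return urllib.request.Request(
        url=f"{api_base.rstrip('/')}/orgs/{org}/repos",
        # Creating the repo as private is an assumption; adjust as needed.
        data=json.dumps({"name": name, "private": True}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


def mirror_commands(source_url, target_url, name):
    """Return the git commands that copy all branches and tags to target_url."""
    workdir = f"{name}.git"
    return [
        ["git", "clone", "--mirror", source_url, workdir],
        ["git", "-C", workdir, "remote", "set-url", "--push", "origin", target_url],
        ["git", "-C", workdir, "push", "--mirror"],
    ]
```

A driver script would loop over the source repos, send each request with `urllib.request.urlopen`, and run each command list with `subprocess.run(cmd, check=True)`. This moves Git data only; issues, pull requests, and similar metadata on the old host are not carried over by a mirror push.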
From a cultural perspective, I have a few recommendations:
- GitHub Workshop - In order to best prepare teams for the transition, we recommend hosting a workshop led by members of our team, which will dive into common GitHub functionality and provide your teams with the info they need to hit the ground running when the migration is complete. This will also be a great opportunity for us to share some of the features our customers love most, some of which they may not have previously had access to if they were running an older version of GHE Server.
- Communicate the Purpose - Communicate the primary reasons behind the need to consolidate and how it will benefit both the company and your development teams. As you transition to one supported instance, that instance is likely to see an increase in stability, your development teams will see better consistency between projects (previous instances may have been running different versions of GHE), and this will also allow the team supporting your GHE instance to focus more time on tasks that benefit the community as a whole. This is also an excellent opportunity to provide reassurance to your dev teams that their experience isn't likely to change substantially and, if anything, their access to new features and functionality is likely to increase.
- Set Expectations - Provide your development community with clear expectations for when the migration is going to take place, what impacts they can expect during that time, and who they can reach out to if they need support.
From a technical perspective, this migration should be straightforward. GitHub Importer allows for seamless migration from Subversion to GitHub.
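One preparatory step that is useful whichever tool performs the conversion is mapping SVN usernames to Git identities, so that commit authorship survives the move. The sketch below is an illustration only, not part of GitHub Importer: it extracts usernames from `svn log --quiet` output and writes them in the `user = Name <email>` format that `git svn` accepts as an authors file. The function names and the `example.com` domain are placeholders, and the generated names and emails would need to be corrected by hand.

```python
def svn_authors(log_output):
    """Return the sorted, unique usernames found in `svn log --quiet` output."""
    authors = set()
    for line in log_output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        # Revision headers look like: r42 | username | 2020-01-02 10:00:00 ...
        if len(parts) >= 3 and parts[0].startswith("r") and parts[0][1:].isdigit():
            authors.add(parts[1])
    return sorted(authors)


def authors_file(authors, email_domain="example.com"):
    """Render a draft authors file; names and emails are placeholders to edit."""
    return "".join(f"{a} = {a} <{a}@{email_domain}>\n" for a in authors)
```

Reviewing the draft file with the team is also a low-stakes way to confirm that everyone has an account on the consolidated instance before the cutover.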
From a cultural perspective, bringing your teams up to speed with GitHub in a safe environment will hopefully make the transition as smooth as possible. Prior to making the transition, introducing your teams to the GitHub Learning Lab would be a great way to help prepare them for what to expect. The Learning Lab includes courses that focus on how to use Git as well as courses on how to make the most of GitHub itself.