I’m currently working on a large project in need of modernization. Amongst the various threads of technical debt, the project as inherited has its source code hosted on a self-hosted GitLab Enterprise server. The organization wishes to converge on GitHub Enterprise (cloud, not self hosted) as its source control solution, and so I was tasked with migrating nearly 100 repositories.
The Challenge
On the face of it, this looks straightforward. It’s git after all, so we can just add a new remote, push each repository to GitHub and call it a day, right? Unfortunately, business continuity imposes several requirements, chief among them the need to migrate without affecting developer productivity or breaking build systems, and to keep as much project history as possible, including the content of merge requests in GitLab.
To cap it off, the GitLab Enterprise server runs version 10.6, which is over 7 years out of date at the time of writing, and sits strictly behind a corporate firewall and VPN.
Any Easy Solutions?
To start with, I looked for any well trodden paths that might meet some of our requirements, as well as potential tooling that I could maybe build on. I found this post by Duncan McArdle recounting his experience where he elected to script pushing the git repositories to GitHub. Unfortunately this would not meet our strict business continuity requirement and would only retain the commit history of the main branch and any branches that happened to be around at migration time.
Along this same theme, GitHub has an importer tool but this also (understandably) only grabs the git repository with the additional constraint of requiring that GitHub can reach your server.
GitLab themselves have what seems to be a robust tool called Congregate for migrating to GitLab from various other vendors that even supports migrating pull requests, but this would require extensive rework to make it function in the opposite direction.
I dug some more but ultimately didn’t uncover any other scripts or tooling and so I elected to have a go at fleshing this out.
The Way Forward
After reading documentation and experimenting with both GitLab and GitHub APIs, I reckoned the challenge would not be too steep and settled on attempting to mirror each git repository and to scaffold pull requests in GitHub using merge requests in GitLab as a reference. Building out pull requests for historic (both merged and closed) merge requests would necessitate some branching acrobatics but hopefully nothing beyond everyday branch manipulation.
Looking at SDK options, I found the Go modules google/go-github and xanzy/go-gitlab and was able to use them to perform some basic operations. The latter project is no longer maintained, but this actually worked in my favor since I was working with a 7-year-old GitLab API anyway.
After 3 days of hacking and 2 days testing, I had a viable tool that seemed to be largely working. There were some repositories that threw up some errors around edge cases like stacked merge requests but we were most of the way there. Rather than recounting the process, here are the main takeaways:
- The git implementation uses go-git to work programmatically with an in-memory clone for each project.
- The master branch is renamed to main during the migration, and any references/branches are updated accordingly.
- Pushes to GitHub emulate --force --mirror to forcibly overwrite the target repository.
- Merge requests are retrieved with the GitLab API, and new pull requests are constructed with the original comments/reviews intact.
- For historic merge requests, temporary source and target branches are pushed before closing the “new” pull requests and deleting the temporary branches. A note is added to the PR to indicate whether the original merge request was merged or closed (unmerged).
- Merge request authors are looked up in GitLab and if they added their GitHub username to their profile, they are attributed correctly in pull requests.
- Projects to migrate can be provided in a CSV file with two fields per record, specifying the source GitLab project and the target GitHub repo.
- The tool is multi-threaded and spawns a new thread for each project with a configurable maximum thread count.
- Both the git and pull request histories are effectively handled with idempotence so the tool can be re-run many times to incrementally transfer newer history over time.
- The main constraint when migrating pull requests for a project with lots of history is GitHub’s API rate limits. The tool handles these gracefully whilst also maximizing API usage within GitHub’s terms.
After tidying up the implementation and adding some configurability, I’ve published it on GitHub in case it’s useful to others embarking down a similar path.
You can see a demo repository migrated by this tool at manicminer/gh-migration-testing, including pull requests.
How To Use
If you’re interested in trying or extending the tool for your own migration effort, please note the following caveats.
The target repository/repositories on GitHub will be completely overwritten by this tool, so take care not to make any embarrassing typos when supplying data to it. You should also avoid working in the target repositories until your migration is complete, as re-running the tool will clobber any work in progress.
With that said, the tool can be run over and over until you are satisfied with the result and it will just pick up where it last left off.
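One way such resumable behavior can be achieved is to record which merge requests already have a pull request and skip them on later runs. The sketch below illustrates that idea only. It assumes each migrated pull request can be traced back to the IID of its original merge request, which is an assumption about the mechanism rather than a description of the tool’s internals; the mergeRequest type and pendingMergeRequests function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeRequest is a hypothetical, pared-down view of a GitLab merge request.
type mergeRequest struct {
	IID   int
	Title string
}

// pendingMergeRequests returns the merge requests whose IID is not yet
// recorded as migrated, sorted by IID so pull requests would be created
// in their original order.
func pendingMergeRequests(all []mergeRequest, migrated map[int]bool) []mergeRequest {
	pending := make([]mergeRequest, 0, len(all))
	for _, mr := range all {
		if !migrated[mr.IID] {
			pending = append(pending, mr)
		}
	}
	sort.Slice(pending, func(i, j int) bool { return pending[i].IID < pending[j].IID })
	return pending
}

func main() {
	all := []mergeRequest{
		{IID: 3, Title: "Fix login"},
		{IID: 1, Title: "Add CI"},
		{IID: 2, Title: "Bump deps"},
	}
	// Assumption: a previous run already created a pull request for MR 1.
	migrated := map[int]bool{1: true}

	for _, mr := range pendingMergeRequests(all, migrated) {
		fmt.Printf("would create PR for MR !%d: %s\n", mr.IID, mr.Title)
	}
}
```

Because the set of pending work is derived from state on the target side each time, running the same step twice does no extra work, which is what makes repeated runs safe.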
If you have Go installed, you can build and install it locally by running:
go install github.com/manicminer/gitlab-migrator@latest
Alternatively, you can download a prebuilt binary for your platform & operating system from the GitHub releases page.
Run the tool with:
❯ gitlab-migrator -help
Usage of gitlab-migrator:
  -delete-existing-repos
        whether existing repositories should be deleted before migrating
  -github-domain string
        specifies the GitHub domain to use (default "github.com")
  -github-repo string
        the GitHub repository to migrate to
  -github-user string
        specifies the GitHub user to use, who will author any migrated PRs (required)
  -gitlab-domain string
        specifies the GitLab domain to use (default "gitlab.com")
  -gitlab-project string
        the GitLab project to migrate
  -loop
        continue migrating until canceled
  -max-concurrency int
        how many projects to migrate in parallel (default 4)
  -migrate-pull-requests
        whether pull requests should be migrated
  -projects-csv string
        specifies the path to a CSV file describing projects to migrate (incompatible with -gitlab-project and -github-repo)
  -rename-master-to-main
        rename master branch to main and update pull requests
  -report
        report on primitives to be migrated instead of beginning migration
You will need to supply personal access tokens (PATs) for both GitHub and GitLab, as well as the GitHub username that owns the GitHub PAT, in addition to the project/repo details.
Before embarking on your migration effort, make sure to read the fine manual, and test with a small number of repos until you’re happy – then use the same configuration options for your remaining repos.
The tool is MIT licensed so you are free to do whatever you like, including making money from it in whatever way you see fit. Contributions and bug fixes are welcome!