Tor Migrates from Gitolite/GitWeb to GitLab

https://blog.torproject.org/gitolite-gitlab-migration/

Tor has finally completed a long migration from legacy Git infrastructure (Gitolite and GitWeb) to our self-hosted GitLab server.

Git repository addresses have therefore changed. Many of you have probably made the switch already, but if not, you will need to change the following in your Git configuration:

https://git.torproject.org/

to:

https://gitlab.torproject.org/

The GitWeb front page is now an archived listing of all the repositories before the migration. Inactive Git repositories were archived in the GitLab legacy/gitolite namespace, and the gitweb.torproject.org and git.torproject.org web sites now redirect to GitLab.

Best effort was made to reproduce the original gitolite repositories faithfully and also avoid duplicating too much data in the migration. But it's possible that some data present in Gitolite has not migrated to GitLab.

User repositories are particularly at risk, because they were migrated en masse and "re-forked" from their upstreams to avoid wasting disk space. If a user had a project with a matching name, it was assumed to have the right data, which might be inaccurate.

The two virtual machines responsible for the legacy service (cupani for git-rw.torproject.org and vineale for git.torproject.org and gitweb.torproject.org) have been shut down. Their disks will remain for 3 months (until the end of July 2024) and their backups for another year after that (until the end of July 2025), after which point all the data from those hosts will be destroyed, with only the GitLab archives remaining.

The rest of this article expands on how this was done and what kind of problems we faced during the migration.

Where is the code?

Normally, nothing should be lost. All repositories in gitolite have been either explicitly migrated by their owners, forcibly migrated by the sysadmin team (TPA), or explicitly destroyed at their owner's request.

An exhaustive rewrite map translates gitolite projects to GitLab projects. Some of those projects actually redirect to their parent in cases of empty repositories that were obvious forks. Destroyed repositories redirect to the GitLab front page.
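
To make the role of the rewrite map concrete, here is a minimal sketch of how an old clone URL could be translated by hand. It assumes the map uses Apache's plain-text RewriteMap layout (one "key value" pair per line, lines starting with "#" ignored), with the Gitolite path as key and the GitLab path as value. The two sample entries are taken from the migration logs in this article; `parse_rewrite_map` and `translate_clone_url` are hypothetical helper names, not part of our tooling.

```python
import re

GITLAB = "https://gitlab.torproject.org"

# Sample entries only; the real map is much longer.
SAMPLE_MAP = """\
# gitolite path GitLab path
project/tor-browser/fastlane legacy/gitolite/project/tor-browser/fastlane
user/phw/obfsproxy legacy/gitolite/user/phw/obfsproxy
"""


def parse_rewrite_map(text):
    """Parse a plain-text RewriteMap into a dict, skipping comments and blanks."""
    mapping = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def translate_clone_url(url, mapping):
    """Return the GitLab clone URL for an old Gitolite URL (hypothetical helper)."""
    match = re.match(
        r"^https?://git(?:web)?\.torproject\.org/(?:git/)?(.+?)(?:\.git)?/?$", url
    )
    if not match or match.group(1) not in mapping:
        # assumption: anything unknown lands on the GitLab front page
        return GITLAB
    return f"{GITLAB}/{mapping[match.group(1)]}.git"
```

The result can then be fed to `git remote set-url origin <new URL>` in an existing clone.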

Because the migration happened progressively, it's technically possible that commits pushed to Gitolite after a repository was migrated were lost. We took great care to avoid that scenario. First, we adopted a proposal (TPA-RFC-36) in June 2023 to announce the transition. Then, in March 2024, we locked down all repositories from any further changes. Around that time, only a handful of repositories had changes made after the adoption date, and we examined each repository carefully to make sure nothing was lost.

Still, we built a diff of all the changes in the Git references that archivists can peruse to check for data loss. It's large (6MiB+) because a lot of repositories were migrated before the mass migration and then kept evolving in GitLab. Many other repositories were re-created in GitLab from their parent to rebuild a fork relationship, which added extra references to those clones.
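
For readers who want to run a similar check on their own migration, the following is a rough sketch of a reference comparison, not the script that produced our diff. It assumes you have captured the output of `git ls-remote` for the old and the new copy of a repository; `parse_ls_remote` and `diff_refs` are hypothetical names, and the object ids in the usage example are placeholders.

```python
def parse_ls_remote(output):
    """Map ref name -> object id from `git ls-remote` style output."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        object_id, ref = line.split(None, 1)
        refs[ref.strip()] = object_id
    return refs


def diff_refs(old, new):
    """Classify refs as missing from the new copy, changed, or only in the new copy."""
    return {
        "missing": sorted(ref for ref in old if ref not in new),
        "changed": sorted(ref for ref in old if ref in new and old[ref] != new[ref]),
        "extra": sorted(ref for ref in new if ref not in old),
    }


# Placeholder object ids, for illustration only.
OLD = "a" * 40 + "\trefs/heads/master\n" + "b" * 40 + "\trefs/heads/bug_10887\n"
NEW = "c" * 40 + "\trefs/heads/master\n" + "d" * 40 + "\trefs/heads/main\n"
REPORT = diff_refs(parse_ls_remote(OLD), parse_ls_remote(NEW))
```

A "changed" reference is not necessarily data loss: the new copy may simply be ahead. Confirming that requires an ancestry check in a clone that has both histories, for example with `git merge-base --is-ancestor`.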

A note to amateur archivists out there: it's probably too late for one last crawl now. The Git repositories now all redirect to GitLab and are effectively unavailable in their original form.

That said, the GitWeb site was crawled into the Internet Archive in February 2024, so at least some copy of it is available in the Wayback Machine. At that point, however, many developers had already migrated their projects to GitLab, so the copies there were already possibly out of date compared with the repositories in GitLab.

Software Heritage also has a copy of all repositories hosted on Gitolite since June 2023 and has continuously kept mirroring the repositories, where they will hopefully be kept for eternity. There's an issue where the main website can't find the repositories when you search for gitweb.torproject.org; search for git.torproject.org instead.

In any case, if you believe data is missing, please do let us know by opening an issue with TPA.

Why?

This is an old project in the making. The first discussion about migrating from gitolite to GitLab started in 2020 (almost 4 years ago). But going further back, the first GitLab experiment was in 2016, almost a decade ago.

The current GitLab server dates from 2019, replacing Trac for issue tracking in 2020. It was originally supposed to host only mirrors for merge requests and issue trackers but, naturally, one thing led to another and eventually, GitLab had grown a container registry, continuous integration (CI) runners, GitLab Pages, and, of course, hosted most Git repositories.

There were hesitations about moving to GitLab for code hosting. We had discussions about the increased attack surface and ways to mitigate that, but, ultimately, it seems the issues were not that serious and the community embraced GitLab.

TPA actually migrated its most critical repositories out of shared hosting entirely, into specific servers (e.g. the Puppet Git repository is just on the Puppet server now), leveraging Git's decentralized nature and removing an entire attack surface from our infrastructure. Some of those repositories are mirrored back into GitLab, but the authoritative copy is not on GitLab.

In any case, the proposal to migrate from Gitolite to GitLab was effectively just formalizing a fait accompli.

How to migrate from Gitolite / cgit to GitLab

The progressive migration was a challenge. If you intend to migrate between hosting platforms, we strongly recommend setting a "flag day" during which you migrate all repositories at once. This ensures a smoother transition and avoids elaborate rewrite rules.

When Gitolite access was shut down, we had repositories on both GitLab and Gitolite, without a clear relationship between the two. A priori, the plan then was to import all the remaining Gitolite repositories into the legacy/gitolite namespace, but that seemed wasteful, particularly for large repositories like Tor Browser, which uses nearly a gigabyte of disk space. So we took special care to avoid duplicating repositories.

When the mass migration started, only 71 of the 538 Gitolite repositories were marked Migrated to GitLab in the gitolite.conf file. Given that we had hundreds of repositories left to migrate, we developed some automation to "save time". We already automate similar ad-hoc tasks with Fabric, so we used that framework here as well. (Our normal configuration management tool is Puppet, which is a poor fit here.)

So a relatively large amount of Python code was produced to basically do the following:

  1. check if all on-disk repositories are listed in gitolite.conf (and vice versa) and either add missing repositories or delete them from disk if garbage
  2. for each repository in gitolite.conf, skip it if its category is marked Migrated to GitLab; otherwise:
  3. find a matching GitLab project by name, prompting the user when there are multiple matches
  4. if a match is found, redirect if the repository is non-empty
    • we have GitLab projects that look like the real thing, but are only present to host migrated Trac issues
    • in such cases we cloned the Gitolite project locally and pushed to the existing repository instead
  5. otherwise, a new repository is created in the legacy/gitolite namespace, using the "import" mechanism in GitLab to automatically import the repository from Gitolite, creating redirections and updating gitolite.conf to document the change
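
The decision logic in steps 2 to 5 can be sketched as a small pure function. This is an illustration, not the code in fabric-tasks: `Plan` and `plan_migration` are hypothetical names, and the reading of step 4 (redirect when the matching GitLab project already has content, push into it when it is an empty shell that only hosts issues) is an assumption on the editor's part.

```python
from dataclasses import dataclass
from typing import Optional

MIGRATED = "Migrated to GitLab"  # category string as seen in the logs


@dataclass(frozen=True)
class Plan:
    action: str                   # "skip", "redirect", "push" or "import"
    target: Optional[str] = None  # GitLab project path, when relevant


def plan_migration(repo, category, match=None, match_is_empty=False):
    """Decide what to do with one Gitolite repository.

    `match` is the GitLab project found by name (None if there is none) and
    `match_is_empty` says whether that project's repository has no content.
    """
    if category == MIGRATED:
        return Plan("skip")
    if match is not None:
        if match_is_empty:
            # assumed: issue-only project, push the Gitolite history into it
            return Plan("push", match)
        return Plan("redirect", match)
    # no match: import into the legacy namespace
    return Plan("import", f"legacy/gitolite/{repo}")
```

Keeping the decision separate from the side effects (API calls, pushes, configuration edits) makes this kind of migration much easier to dry-run before touching hundreds of repositories.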

User repositories (those under the user/ directory in Gitolite) were handled specially. First, the existing redirection map was checked to see if a similarly named project was migrated (so that, e.g. user/dgoulet/tor is properly treated as a fork of tpo/core/tor). Then the parent project was forked in GitLab and the Gitolite project force-pushed to the fork. This allows us to show the fork relationship in GitLab and, more importantly, benefit from the "pool" feature in GitLab which deduplicates disk usage between forks.
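
As an illustration of the first lookup, here is a simplified sketch of how a fork parent could be guessed from the redirection map. It assumes the map contains an entry from the top-level Gitolite project `tor` to `tpo/core/tor` (the article only states the fork relationship, not the key), and `find_fork_parent` is a hypothetical name. The real script did more, including searching GitLab by name and prompting the operator.

```python
def find_fork_parent(repo, mapping):
    """Guess the GitLab parent of a user repository from the rewrite map.

    Returns the GitLab path of the single non-user project with the same
    base name, or None when there is no candidate or more than one.
    """
    parts = repo.split("/")
    if len(parts) < 3 or parts[0] != "user":
        return None  # not a user repository
    name = parts[-1]
    candidates = sorted(
        target
        for source, target in mapping.items()
        if not source.startswith("user/") and source.split("/")[-1] == name
    )
    # ambiguous or missing parents are left for a human to decide
    return candidates[0] if len(candidates) == 1 else None


# Assumed sample entries, for illustration only.
SAMPLE = {
    "tor": "tpo/core/tor",
    "user/phw/obfsproxy": "legacy/gitolite/user/phw/obfsproxy",
}
```

Returning None on ambiguity mirrors the interactive prompt in the transcript further down, where six candidate projects matched "obfsproxy" and the operator picked one.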

Sometimes, we found no such relationships. Then we simply imported multiple repositories with similar names in the legacy/gitolite namespace, sometimes creating forks between user repositories, on a first-come-first-served basis from the gitolite.conf order.

The code used in this migration is now available publicly. We encourage other groups planning to migrate from Gitolite/GitWeb to GitLab to use (and contribute to) our fabric-tasks repository, even though it does have its fair share of hard-coded assertions.

The main entry point is the gitolite.mass-repos-migration task. A typical migration job looked like:

anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration 
[...]
INFO: skipping project project/help/infra in category Migrated to GitLab
INFO: skipping project project/help/wiki in category Migrated to GitLab
INFO: skipping project project/jenkins/jobs in category Migrated to GitLab
INFO: skipping project project/jenkins/tools in category Migrated to GitLab
INFO: searching for projects matching fastlane
INFO: Successfully connected to https://gitlab.torproject.org
import gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n] 
INFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'
INFO: building a new connect to cupani
INFO: defaulting name to fastlane
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/project/tor-browser
INFO: archiving project
INFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane
INFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane
INFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive
INFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab
INFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab
[...]

In the above, you can see migrated repositories being skipped, then the fastlane project being archived into GitLab. Here is another example with a later version of the script, processing only user repositories and showing the interactive prompt and a force-push into a fork:

$ fab -H cupani.torproject.org  gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'
INFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab
[...]
INFO: skipping project user/phw/atlas in category Migrated to GitLab
INFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)
INFO: Successfully connected to https://gitlab.torproject.org
INFO: user repository detected, trying to find fork phw/obfsproxy
WARNING: no existing fork found, entering user fork subroutine
INFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)
0 legacy/gitolite/debian/obfsproxy
1 legacy/gitolite/debian/obfsproxy-legacy
2 legacy/gitolite/user/asn/obfsproxy
3 legacy/gitolite/user/ioerror/obfsproxy
4 tpo/anti-censorship/pluggable-transports/obfsproxy
5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy
select parent to fork from, or enter to abort: ^G4
INFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414
fork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n] 
INFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy
INFO: forking project user/phw/obfsproxy into namespace legacy/gitolite/user/phw
INFO: waiting for fork to complete...
INFO: fork status: started, sleeping...
INFO: fork finished
INFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy
INFO: deleting branch protection: <class 'gitlab.v4.objects.branches.ProjectProtectedBranch'> => {'id': 2723, 'name': 'master', 'push_access_levels': [{'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None}], 'merge_access_levels': [{'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers'}], 'allow_force_push': False}
INFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy
Cloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...
INFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
remote: 
remote: To create a merge request for bug_10887, visit:        
remote:   https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887        
remote: 
[...]
To ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
 + 2bf9d09...a8e54d5 master -> master (forced update)
 * [new branch]      bug_10887 -> bug_10887
[...]
INFO: migrating repo
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab
INFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' development repositories (Attic)
INFO: user repository detected, trying to find fork phw/scramblesuit
WARNING: no existing fork found, entering user fork subroutine
WARNING: no matching gitlab project found for user/phw/scramblesuit
INFO: user fork subroutine failed, resuming normal procedure
INFO: searching for projects matching scramblesuit
import gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n] 
INFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists
INFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/user/phw
INFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: archiving project
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab
[...]

Sharp eyes will also notice the bell (shown as ^G) used as a notification mechanism in this transcript.
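
The trick is simple enough to reuse in any long-running interactive script: emit the ASCII BEL character with the prompt so the terminal beeps or flashes when input is needed. A minimal sketch follows, with hypothetical helper names and an assumed default of "yes" on an empty reply (suggested by the capital Y in the prompt).

```python
BELL = "\a"  # ASCII BEL (0x07), displayed as ^G in the transcript


def format_prompt(question):
    """Build a yes/no prompt that rings the terminal bell."""
    return f"{question}{BELL} [Y/n] "


def parse_answer(reply, default=True):
    """Interpret a reply; an empty reply selects the default."""
    reply = reply.strip().lower()
    if not reply:
        return default
    return reply in ("y", "yes")
```

In a script, this would be used as `parse_answer(input(format_prompt("import project?")))`.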

A lot of the code is now useless for us, but some, like "commit and push" or is-repo-empty, live on in the git module and, of course, the gitlab module has grown some legs along the way. We've also found fun bugs, like a file descriptor exhaustion in bash, among other oddities. The retirement milestone and issue 41215 have a detailed log of the migration, for those curious.

This was a challenging project, but it feels nice to have this behind us. This gets rid of 2 of the 4 remaining machines running Debian "old-old-stable", which moves us a bit further ahead in our late bullseye upgrades milestone.

Full transparency: we tested GPT-3.5, GPT-4, and other large language models to see if they could answer the question "write a set of rewrite rules to redirect GitWeb to GitLab". This has become a standard LLM test for your faithful writer to figure out how good an LLM is at technical responses. None of them gave an accurate, complete, and functional response, for the record.

The actual rewrite rules as of this writing follow, for humans who actually like working answers provided by expert humans instead of artificial intelligence, which currently seems to be a glorified, mansplaining intern.

git.torproject.org rewrite rules

Those rules are relatively simple in that they rewrite a single URL to its equivalent GitLab counterpart in a 1:1 fashion. They rely on the rewrite map mentioned above, of course.

RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# if this becomes a performance bottleneck, convert to a DBM map with:
#
#  $ httxt2dbm -i mapfile.txt -o mapfile.map
#
# and:
#
# RewriteMap mapname "dbm:/etc/apache/mapfile.map"
#
# according to reports lavamind found online, we hit such a
# performance bottleneck only around millions of entries, which is not our case
# those two rules can go away once all the projects are
# migrated to GitLab
#
# this matches the request URI so we can check the RewriteMap
# for a match next
#
# WARNING: this won't match URLs without .git in them, which
# *do* work now. one possibility would be to match the request
# URI (without query string!) with:
#
# /git/(.*)(.git)?/(((branches|hooks|info|objects/).*)|git-.*|upload-pack|receive-pack|HEAD|config|description)?.
#
# I haven't been able to figure out the actual structure of
# those URLs, so it's really hard to figure out the boundaries
# of the project name here. I stopped after poring over the
# http-backend.c code in git
# itself. https://www.git-scm.com/docs/http-protocol is also
# kind of incomplete and unsatisfying.
RewriteCond %{REQUEST_URI} ^/(git/)?(.*).git/.*$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%2|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$2}.git/$3 [R=302,L]
# Fallback everything else to GitLab
RewriteRule (.*) https://gitlab.torproject.org [R=302,L]

gitweb.torproject.org rewrite rules

Those are the vastly more complicated GitWeb to GitLab rewrite rules.

Note that we say "GitWeb" but we were actually not running GitWeb but cgit, as the former didn't actually scale for us.

RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# special rule to process targets of the old spec.tpo site and
# bring them to the right redirect on the new spec.tpo site. that should turn, for example:
#
# https://gitweb.torproject.org/torspec.git/tree/address-spec.txt
#
# into:
#
# https://spec.torproject.org/address-spec
RewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]
# list of endpoints taken from cgit's cmd.c
# those two RewriteCond are necessary because we don't move
# all repositories at once. once the migration is completed,
# they can be removed.
#
# and yes, they are copied all over the place below
#
# create a match for the project name to check if the project
# has been moved to GitLab
RewriteCond %{REQUEST_URI} ^/(.*).git(/.*)?$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
# main project page, like summary below
RewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]
# summary
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]
# about
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]
# commit
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond "%{QUERY_STRING}" "(.*(?:^|&))id=([^&]*)(&.*)?$"
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]
# diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1 [R=302,L,QSD]
# patch
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/patch/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.patch [R=302,L,QSD]
# rawdiff, incomplete because can show only one file diff, which GitLab cannot
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.diff [R=302,L,QSD]
# log
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD$2 [R=302,L]
# atom
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L,QSD]
# refs, incomplete because two pages in GitLab, defaulting to "tags"
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags [R=302,L]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags/%1 [R=302,L,QSD]
# tree
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/%1$2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD$2 [R=302,L]
# /-/tree has no good default in GitLab, revert to HEAD which is a good
# approximation (we can't assume "master" here anymore)
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD [R=302,L]
# plain
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/%1$2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/HEAD$2 [R=302,L]
# blame: disabled
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteCond %{QUERY_STRING} h=([^&]*)
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/%1$2 [R=302,L,QSD]
# same default as tree above
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/HEAD/$2 [R=302,L]
# stats
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/graphs/HEAD [R=302,L]
# still TODO:
# repolist: once migration is complete
#
# cannot be done:
# atom: needs a feed token, user must be logged in
# blob: no direct equivalent
# info: not working on main cgit website?
# ls_cache: not working, irrelevant?
# objects: undocumented?
# snapshot: pattern too hard to match on cgit's side
# special case, we keep a copy of the main index on the archive
RewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]
# Fallback: everything else to GitLab
RewriteRule .* https://gitlab.torproject.org [R=302,L]

The reference copy of those is available in our (currently private) Puppet git repository.
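
To experiment with this kind of redirection outside of Apache, a few of the cgit endpoints can be emulated in a short function. This is a partial sketch covering only the summary, commit, tree, and plain endpoints. It does not reproduce the special cases (torspec, the archived front page) or the remaining endpoints, and it is not a substitute for testing the real configuration. `translate_cgit_url` is a hypothetical name, the single map entry comes from the transcript above, and the file name `README` in the usage note is made up.

```python
import re
from urllib.parse import parse_qs, urlsplit

GITLAB = "https://gitlab.torproject.org"
MAPPING = {"user/phw/obfsproxy": "legacy/gitolite/user/phw/obfsproxy"}


def translate_cgit_url(url, mapping):
    """Translate a subset of cgit URLs to their GitLab equivalents."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    match = re.match(r"^/(.+?)\.git(?:/([^/]*)(/.*)?)?$", parts.path)
    if not match or match.group(1) not in mapping:
        return GITLAB  # fallback: everything else goes to GitLab
    base = f"{GITLAB}/{mapping[match.group(1)]}"
    endpoint = match.group(2) or ""
    rest = match.group(3) or ""
    if endpoint in ("", "summary", "about"):
        return base + "/"
    if endpoint == "commit":
        if "id" in query:
            return f"{base}/-/commit/{query['id'][0]}"
        return f"{base}/-/commits/HEAD"
    if endpoint == "tree":
        # cgit selects the revision with id=, defaulting to HEAD
        return f"{base}/-/tree/{query.get('id', ['HEAD'])[0]}{rest}"
    if endpoint == "plain":
        # cgit selects the branch with h=, defaulting to HEAD
        return f"{base}/-/raw/{query.get('h', ['HEAD'])[0]}{rest}"
    return GITLAB
```

For example, `translate_cgit_url("https://gitweb.torproject.org/user/phw/obfsproxy.git/plain/README?h=bug_10887", MAPPING)` yields a `/-/raw/bug_10887/README` URL under the migrated project.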

{
"by": "gslin",
"descendants": 0,
"id": 40241424,
"kids": [
40241513
],
"score": 5,
"time": 1714684404,
"title": "Tor Migrates from Gitolite/GitWeb to Gitlab",
"type": "story",
"url": "https://blog.torproject.org/gitolite-gitlab-migration/"
}
{
"author": "anarcat",
"date": null,
"description": "Tor has finally completed a long migration from legacy Git infrastructure to GitLab. Update your URLs, check if anything is missing, but also reuse our code and hear our advice if you’re in the same situation.",
"image": "https://blog.torproject.org/gitolite-gitlab-migration/lead.jpg",
"logo": "https://logo.clearbit.com/torproject.org",
"publisher": "The Tor Project",
"title": "Tor migrates from Gitolite/GitWeb to GitLab | Tor Project",
"url": "https://blog.torproject.org/gitolite-gitlab-migration/"
}
{
"url": "https://blog.torproject.org/gitolite-gitlab-migration/",
"title": "Tor migrates from Gitolite/GitWeb to GitLab | Tor Project",
"description": "Tor has finally completed a long migration from legacy Git infrastructure to GitLab. Update your URLs, check if anything is missing, but also reuse our code and hear our advice if you're in the same situation.",
"links": [
"https://blog.torproject.org/gitolite-gitlab-migration/"
],
"image": "https://blog.torproject.org/gitolite-gitlab-migration/lead.jpg",
"content": "<div><p>Tor has finally completed a long migration from legacy Git\ninfrastructure (<a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/git\">Gitolite and GitWeb</a>) to our self-hosted\n<a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/gitlab\">GitLab</a> server.</p>\n<p>Git repository addresses have therefore changed. Many of you probably\nhave made the switch already, but if not, you will need to change:</p>\n<pre><code>https://git.torproject.org/\n</code></pre>\n<p>to:</p>\n<pre><code>https://gitlab.torproject.org/\n</code></pre>\n<p>In your Git configuration.</p>\n<p>The <a target=\"_blank\" href=\"https://gitweb.torproject.org/\">GitWeb front page</a> is now an archived listing of all the\nrepositories before the migration. Inactive git repositories were\narchived in GitLab <a target=\"_blank\" href=\"https://gitlab.torproject.org/legacy/gitolite/\">legacy/gitolite namespace</a> and the\n<code>gitweb.torproject.org</code> and <code>git.torproject.org</code> web sites now\nredirect to GitLab.</p>\n<p>Best effort was made to reproduce the original gitolite repositories\nfaithfully and also avoid duplicating too much data in the\nmigration. But it's <em>possible</em> that some data present in Gitolite has\nnot migrated to GitLab.</p>\n<p>User repositories are particularly at risk, because they were\nmassively migrated, and they were \"re-forked\" from their upstreams, to\navoid wasting disk space. If a user had a project with a matching name\nit was <em>assumed</em> to have the right data, which might be inaccurate.</p>\n<p>The two virtual machines responsible for the legacy service (<code>cupani</code>\nfor <code>git-rw.torproject.org</code> and <code>vineale</code> for <code>git.torproject.org</code> and\n<code>gitweb.torproject.org</code>) have been shutdown. 
Their disks will remain\nfor 3 months (until the end of July 2024) and their backups for\nanother year after that (until the end of July 2025), after which\npoint all the data from those hosts will be destroyed, with only the\nGitLab archives remaining.</p>\n<p>The rest of this article expands on how this was done and what kind of\nproblems we faced during the migration.</p>\n<h2>Where is the code?</h2>\n<p>Normally, nothing should be lost. All repositories in gitolite have\nbeen either explicitly migrated by their owners, forcibly migrated by\nthe sysadmin team (<a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/\">TPA</a>), or explicitly destroyed at their owner's\nrequest.</p>\n<p>An exhaustive <a target=\"_blank\" href=\"https://archive.torproject.org/websites/gitolite2gitlab.txt\">rewrite map</a> translates gitolite projects to GitLab\nprojects. Some of those projects actually redirect to their <em>parent</em>\nin cases of empty repositories that were obvious forks. Destroyed\nrepositories redirect to the GitLab front page.</p>\n<p>Because the migration happened progressively, it's technically\npossible that commits pushed to gitolite were lost after the\nmigration. We took great care to avoid that scenario. First, we\nadopted a proposal (<a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-36-gitolite-gitweb-retirement\">TPA-RFC-36</a>) in June 2023 to announce the\ntransition. Then, in <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/issues/41213\">March 2024</a>, we locked down all repositories\nfrom any further changes. 
Around that time, only a <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/issues/41214#note_2983302\" title=\"handful of repositories\">handful of\nrepositories</a> had changes made after the adoption date, and we\nexamined each repository carefully to make sure nothing was lost.</p>\n<p>Still, we built a <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/issues/41215#note_3023924\">diff of all the changes in the git references</a>\nthat archivists can peruse to check for data loss. It's large (6MiB+)\nbecause a lot of repositories were migrated before the mass migration\nand then kept evolving in GitLab. Many other repositories were rebuilt\nin GitLab from parent to rebuild a fork relationship which added extra\nreferences to those clones.</p>\n<p>A note to amateur archivists out there, it's probably too late for one\nlast crawl now. The Git repositories now all redirect to GitLab and\nare effectively unavailable in their original form.</p>\n<p>That said, the GitWeb site was crawled into the <a target=\"_blank\" href=\"https://archive.org/\">Internet Archive</a> <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/issues/41218#note_2992296\">in\nFebruary 2024</a>, so at least some copy of it is available in the\n<a target=\"_blank\" href=\"https://web.archive.org/web/20240204162238/https://gitweb.torproject.org/\">Wayback Machine</a>. At that point, however, many developers had already\nmigrated their projects to GitLab, so the copies there were already\npossibly out of date compared with the repositories in GitLab.</p>\n<p><a target=\"_blank\" href=\"https://www.softwareheritage.org/\">Software Heritage</a> also has a copy of all repositories hosted on\nGitolite <a target=\"_blank\" href=\"https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4939\">since June 2023</a> and have continuously kept mirroring the\nrepositories, where they will be kept hopefully in eternity. 
There's\nan <a target=\"_blank\" href=\"https://gitlab.softwareheritage.org/swh/devel/swh-web/-/issues/4787\">issue</a> where the main website can't find the repositories when\nyou search for <code>gitweb.torproject.org</code>; instead, <a target=\"_blank\" href=\"https://archive.softwareheritage.org/browse/search/?q=git.torproject.org&amp;visit_type=git&amp;with_content=true&amp;with_visit=true\">search for\n<code>git.torproject.org</code></a>.</p>\n<p>In any case, if you believe data is missing, please do let us know by\n<a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/issues/new\">opening an issue with TPA</a>.</p>\n<h2>Why?</h2>\n<p>This project has been a long time in the making. The first <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/issues/40472\">discussion about\nmigrating from gitolite to GitLab</a> started in 2020 (almost 4 years\nago). But <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/trac#history\">going further back</a>, the first GitLab experiment was in\n2016, almost a decade ago.</p>\n<p>The current GitLab server dates from 2019, <a target=\"_blank\" href=\"https://blog.torproject.org/from-trac-into-gitlab-for-tor/\">replacing Trac for issue\ntracking in 2020</a>. It was originally supposed to host only mirrors\nfor merge requests and issue trackers but, naturally, one thing led to\nanother and eventually, GitLab had grown a container registry,\ncontinuous integration (CI) runners, GitLab Pages, and, of course,\nhosted most Git repositories.</p>\n<p>There were hesitations about moving to GitLab for code hosting. 
We had\n<a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/81\">discussions about the increased attack surface</a> and <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/98\">ways to\nmitigate that</a>, but, ultimately, it seems the issues were not that\nserious and the community embraced GitLab.</p>\n<p>TPA actually migrated its most critical repositories out of shared\nhosting entirely, into specific servers (e.g. the Puppet Git\nrepository is just on the Puppet server now), leveraging Git's\ndecentralized nature and removing an entire attack surface from our\ninfrastructure. Some of those repositories are <em>mirrored</em> back into\nGitLab, but the authoritative copy is not on GitLab.</p>\n<p>In any case, the proposal to migrate from Gitolite to GitLab was\neffectively just formalizing a <em>fait accompli</em>.</p>\n<h2>How to migrate from Gitolite / cgit to GitLab</h2>\n<p>The progressive migration was a challenge. If you intend to migrate\nbetween hosting platforms, we strongly recommend scheduling a \"flag day\"\nduring which you migrate <em>all</em> repositories <em>at once</em>. This ensures a\nsmoother transition and avoids elaborate rewrite rules.</p>\n<p>When Gitolite access was shut down, we had repositories on both GitLab\nand Gitolite, without a clear relationship between the two. Initially,\nthe plan was to import all the remaining Gitolite repositories\ninto the <code>legacy/gitolite</code> namespace, but that seemed wasteful,\nparticularly for large repositories like <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/applications/tor-browser\">Tor Browser</a>, which uses\nnearly a gigabyte of disk space. 
So we took special care to avoid\nduplicating repositories.</p>\n<p>When the <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/issues/41215\">mass migration</a> started, only 71 of the 538 Gitolite\nrepositories were marked <code>Migrated to GitLab</code> in the <code>gitolite.conf</code>\nfile. So, given that we had <em>hundreds</em> of repositories to migrate, we\ndeveloped some automation to \"<a target=\"_blank\" href=\"https://xkcd.com/1205/\">save time</a>\". We already automate\nsimilar ad-hoc tasks with <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/fabric/\">Fabric</a>, so we used that framework here\nas well. (Our normal configuration management tool is <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/puppet\">Puppet</a>,\nwhich is a poor fit here.)</p>\n<p>So a relatively <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/fabric-tasks/-/blob/85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76/fabric_tpa/gitolite.py\">large amount of Python code</a> was produced to\nbasically do the following:</p>\n<ol>\n<li>check if all on-disk repositories are listed in <code>gitolite.conf</code>\n(and vice versa) and either add missing repositories or delete\nthem from disk if garbage</li>\n<li>for each repository in <code>gitolite.conf</code>, if its category is marked\n<code>Migrated to GitLab</code>, skip; otherwise:</li>\n<li>find a matching GitLab project by name, prompt the user for\nmultiple matches</li>\n<li>if a match is found, redirect if the repository is non-empty <ul>\n<li>we have GitLab projects that <em>look</em> like the real thing, but are\nonly present to host migrated Trac issues</li>\n<li>in such cases we cloned the Gitolite project locally and pushed\nto the existing repository instead</li>\n</ul>\n</li>\n<li>otherwise, a new repository is created in the <code>legacy/gitolite</code>\nnamespace, using the \"import\" mechanism in GitLab to 
automatically\nimport the repository from Gitolite, creating redirections and\nupdating <code>gitolite.conf</code> to document the change</li>\n</ol>\n<p>User repositories (those under the <code>user/</code> directory in Gitolite) were\nhandled specially. First, the existing redirection map was checked to\nsee if a similarly named project was migrated (so that,\ne.g. <code>user/dgoulet/tor</code> is properly treated as a fork of\n<code>tpo/core/tor</code>). Then the parent project was forked in GitLab and the\nGitolite project force-pushed to the fork. This allows us to show the\nfork relationship in GitLab and, more importantly, benefit from the\n\"pool\" feature in GitLab which deduplicates disk usage between forks.</p>\n<p>Sometimes, we found no such relationships. Then we simply imported\nmultiple repositories with similar names in the <code>legacy/gitolite</code>\nnamespace, sometimes creating forks between user repositories, on a\nfirst-come-first-served basis from the <code>gitolite.conf</code> order.</p>\n<p>The code used in this migration is now available publicly. We\nencourage other groups planning to migrate from Gitolite/GitWeb to\nGitLab to use (and contribute to) our <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/fabric-tasks/\">fabric-tasks</a> repository,\neven though it does have its fair share of hard-coded assertions.</p>\n<p>The main entry point is the <code>gitolite.mass-repos-migration</code> task. 
A\ntypical migration job looked like:</p>\n<pre><code>anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration \n[...]\nINFO: skipping project project/help/infra in category Migrated to GitLab\nINFO: skipping project project/help/wiki in category Migrated to GitLab\nINFO: skipping project project/jenkins/jobs in category Migrated to GitLab\nINFO: skipping project project/jenkins/tools in category Migrated to GitLab\nINFO: searching for projects matching fastlane\nINFO: Successfully connected to https://gitlab.torproject.org\nimport gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n] \nINFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'\nINFO: building a new connect to cupani\nINFO: defaulting name to fastlane\nINFO: importing project into GitLab\nINFO: Successfully connected to https://gitlab.torproject.org\nINFO: loading group legacy/gitolite/project/tor-browser\nINFO: archiving project\nINFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane\nINFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane\nINFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive\nINFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable\nINFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt\nINFO: modifying gitolite.conf to add: \"config gitweb.category = Migrated to 
GitLab\"\nINFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab\nINFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab\n[...]\n</code></pre>\n<p>In the above, you can see migrated repositories skipped then the\n<a target=\"_blank\" href=\"https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane\">fastlane project</a> being archived into GitLab. Another example with\na later version of the script, processing only user repositories and\nshowing the interactive prompt and a force-push into a fork:</p>\n<pre><code>$ fab -H cupani.torproject.org gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'\nINFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab\n[...]\nINFO: skipping project user/phw/atlas in category Migrated to GitLab\nINFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)\nINFO: Successfully connected to https://gitlab.torproject.org\nINFO: user repository detected, trying to find fork phw/obfsproxy\nWARNING: no existing fork found, entering user fork subroutine\nINFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)\n0 legacy/gitolite/debian/obfsproxy\n1 legacy/gitolite/debian/obfsproxy-legacy\n2 legacy/gitolite/user/asn/obfsproxy\n3 legacy/gitolite/user/ioerror/obfsproxy\n4 tpo/anti-censorship/pluggable-transports/obfsproxy\n5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy\nselect parent to fork from, or enter to abort: ^G4\nINFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414\nfork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n] \nINFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy\nINFO: forking project user/phw/obfsproxy 
into namespace legacy/gitolite/user/phw\nINFO: waiting for fork to complete...\nINFO: fork status: started, sleeping...\nINFO: fork finished\nINFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy\nINFO: deleting branch protection: &lt;class 'gitlab.v4.objects.branches.ProjectProtectedBranch'&gt; =&gt; {'id': 2723, 'name': 'master', 'push_access_levels': [{'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None}], 'merge_access_levels': [{'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers'}], 'allow_force_push': False}\nINFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy\nCloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...\nINFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy\nremote: \nremote: To create a merge request for bug_10887, visit: \nremote: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887 \nremote: \n[...]\nTo ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy\n + 2bf9d09...a8e54d5 master -&gt; master (forced update)\n * [new branch] bug_10887 -&gt; bug_10887\n[...]\nINFO: migrating repo\nINFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy\nINFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt\nINFO: modifying gitolite.conf to add: \"config gitweb.category = Migrated to GitLab\"\nINFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab\nINFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' 
development repositories (Attic)\nINFO: user repository detected, trying to find fork phw/scramblesuit\nWARNING: no existing fork found, entering user fork subroutine\nWARNING: no matching gitlab project found for user/phw/scramblesuit\nINFO: user fork subroutine failed, resuming normal procedure\nINFO: searching for projects matching scramblesuit\nimport gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n] \nINFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists\nINFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'\nINFO: importing project into GitLab\nINFO: Successfully connected to https://gitlab.torproject.org\nINFO: loading group legacy/gitolite/user/phw\nINFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit\nINFO: archiving project\nINFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit\nINFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt\nINFO: modifying gitolite.conf to add: \"config gitweb.category = Migrated to GitLab\"\nINFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab\n[...]\n</code></pre>\n<p>Acute eyes will notice the <a target=\"_blank\" href=\"https://anarc.at/blog/2022-11-08-modern-bell-urgency/\">bell used as a notification mechanism</a>\nas well in this transcript.</p>\n<p>A lot of the code is now useless for us, but some, like \"commit and\npush\" or <a target=\"_blank\" 
href=\"https://gitlab.torproject.org/tpo/tpa/fabric-tasks/-/blob/85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76/fabric_tpa/git.py#L120-153\"><code>is-repo-empty</code></a> live on in the <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/fabric-tasks/-/blob/85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76/fabric_tpa/git.py\">git module</a> and, of\ncourse, the <a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/fabric-tasks/-/blob/85121b4a8a293cebb0d9dfd68ebf26e2cc95ed76/fabric_tpa/gitlab.py\">gitlab module</a> has grown some legs along the\nway. We've also found fun bugs, like a <a target=\"_blank\" href=\"https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642504\">file descriptor exhaustion in\nbash</a>, among other oddities. The <a target=\"_blank\" href=\"https://gitlab.torproject.org/groups/tpo/tpa/-/milestones/11#tab-issues\">retirement milestone</a> and\n<a target=\"_blank\" href=\"https://gitlab.torproject.org/tpo/tpa/team/-/issues/41215\">issue 41215</a> have a detailed log of the migration, for those\ncurious.</p>\n<p>This was a challenging project, but it feels nice to have this behind\nus. This gets rid of 2 of the 4 remaining machines running Debian\n\"old-old-stable\", which moves us a bit further ahead in our late\n<a target=\"_blank\" href=\"https://gitlab.torproject.org/groups/tpo/tpa/-/milestones/5#tab-issues\">bullseye upgrades milestone</a>.</p>\n<p>Full transparency: we tested GPT-3.5, GPT-4, and other large language\nmodels to see if they could answer the question \"write a set of\nrewrite rules to redirect GitWeb to GitLab\". This has become a\nstandard LLM test for your faithful writer to figure out how good an\nLLM is at technical responses. 
None of them gave an accurate,\ncomplete, and functional response, for the record.</p>\n<p>The actual rewrite rules as of this writing follow, for humans that\nactually like working answers provided by expert humans instead of\nartificial intelligence, which currently seems to be a glorified,\nmansplaining intern.</p>\n<h2>git.torproject.org rewrite rules</h2>\n<p>Those rules are relatively simple in that they rewrite a single URL to\nits equivalent GitLab counterpart in a 1:1 fashion. They rely on the\n<a target=\"_blank\" href=\"https://archive.torproject.org/websites/gitolite2gitlab.txt\">rewrite map</a> mentioned above, of course.</p>\n<pre><code>RewriteEngine on\n# this RewriteMap connects the gitweb projects to their GitLab\n# equivalent\nRewriteMap gitolite2gitlab \"txt:/etc/apache2/gitolite2gitlab.txt\"\n# if this becomes a performance bottleneck, convert to a DBM map with:\n#\n# $ httxt2dbm -i mapfile.txt -o mapfile.map\n#\n# and:\n#\n# RewriteMap mapname \"dbm:/etc/apache/mapfile.map\"\n#\n# according to reports lavamind found online, we hit such a\n# performance bottleneck only around millions of entries, which is not our case\n# those two rules can go away once all the projects are\n# migrated to GitLab\n#\n# this matches the request URI so we can check the RewriteMap\n# for a match next\n#\n# WARNING: this won't match URLs without .git in them, which\n# *do* work now. one possibility would be to match the request\n# URI (without query string!) with:\n#\n# /git/(.*)(.git)?/(((branches|hooks|info|objects/).*)|git-.*|upload-pack|receive-pack|HEAD|config|description)?.\n#\n# I haven't been able to figure out the actual structure of\n# those URLs, so it's really hard to figure out the boundaries\n# of the project name here. I stopped after poring over the\n# http-backend.c code in git\n# itself. 
https://www.git-scm.com/docs/http-protocol is also\n# kind of incomplete and unsatisfying.\nRewriteCond %{REQUEST_URI} ^/(git/)?(.*).git/.*$\n# this makes the RewriteRule match only if there's a match in\n# the rewrite map\nRewriteCond ${gitolite2gitlab:%2|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$2}.git/$3 [R=302,L]\n# Fallback everything else to GitLab\nRewriteRule (.*) https://gitlab.torproject.org [R=302,L]\n</code></pre>\n<h2>gitweb.torproject.org rewrite rules</h2>\n<p>Those are the vastly more complicated GitWeb to GitLab rewrite\nrules.</p>\n<p>Note that we say \"GitWeb\" but we were actually <em>not</em> running\n<a target=\"_blank\" href=\"https://git-scm.com/docs/gitweb\">GitWeb</a> but <a target=\"_blank\" href=\"https://git.zx2c4.com/cgit/\">cgit</a>, as the former didn't actually scale for us.</p>\n<pre><code>RewriteEngine on\n# this RewriteMap connects the gitweb projects to their GitLab\n# equivalent\nRewriteMap gitolite2gitlab \"txt:/etc/apache2/gitolite2gitlab.txt\"\n# special rule to process targets of the old spec.tpo site and\n# bring them to the right redirect on the new spec.tpo site. that should turn, for example:\n#\n# https://gitweb.torproject.org/torspec.git/tree/address-spec.txt\n#\n# into:\n#\n# https://spec.torproject.org/address-spec\nRewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]\n# list of endpoints taken from cgit's cmd.c\n# those two RewriteCond are necessary because we don't move\n# all repositories at once. 
once the migration is completed,\n# they can be removed.\n#\n# and yes, they are copied all over the place below\n#\n# create a match for the project name to check if the project\n# has been moved to GitLab\nRewriteCond %{REQUEST_URI} ^/(.*).git(/.*)?$\n# this makes the RewriteRule match only if there's a match in\n# the rewrite map\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\n# main project page, like summary below\nRewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]\n# summary\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]\n# about\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]\n# commit\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteCond \"%{QUERY_STRING}\" \"(.*(?:^|&amp;))id=([^&amp;]*)(&amp;.*)?$\"\nRewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%2 [R=302,L,QSD]\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]\n# diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteCond %{QUERY_STRING} id=([^&amp;]*)\nRewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1 [R=302,L,QSD]\n# patch\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteCond %{QUERY_STRING} id=([^&amp;]*)\nRewriteRule ^/(.*).git/patch/? 
https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.patch [R=302,L,QSD]\n# rawdiff, incomplete because can show only one file diff, which GitLab cannot\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteCond %{QUERY_STRING} id=([^&amp;]*)\nRewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.diff [R=302,L,QSD]\n# log\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteCond %{QUERY_STRING} h=([^&amp;]*)\nRewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD$2 [R=302,L]\n# atom\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteCond %{QUERY_STRING} h=([^&amp;]*)\nRewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L,QSD]\n# refs, incomplete because two pages in GitLab, defaulting to \"tags\"\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags [R=302,L]\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteCond %{QUERY_STRING} 
h=([^&amp;]*)\nRewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags/%1 [R=302,L,QSD]\n# tree\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteCond %{QUERY_STRING} id=([^&amp;]*)\nRewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/%1$2 [R=302,L,QSD]\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD$2 [R=302,L]\n# /-/tree has no good default in GitLab, revert to HEAD which is a good\n# approximation (we can't assume \"master\" here anymore)\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD [R=302,L]\n# plain\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteCond %{QUERY_STRING} h=([^&amp;]*)\nRewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/%1$2 [R=302,L,QSD]\nRewriteCond %{REQUEST_URI} ^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/HEAD$2 [R=302,L]\n# blame: disabled\n#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$\n#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\n#RewriteCond %{QUERY_STRING} h=([^&amp;]*)\n#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/%1$2 [R=302,L,QSD]\n# same default as tree above\n#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$\n#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\n#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/HEAD/$2 [R=302,L]\n# stats\nRewriteCond %{REQUEST_URI} 
^/(.*).git/.*$\nRewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND\nRewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/graphs/HEAD [R=302,L]\n# still TODO:\n# repolist: once migration is complete\n#\n# cannot be done:\n# atom: needs a feed token, user must be logged in\n# blob: no direct equivalent\n# info: not working on main cgit website?\n# ls_cache: not working, irrelevant?\n# objects: undocumented?\n# snapshot: pattern too hard to match on cgit's side\n# special case, we keep a copy of the main index on the archive\nRewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]\n# Fallback: everything else to GitLab\nRewriteRule .* https://gitlab.torproject.org [R=302,L]\n</code></pre>\n<p>The reference copy of those is available in our (currently private)\nPuppet git repository.</p>\n </div>",
"author": "",
"favicon": "https://blog.torproject.org/static/images/favicon/favicon.png",
"source": "blog.torproject.org",
"published": "",
"ttr": 619,
"type": ""
}