Switching an SVN repository to Git with KDE's Svn2Git

Some places where I've worked have used SVN for version control, and while the supposed simplicity and centralization of SVN can be nice in certain situations, SVN can't hold a candle to Git's speed, flexibility, and (nowadays) ubiquity for source control. Not to mention that SVN doesn't have real tags or branches, just quasi-directories that can easily be mangled into a horrific mess (I see this quite often).

I've had to use some incredibly large (10,000+ revisions, 2GB+ total size) SVN repositories, and while I've managed them using git svn sometimes (see Switching an SVN repository to Git using git svn), it's much nicer to be able to migrate the entire team from SVN to Git so everyone can work on the repository much more efficiently.

For small repositories, using Git's built-in git-svn tool is not a big issue; it takes a few minutes to clone an entire SVN repository, and as long as the repository follows conventional SVN layout (branches/, tags/ and trunk/ are the only three root-level directories, and all branches/tags are flat within their respective directories...), it is simple enough to do the initial clone.
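For a repository with the conventional layout, the git-svn clone might look roughly like the sketch below. The URL, target directory, and authors file name are hypothetical placeholders; --stdlayout tells git svn to expect trunk/, branches/, and tags/ at the repository root.

```shell
# Sketch: clone a conventionally laid-out SVN repository with git-svn.
# All three arguments are hypothetical placeholders, not real project values.
clone_svn_stdlayout() {
  svn_url="$1"       # e.g. svn://example.com/project
  target_dir="$2"    # e.g. project-git
  authors_file="$3"  # maps SVN usernames to Git authors

  # --stdlayout assumes trunk/, branches/ and tags/ at the repository root.
  git svn clone --stdlayout --authors-file="$authors_file" "$svn_url" "$target_dir"
}

# Usage (not run here):
#   clone_svn_stdlayout svn://example.com/project project-git authors-file
```

Wrapping the command in a function keeps the placeholders in one place; on a small repository the single git svn clone line is all that is needed.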

But for non-standard and large repositories, you really need some help. There's a nice Ruby gem, svn2git, that wraps git svn and makes it a little easier to use, and perhaps a little faster... but it inherits all of git-svn's issues, and is still dog-slow on large repositories (conversion takes hours, days, or weeks rather than minutes).

Thus, the KDE team introduced Svn2Git. Svn2Git requires Qt4 to compile, but once compiled, the application takes a local copy of the entire SVN repository (i.e. the repository directory from the SVN server, not a local checkout), and quickly converts it to a bare Git repository.

Note: If you're cloning from a remote SVN repo, or even continuing to work with SVN generally, and you're using http://, you should strongly consider switching to the svn:// or svn+ssh:// protocol. Both go through svnserve on your SVN server (svn:// needs the svnserve daemon running; svn+ssh:// starts svnserve over SSH for each connection). This will speed up SVN operations noticeably (sometimes 10x faster!), as it avoids the overhead of a web server (like Apache) and of a separate web request per operation.

Using Svn2Git (Instructions for CentOS)

You need to have Qt4 and libsvn-dev installed on your machine to build Svn2Git, so install those now:

$ sudo yum install -y qt qt-devel
$ sudo yum install -y subversion-devel

Clone Svn2Git to your local machine, then build it with qmake:

$ git clone https://git.gitorious.org/svn2git/svn2git.git
$ cd svn2git
$ qmake && make

Note: If you get a qmake: command not found error, just use the full path to qmake, which is /usr/lib64/qt4/bin/qmake (on CentOS 6.4, at least).
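If you would rather not remember the full path, a small helper along these lines can pick whichever qmake is available. This is only a sketch; the fallback path is the CentOS 6.4 location mentioned above and may differ on other systems.

```shell
# Sketch: prefer qmake from PATH, otherwise fall back to the Qt4 location
# used on CentOS 6.4 (an assumption for any other system).
resolve_qmake() {
  if command -v qmake >/dev/null 2>&1; then
    command -v qmake
  else
    echo /usr/lib64/qt4/bin/qmake
  fi
}

# Usage, from inside the svn2git checkout (not run here):
#   "$(resolve_qmake)" && make
```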

You should now have a binary, svn-all-fast-export, in the directory. You will use that to run Svn2Git after everything is ready. Next you need to make sure you have a few things in place for the conversion:

  1. Make sure you have a rules file (in this case, named rules-file; see the examples in the svn2git folder, and check this post for some helpful advice for tags).
  2. Make sure you have an authors file (in this case, named authors-file) to map SVN commit authors to Git authors. The format is like [svn-user] = John Doe <john.doe@example.com>, with one mapping per line. (To get a list of all committers in the SVN repository, use the command svn log --quiet | awk '/^r/ {print $3}' | sort -u, as demonstrated in this post on Stack Overflow).
  3. Copy the entire SVN repository to a local directory, and make sure it's not named the same as the repository in your rules file. Using scp, the command would be something like scp -r user@example.com:/svn/repositories/old-svn-repository old-svn-repository. Using rsync, the command would be something like rsync -chavzP --stats user@example.com:/svn/repositories/old-svn-repository old-svn-repository.
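To get a head start on the authors file from step 2, the committer list can be turned into a template that you then edit by hand. The sketch below assumes that [svn-user] in the format above is a placeholder for the bare username; it fills in the username as a stand-in for the real name and uses a made-up example.com address.

```shell
# Sketch: turn a list of SVN usernames (one per line on stdin) into a starter
# authors file. The display name and example.com address are placeholders
# that must be replaced with each committer's real details.
make_authors_template() {
  sort -u | while read -r svn_user; do
    [ -n "$svn_user" ] || continue
    printf '%s = %s <%s@example.com>\n' "$svn_user" "$svn_user" "$svn_user"
  done
}

# Usage, inside an SVN working copy (not run here):
#   svn log --quiet | awk '/^r/ {print $3}' | make_authors_template > authors-file
```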

Now, you can convert the repository:

$ /path/to/svn-all-fast-export --identity-map=authors-file --rules=rules-file --stats --add-metadata old-svn-repository

Note: --add-metadata adds in SVN information to each commit, for easier lookup of old commits/refs, and --stats outputs useful statistics during the conversion.

After a long, long time (or a short time, if you have a tiny/newer repo!), the process will complete, and you'll have a simple bare git repository. Hooray!
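Before going further, it can be worth a quick sanity check of the result. The helper below is a sketch that counts commits, branches, and tags in a bare repository; the repository path is whatever name your rules file created, so the one in the usage line is hypothetical.

```shell
# Sketch: print basic facts about a converted bare repository.
# The path passed in is hypothetical; use the directory Svn2Git created.
summarize_repo() {
  repo_dir="$1"
  echo "commits:  $(git --git-dir="$repo_dir" rev-list --all --count)"
  echo "branches: $(git --git-dir="$repo_dir" for-each-ref refs/heads | wc -l | tr -d ' ')"
  echo "tags:     $(git --git-dir="$repo_dir" for-each-ref refs/tags | wc -l | tr -d ' ')"
}

# Usage (not run here):
#   summarize_repo repository.git
```

Comparing the commit count against the SVN revision count gives only a rough check, since the two will not match exactly when paths are ignored by the rules file.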

Notes on writing rules for Svn2Git

Take a look at all the samples in the Svn2Git repository—they have a lot of good information in comments and actual rules. In my case, since the SVN repository had a few oddly-located folders in the root directory (alongside trunk/tags/branches), I just decided to remove them by not telling Svn2Git what to do with them (put this at the end of the file):

# Ignore all other directories.
match /
end match

Also, as Cody Casterline mentioned in his Lessons Learned post, you can still extract tags from your converted repository (even though the conversion turns tags into branches) by adding a rule like:

# Add a prefix to all tag branches so we can fix them later.
match /tags/([^/]+)/
  repository [repository-name]
  branch tag--\1
end match

Then, after the conversion is complete, do the following (paste the entire code block below into your terminal and hit Enter):

git branch |
# Remove spaces at beginning of line:
sed 's/..//' |
# Only get 'tag' branches:
grep '^tag--' |
# Strip down to just the tag name:
sed 's/^tag--//' |
while read -r tagname; do
  # Only delete the branch once the tag has been created successfully.
  if git tag -a "$tagname" -m "Tag imported from SVN." "tag--$tagname" >/dev/null 2>&1; then
    echo "tagged: $tagname"
    git branch -D "tag--$tagname" >/dev/null 2>&1 && echo "deleted branch: tag--$tagname"
  fi
done

Note that the loop above deletes each tag-- branch as it goes. If you would rather check the tags before deleting anything, remove the git branch -D line from the loop, verify that the tags converted successfully, then run the same pipeline again with git branch -D "tag--$tagname" >/dev/null 2>/dev/null && echo "deleted branch: tag--$tagname" as the only command inside the loop.

Finally, if you get errors about branches not conforming to Git branch naming standards, you might need to convert spaces and other special characters to underscores. I had some branches with spaces, so I adjusted the rule for branches (thanks to PovAddictW on the #kde-git IRC channel for the tip!):

match /branches/([^/]+)/
  repository [repository-name]
  branch \1
  substitute branch s/ /_/
end match
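Putting the rules from this section together, a complete rules file might look something like the sketch below, written out with a shell heredoc. The repository name my-project is hypothetical, and the create repository block and the trunk-to-master rule are assumptions modelled on the sample rules shipped with Svn2Git rather than something shown above.

```shell
# Sketch: write a complete rules file combining the rules from this section.
# "my-project" is a hypothetical repository name; substitute your own.
cat > rules-file <<'EOF'
create repository my-project
end repository

match /trunk/
  repository my-project
  branch master
end match

# Replace spaces in branch names with underscores.
match /branches/([^/]+)/
  repository my-project
  branch \1
  substitute branch s/ /_/
end match

# Add a prefix to all tag branches so we can fix them later.
match /tags/([^/]+)/
  repository my-project
  branch tag--\1
end match

# Ignore all other directories.
match /
end match
EOF
```

The catch-all match / rule stays last so that the more specific trunk, branches, and tags rules are tried first.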

Finishing things off

Once I had my new repository.git bare repo, I changed directory into the folder, added a remote to which I could push the repo, then used git push --all and git push --tags to push all branches and tags up to the new remote.
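Those steps can be sketched as a small function. The remote name origin and both arguments are assumptions for illustration; use whatever remote URL your new Git host gives you.

```shell
# Sketch: push every branch and tag from the converted bare repository.
# The remote name "origin" and both arguments are assumptions for illustration.
push_converted_repo() {
  repo_dir="$1"    # e.g. repository.git
  remote_url="$2"  # e.g. git@example.com:team/repository.git

  git --git-dir="$repo_dir" remote add origin "$remote_url" &&
  git --git-dir="$repo_dir" push --all origin &&
  git --git-dir="$repo_dir" push --tags origin
}

# Usage (not run here):
#   push_converted_repo repository.git git@example.com:team/repository.git
```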

The next step for me is setting up a script (maybe running via cron) that will periodically update the git repo and re-sync the latest changes from SVN. It would be most excellent if I could figure out a way to make the synchronization go both ways, so developers could just use Git if they want... we'll see!

Also, I've added an Svn2Git VM example to my Ansible Vagrant Examples project on GitHub—you can easily boot up a Linux VM with everything preconfigured so you can skip ahead to actually running the svn-all-fast-export command within a few minutes (rather than configure and build Svn2Git on your own).

Comments

I am using a MacBook and was not able to run the svn2git binary. All I see is a new svn-all-fast-export.app file. Is there anything else I need to do to make it runnable? I tried using sudo for qmake && make.