From d0974a5709ff7039b2cee881efa0a645876016b9 Mon Sep 17 00:00:00 2001
From: Don Armstrong
Date: Thu, 9 Aug 2012 14:46:04 -0700
Subject: [PATCH] add post on migrating to git-annex

---
 ...grating_from_svn_to_git_and_git_annex.mdwn | 125 ++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 posts/migrating_from_svn_to_git_and_git_annex.mdwn

diff --git a/posts/migrating_from_svn_to_git_and_git_annex.mdwn b/posts/migrating_from_svn_to_git_and_git_annex.mdwn
new file mode 100644
index 0000000..2051c51
--- /dev/null
+++ b/posts/migrating_from_svn_to_git_and_git_annex.mdwn
@@ -0,0 +1,125 @@
+[[!meta title="Migrating from Subversion to git with git-annex"]]
+
+Recently, I've started converting many of my subversion repositories
+to git, some of which contain fairly large files (2-3G). However, git
+can be slow when dealing with repositories with large files, and it
+also can't selectively discard unneeded files when disk space is
+tight. Thankfully,
+[git-annex](https://www.google.com/search?q=git-annex) resolves most
+of these problems with git, but the process required to use git-annex
+on a converted subversion repository is slightly complicated.
+
+Basic conversion of svn to git
+------------------------------
+The basic conversion of svn to git is done using git-svn:
+
+    git svn clone file:///srv/svn/foo --no-metadata -A authors.txt -T trunk foo
+
+where /srv/svn/foo is the subversion repository, authors.txt is a list
+of `login = Full Name <email>` entries, one for each of the
+subversion commit authors, and foo is the git repository to create.
+
+git-svn has a ton of useful options, but the basic invocation above is
+all I'm concerned with.
+
+Migrating large files from git into git-annex
+---------------------------------------------
+
+To migrate from git to a git+git-annex setup, we'll have to walk the
+entire commit history and edit each commit so that large files are
+stored in git-annex and replaced with symlinks. Finally, we'll
+eliminate all of the references to the old large objects and run
+garbage collection.
+
+Because the same file may move around, we're going to use the
+git-annex SHA1 backend instead of the default WORM backend, which is
+based on filename and size, and then initialize git-annex:
+
+    cd foo; echo '* annex.backend=SHA1' > .git/info/attributes
+    git annex init
+
+Then, we're going to filter out the large files using
+`git filter-branch`. To do that, we'll first create a little helper
+script, `git_annex_add.sh`, which removes the file from the git
+repository, adds it to git-annex, and fixes up the symlink:
+
+    #!/bin/bash
+    f="$1"                                   # file to move into the annex
+    git rm --cached "${f}"                   # remove it from git's index
+    git annex add "${f}"                     # hand the file to git-annex
+    annexdest="$(/bin/readlink -v "${f}")"   # read the symlink's target
+    ln -sf "${annexdest#../../}" "${f}"      # strip the leading ../../
+    echo -n "Added: "
+    ls -l "${f}"
+
+Then we will run filter-branch, and annex all files larger than 5
+megabytes.
+[Tweak the find command if you want to do something different.]
+
+    git filter-branch --tag-name-filter cat --tree-filter \
+    'find . -ipath \*.git\* -prune -o -path \*.temp\* -prune -o -size +5M -type f -print0|xargs -0 -r -n1 ~/git_annex_add.sh;
+    git reset HEAD .git-rewrite; :' -- master
+
+This operation will take a while.
+[It would be better to do this during the initial svn→git conversion, but since that requires more knowledge of git-svn, svn, git, and git-annex internals than I have, and I only have to do this once for each repository, it's not worth my time.]
+
+Now we have successfully switched everything to using git-annex, and
+we need to clean out the old references to the files:
+
+    rm -rf .git/svn
+    rm -rf .git/refs/original .git/refs/remotes/trunk .git/refs/remotes/git-svn
+    git reflog expire --expire=now --all
+    git gc --prune=now
+    git gc --prune=now --aggressive
+
+(I'm not sure if the last two commands need to be separate; I'm cargo
+culting a bit there.)
+
+Storing all git-annex files in a remote repository
+--------------------------------------------------
+
+Because git-annex allows you to easily throw away files which are no
+longer referred to by the tip of any branch (using `git annex unused`
+and `git annex dropunused`), and because I'd like all of the files on
+my central remote repository, I'm going to shove all of the git-annex
+files into the remote bare repository. Normally, you would use
+`git annex copy --to=remote` to do this, but because that only copies
+needed files, not everything, we'll have to do it manually.
+
+First, create the remote repository:
+
+    git init --bare /srv/git/foo.git
+    cd /srv/git/foo.git; git annex init foo.example.com
+
+Add the remote to the local repository, push to the remote, copy the
+annexed objects over with rsync, and sync the annex:
+
+    git remote add origin ssh://foo.example.com/srv/git/foo.git
+    git push origin master
+    rsync -avP .git/annex/objects foo.example.com:/srv/git/foo.git/annex/.
+    git annex sync
+
+Finally, on the remote, run `git annex fsck` to clean up the links to
+the imported objects:
+
+    cd /srv/git/foo.git; git annex fsck
+
+Unresolved issues
+-----------------
+I don't know if the above works properly for branches. I suspect that
+it does not. I also have not exhaustively tested this method to
+verify that all of the history is present in every case. But hopefully
+this post (or some modification of it) will be helpful to you.
+
+Credit
+------
+
+Many of the methodologies described here I originally found in
+[tyger's git-annex forum post](http://git-annex.branchable.com/forum/migrate_existing_git_repository_to_git-annex/),
+the `git gc` stuff came from random google searches about shrinking
+git repositories, and the rsync suggestion came from joeyh (author of
+git-annex) and the other helpful denizens of #vcs-home on
+irc.oftc.net.
+
+
+[[!tag debian tech git git-annex]]
-- 
2.39.2