Migrating from TFS to Git with converted metadata
Feb 21, 2013


If you are tasked with migrating a Team Foundation Server (TFS) code repository to Git and search the net, you will quickly find a number of tools:

The latter two solutions were tried in converting from the TFS repository to a Git repository.

TFS2Git has the drawback that it is a shell wrapper around the tfs and git command-line tools. It replays the history, but every Git commit is timestamped with the execution time of the conversion rather than the original timestamp from the TFS repository. That was not acceptable.

Git-TF is created by Microsoft to provide access to a TFS repository from non-Microsoft platforms like Linux and Mac OS X. It is a two-way bridge between a local Git checkout and a TFS server, meaning that you can work on the client with your Git toolchain, but pull and push to the central TFS server. As such, it works flawlessly with your TFS server. In a normal developer setup, you only pull the latest changes from TFS, continue working in your local git clone, and push your changes back as TFS changesets. However, performing a deep clone, even from a single branch with a bit of history, takes quite some time. Do you consider 26 hours acceptable for a history of around 2500 commits?

Besides the conversion, a bit of git post-processing was done on the converted repository before it was pushed to the new git server. Let’s get started!

Converting from TFS to Git

To convert MyComponent with full history, without tagging the git revisions with the TFS commit info, run:

user@host:~$ git tf clone --deep --no-tag http://tfs.domain.local:8080/tfs/MyCode $/MyComponent/Development/Development MyComponent

In the above example, only the Development branch was converted. The team already used a minimal Git Flow-like branch structure, and the most granular commit info was on this branch. Given the conversion time mentioned above, converting all branches was not an option. The intention is to execute a one-time conversion to Git and continue to work in Git. As such, we can remove the remote ref to the TFS server:

user@host:~$ cd MyComponent
user@host:MyComponent$ git branch -rd origin_tfs/tfs

Cleaning up the commit author information

After conversion, the author in every Git revision refers to the respective Active Directory user account. As Git users, we are used to the regular email-style information. So let’s rewrite some Git history. Most people seem to know the GitHub script for changing author info, but the script below can change multiple authors at once in a single rewrite run:

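#!/bin/sh

# Rewrite the Active Directory account names into regular name/email pairs.
git filter-branch -f --env-filter '

# Start from the existing values so that commits by unlisted accounts
# keep a valid identity instead of a stale or empty name and email.
name="$GIT_COMMITTER_NAME"
email="$GIT_COMMITTER_EMAIL"

case "$GIT_COMMITTER_NAME" in
        "DOMAIN\joe") name="Joe" ; email="joe@domain.com" ;;
        "DOMAIN\foo") name="Foo" ; email="foo@domain.com" ;;
        "DOMAIN\baz") name="Baz" ; email="baz@domain.com" ;;
esac

export GIT_AUTHOR_NAME="$name"
export GIT_AUTHOR_EMAIL="$email"
export GIT_COMMITTER_NAME="$name"
export GIT_COMMITTER_EMAIL="$email"
'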
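
To know which account names belong in the case statement, it can help to dump the distinct identities in the converted history before rewriting anything. The following is a minimal sketch, not part of the original procedure; the helper name list_identities is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every distinct author and committer identity
# found in the history of the current repository, one "Name <email>" per line.
list_identities() {
    # %an/%ae are the author name and email, %cn/%ce the committer ones.
    git log --all --format='%an <%ae>%n%cn <%ce>' | sort -u
}
```

Run list_identities inside the converted repository and add one case entry per line that still shows a DOMAIN\account style name.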

After running this conversion script in your git repo, it is a good idea to perform some cleanup:

user@host:MyComponent$ git gc --prune=now

Your git repository should shrink considerably.
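
If the repository does not shrink, a likely reason is that git filter-branch keeps the pre-rewrite commits reachable through backup refs under refs/original/ and through the reflog, so git gc has nothing to prune yet. The sketch below follows the shrinking checklist from the git filter-branch documentation; the function name purge_rewrite_backups is an assumption made for this example, and it permanently discards the backup of the old history, so only run it once you are happy with the rewrite.

```shell
#!/bin/sh
# Hypothetical helper: drop the filter-branch backups, then garbage collect.
purge_rewrite_backups() {
    # Delete every backup ref that filter-branch stored under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done

    # Expire all reflog entries so the old commits become unreachable.
    git reflog expire --expire=now --all

    # Now the unreachable objects can actually be pruned.
    git gc --quiet --prune=now
}
```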

Line endings…

The never-ending debate about line endings. I’m not going to argue which is better. A good article on how Git handles line endings is the one written by Tim Clem. I will only mention what I wanted and how I set it up: only Line-Feed (LF) in the database, and platform-dependent line endings in the working copy. This means LF on Unices but CRLF on Windows. Add the following to a .gitattributes file in the root of your repository:

# These files are text and should be normalized (convert crlf => lf)
*.cmd        text
*.config     text
*.Config     text
*.cs         text diff=csharp
*.csproj     text
*.datasource text
*.disco      text
*.edmx       text
*.map        text
*.md         text
*.msbuild    text
*.ps1        text
*.settings   text
*.sln        text
*.svcinfo    text
*.svcmap     text
*.t4properties text
*.tt         text
*.txt        text
*.vspscc     text
*.wsdl       text
*.xaml       text
*.xsd        text
 
# Images should be treated as binary
# (binary is a macro for -text -diff)
*.ico        binary
*.jpeg       binary
*.jpg        binary
*.sdf        binary
*.pdf        binary
*.png        binary

Add additional suffixes that you need mapped to either text or binary type. Now commit this file in your repository:

user@host:MyComponent$ git add .gitattributes
user@host:MyComponent$ git commit -m "Central repository configuration"
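
To confirm that a pattern in .gitattributes matches the way you expect, you can ask Git which attributes it resolves for a given path. This is a small sketch using git check-attr; the helper name show_attrs and the file names in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: show the "text" and "diff" attributes that Git
# resolves for each path given as an argument. The paths do not need to
# exist; only the patterns in .gitattributes are evaluated.
show_attrs() {
    git check-attr text diff -- "$@"
}
```

For example, show_attrs Program.cs logo.png prints one line per path and attribute, which makes a mistyped suffix in .gitattributes easy to spot.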

As you can see from the code samples, everything is executed on a Unix machine. As the content in the TFS repository contains Windows line endings, we need a single full conversion to make sure that everything is now correctly normalized:

user@host:MyComponent$ git rm --cached -r .
user@host:MyComponent$ git reset --hard
user@host:MyComponent$ git add .
user@host:MyComponent$ git commit -m "Introducing normalised line-endings"

The index was emptied and the working copy was checked out again with our new settings active. This results in all text files having LF only now. We add all these changed files to the staging area and commit them. The database now correctly contains LF only for text files.
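
One way to double-check the result is to ask Git which tracked files still carry CRLF in the database. The sketch below relies on git ls-files --eol, which needs a reasonably recent Git (2.8 or later, as far as I know); the function name files_with_crlf_in_index is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: list tracked files whose content in the index still
# has CRLF or mixed line endings. No output means nothing is left to fix.
files_with_crlf_in_index() {
    # Each output line of --eol starts with the index state, e.g. "i/lf",
    # "i/crlf" or "i/mixed", followed by the working tree state and the path.
    git ls-files --eol | grep -E '^i/(crlf|mixed)'
}
```

Files marked as binary in .gitattributes are reported with a different index state and are therefore not listed.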

Results

Our workstation now contains a Git repository with a single master branch, with correct author information and normalized line endings working on all platforms. The only thing left to do is to push this to the new Git server and let the team work on it:

user@host:MyComponent$ git remote add origin http://me@gitserver.domain.com/MyComponent.git
user@host:MyComponent$ git push origin master

That’s all folks!