Migrating from TFS to Git with converted metadata
If you are tasked with migrating a Team Foundation Server (TFS) code repository to Git and search the net, you will quickly find a number of tools. Two of them, TFS2Git and Git-TF, were tried for converting the TFS repository to a Git repository.
TFS2Git is a shell wrapper around the tfs and git command-line tools. It replays the history, but every git commit is timestamped with the time the conversion ran, not the original timestamp from the TFS repository. That made it unsuitable for this migration.
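To make the timestamp issue concrete: Git takes a commit's dates from the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables when they are set, so a replay script could in principle carry over the original date. The helper below is only a sketch; the name commit_with_date is made up for illustration and is not part of TFS2Git:

```shell
#!/bin/sh
# Hypothetical helper (not taken from TFS2Git): commit the staged changes
# with an explicit timestamp instead of the current time.
commit_with_date() {
    when=$1
    shift
    # Set both dates so the author and committer timestamps agree.
    GIT_AUTHOR_DATE="$when" GIT_COMMITTER_DATE="$when" git commit -q "$@"
}
```

It would be called as commit_with_date "2012-03-04 05:06:07 +0000" -m "Replayed changeset", with the date taken from the original changeset.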
Git-TF was created by Microsoft to provide access to a TFS repository from non-Microsoft platforms like Linux and Mac OS X. It is a two-way bridge between a local Git checkout and a TFS server: you work on the client with your Git toolchain, but pull from and push to the central TFS server. As such, it works flawlessly with your TFS server. In a normal developer setup, you only pull the latest changes from TFS, continue working in your local git clone, and push your changes back as TFS changesets. However, performing a deep clone, even of a single branch with a bit of history, takes quite some time. Do you consider 26 hours acceptable for a history of around 2500 commits?
Besides the conversion, a bit of git post-processing was done on the converted repository before it was pushed to the new git server. Let’s get started!
Converting from TFS to Git
To convert MyComponent with full history, without tagging the git revisions with the TFS commit info, run:
user@host:~$ git tf clone --deep --no-tag http://tfs.domain.local:8080/tfs/MyCode $/MyComponent/Development/Development MyComponent
In the above example, only the Development branch was converted. The team already used a minimal Git Flow-like branch structure, and the most granular commit info was on this branch. Given the conversion time mentioned above, converting all branches was not an option. The intention is to execute a one-time conversion to Git and continue to work in Git. As such, we can remove the remote ref to the TFS server:
user@host:~$ cd MyComponent
user@host:MyComponent$ git branch -rd origin_tfs/tfs
Cleaning up the commit author information
After conversion, the author in every git revision refers to the respective Active Directory user account. As git users, we are used to the regular email-style information. So let’s rewrite some git history. Most people seem to know the GitHub script for changing author info, but the script below can change multiple authors at once in a single rewrite run:
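#!/bin/sh
# Rewrite author and committer for several TFS accounts in a single pass.
git filter-branch -f --env-filter '
# Reset for each commit so values never carry over from a previous one.
name="" ; email=""
case "${GIT_COMMITTER_NAME}" in
"DOMAIN\joe") name="Joe" ; email="joe@domain.com" ;;
"DOMAIN\foo") name="Foo" ; email="foo@domain.com" ;;
"DOMAIN\baz") name="Baz" ; email="baz@domain.com" ;;
esac
# Only rewrite commits whose account is listed above.
if [ -n "$name" ] ; then
export GIT_AUTHOR_NAME="$name"
export GIT_AUTHOR_EMAIL="$email"
export GIT_COMMITTER_NAME="$name"
export GIT_COMMITTER_EMAIL="$email"
fi
'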
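Before running such a rewrite, it can help to see which accounts actually occur in the history, so that every one of them gets a branch in the case statement. A minimal sketch, assuming a helper name (list_committers) that is made up here:

```shell
#!/bin/sh
# List every distinct committer identity with its commit count,
# most frequent first.
list_committers() {
    git log --all --format='%cn <%ce>' | sort | uniq -c | sort -rn
}
```

Running list_committers inside the converted repository prints one line per identity. Running it again after the rewrite is a quick check that no DOMAIN\user entries are left.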
After running this conversion script in your git repo, it is a good idea to perform some cleanup:
user@host:MyComponent$ git gc --prune=now
Your git repository should shrink considerably.
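If the repository does not shrink, a likely reason is that git filter-branch keeps backup refs under refs/original/ and that the reflog still points at the old commits, so the old objects remain reachable. The following is a sketch of a fuller cleanup; the function name is hypothetical, and it permanently discards the pre-rewrite history, so only run it once you are happy with the result:

```shell
#!/bin/sh
# Drop the backup refs left by git filter-branch, expire the reflog,
# and garbage-collect the objects that are no longer reachable.
purge_rewrite_backups() {
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```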
Line endings…
The never-ending debate about line endings: I’m not going to argue which is better. A good article on how Git handles line endings is the one written by Tim Clem. The only thing to mention is what I wanted and how I set it up: only line feeds (LF) in the database, and platform-dependent line endings in the working copy. That means LF on Unices and CRLF on Windows. Add the following to a .gitattributes file in the root of your repository:
# These files are text and should be normalized (convert crlf => lf)
*.cmd text
*.config text
*.Config text
*.cs text diff=csharp
*.csproj text
*.datasource text
*.disco text
*.edmx text
*.map text
*.md text
*.msbuild text
*.ps1 text
*.settings text
*.sln text
*.svcinfo text
*.svcmap text
*.t4properties text
*.tt text
*.txt text
*.vspscc text
*.wsdl text
*.xaml text
*.xsd text
# Images should be treated as binary
# (binary is a macro for -text -diff)
*.ico binary
*.jpeg binary
*.jpg binary
*.sdf binary
*.pdf binary
*.png binary
Add additional suffixes that you need mapped to either text or binary type. Now commit this file in your repository:
user@host:MyComponent$ git add .gitattributes
user@host:MyComponent$ git commit -m "Central repository configuration"
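To confirm that the patterns resolve as intended, git check-attr can be asked how a given path is classified. A small sketch, in which the function name and the file names in the usage note are only examples:

```shell
#!/bin/sh
# Print how Git resolves the text and diff attributes for the given paths.
# The paths do not have to exist; only the .gitattributes patterns matter.
show_attrs() {
    git check-attr text diff -- "$@"
}
```

With the .gitattributes file above, show_attrs Program.cs logo.png should report text as set and diff as csharp for Program.cs, and both attributes as unset for logo.png.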
As you can see from the code samples, everything is executed on a Unix machine. Because the content in the TFS repository contains Windows line endings, we need a single full conversion to make sure that everything is correctly normalized:
user@host:MyComponent$ git rm --cached -r .
user@host:MyComponent$ git reset --hard
user@host:MyComponent$ git add .
user@host:MyComponent$ git commit -m "Introducing normalised line-endings"
We emptied the index (the files on disk are untouched by git rm --cached) and checked out a new working copy with our new settings active. This results in all text files having LF only now. We then add all these changed files to the staging area and commit them. The database now correctly contains LF only for text files.
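On Git 2.16 or newer, the same normalization can, as far as I know, also be done with git add --renormalize, and git ls-files --eol shows which line endings are stored in the index. A sketch with hypothetical function names, assuming the .gitattributes file has already been committed:

```shell
#!/bin/sh
# Count tracked files that are stored in the index with CRLF line endings.
count_crlf_in_index() {
    git ls-files --eol | grep -c '^i/crlf' || true
}

# Re-apply the .gitattributes rules to all tracked files and commit the result.
renormalize_line_endings() {
    git add --renormalize . &&
    git commit -q -m "Introducing normalised line-endings"
}
```

Calling count_crlf_in_index before and after renormalize_line_endings gives a quick check: for files marked as text, the count should drop to zero.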
Results
Our workstation now contains a Git repository with a single master branch, with correct author information and normalized line endings working on all platforms. The only thing left to do is to push this to the new Git server and let the team work on it:
user@host:MyComponent$ git remote add origin http://me@gitserver.domain.com/MyComponent.git
user@host:MyComponent$ git push origin master
That’s all folks!