Migration from Subversion to Git (Why and How)

A look at Centralized vs Distributed Version Control

Generation | Networking  | Operations         | Concurrency         | Examples
First      | None        | One file at a time | Locks               | RCS, SCCS
Second     | Centralized | Multi-file         | Merge before commit | CVS, SourceSafe, Subversion, Team Foundation Server
Third      | Distributed | Changesets         | Commit before merge | Bazaar, Git, Mercurial

[1]

Early VCSes (version control systems) were designed around a centralized model in which each project has a single repository shared by all developers. All “document management systems” are designed this way. This model poses two problems. First, a single repository is also a single point of failure: if the repository server goes down, all work stops. Second, you need a live connection to the server to check code in or out, so if you are offline, you cannot work.

Newer, third-generation VCSes are decentralized. A project may have several different repositories, and these systems support a sort of super-merge between repositories that tries to reconcile their change histories. At the limit, each developer has his or her own repository, and repository merges replace checkin/commit operations as the way of passing code between developers. An important practical benefit is that such systems allow for offline operation: you do not need to be on the Internet to commit, because you carry your own repository around with you. Pushing changesets to someone else’s repository becomes a slower but also less frequent operation. [2]

Branching is one of the core features to evaluate when choosing a version control system. A branch lets developers fork the development of a project in a different direction. No rigid rule dictates a branching strategy, but the quintessential branch is the release branch, which keeps the code for each released version of the project separate from ongoing development. Another common kind is the dreaded (or loved) hotfix branch, used to get an emergency patch to production when things start falling apart. The major challenge with any branching strategy is the step where a branch has to be integrated back into the main branch, or mainline.

In a CVCS like Subversion, most developers work on the same release branch, knowing that every commit is reflected immediately in the shared repository. As a result they tend not to check in code frequently, which cancels out much of the benefit of version control. The natural countermeasure is to check in frequently anyway, but that can destabilize the development branch and repeatedly break the build. Topic branching suits this situation: each developer works in isolation on a feature or bug fix, commits frequently, and is free to experiment without worrying about breaking anyone else’s work. Both CVCS and DVCS tools can be used for topic branching, but the distributed nature of Git lets developers experiment on topic branches locally and handpick the branches to push to a remote. Branching in Git is also “cheap”: branches are easy to create and integrate, and they take up very few resources compared with Subversion.
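To make the topic branch idea concrete, here is a minimal sketch of the local workflow in Git. It runs in a throwaway repository; the branch name “feature/login”, the file name and the identity are made up for illustration, and a real team would adapt the merge step to its own conventions.

```shell
# Create a scratch repository to experiment in (all names here are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Angry Dev"
git config user.email "angrydev@company.com"

# One commit on the mainline so there is something to branch from.
echo "v1" > app.txt
git add app.txt
git commit --quiet -m "Initial commit"
main=$(git symbolic-ref --short HEAD)   # name of the mainline branch

# Start a topic branch and commit to it as often as needed, in isolation.
git checkout --quiet -b feature/login
echo "login form" >> app.txt
git commit --quiet -am "Add login form"
echo "remember-me option" >> app.txt
git commit --quiet -am "Add remember-me option"

# Integrate the finished work back into the mainline and drop the branch.
git checkout --quiet "$main"
git merge --quiet --no-ff -m "Merge feature/login" feature/login
git branch -d feature/login
```

Nothing here touches a server: the branch lives and dies in the local repository, and only the merged result would need to be pushed.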

Subversion is one of the most popular centralized version control systems, whereas Git is undoubtedly the most popular decentralized version control tool currently in use. The graph below, generated from Google search keyword popularity, shows the upward trajectory of Git and the corresponding downward trend of Subversion. It is also a good indicator of the state of CVCS versus DVCS in general.

These days, many developers choose Git as the source code repository for new projects. Adoption of Git for collaboration on open-source as well as closed-source projects has been further accelerated by web-based Git hosting services such as Bitbucket and GitHub. These websites let users work together without stepping on each other’s toes, via pull requests.

Steps to Migrate your SVN repo to a Git repo

A major question for users of Subversion and other CVCSes is how to migrate their current repositories to Git without losing metadata such as the contents, author, and timestamp of each commit, or structural information such as branches and tags. One interesting recent example is Atlassian’s own migration from Subversion to Git, which they were kind enough to share in an article.

At Addteq too, we recently migrated a Subversion repository to Git successfully using the steps below.

From the working copy of the Subversion repository, we run the following command to collect the list of committers:

				
					svn log --quiet | grep "^r" | awk '{print $3}' | sort | uniq > users.txt
				
			

Once the command finishes, open the file “users.txt” in a text editor. Map each username in the file, for example “angrydev”, to a first name, an optional last name, and an email address, using this syntax:

				
					angrydev = Angry Dev <angrydev@company.com>
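For a long list of committers, a template for this file can be generated rather than typed. The sketch below is one possible way to do it and is not part of our script: the function name make_authors_template, the sample log and the company.com domain are all made up for illustration, and the display names still need to be corrected by hand afterwards.

```shell
# Hypothetical `svn log --quiet` output, embedded so the sketch runs without a server.
sample_log='------------------------------------------------------------------------
r3 | angrydev | 2014-03-03 09:12:44 -0500 (Mon, 03 Mar 2014)
------------------------------------------------------------------------
r2 | happydev | 2014-03-02 17:40:10 -0500 (Sun, 02 Mar 2014)
------------------------------------------------------------------------
r1 | angrydev | 2014-03-01 11:05:31 -0500 (Sat, 01 Mar 2014)
------------------------------------------------------------------------'

# Read `svn log --quiet` output on stdin and print one "user = user <user@domain>"
# line per distinct committer. $1 is the email domain to assume for everyone.
make_authors_template() {
    grep '^r' | awk '{print $3}' | sort -u |
        awk -v domain="$1" '{print $1 " = " $1 " <" $1 "@" domain ">"}'
}

printf '%s\n' "$sample_log" | make_authors_template company.com
# prints:
#   angrydev = angrydev <angrydev@company.com>
#   happydev = happydev <happydev@company.com>
```

Against a real working copy, the same function could be fed with `svn log --quiet | make_authors_template company.com > users.txt`.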
				
			

Why do we need to remap usernames from a Subversion repository to this format? The answer lies in the diagram below, which shows the structure of a Git commit object. Git stores the committer’s name and email address, whereas Subversion stores only the committer’s username.
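The same structure can be inspected directly from the command line. The sketch below creates an empty commit in a scratch repository and prints the raw commit object; the name and email address are placeholders.

```shell
# Scratch repository with a single empty commit (identity is a placeholder).
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git -c user.name="Angry Dev" -c user.email="angrydev@company.com" \
    commit --quiet --allow-empty -m "Initial commit"

# Print the raw commit object: a tree line, then author and committer lines
# (each with name, email and timestamp), a blank line and the commit message.
commit_object=$(git cat-file -p HEAD)
printf '%s\n' "$commit_object"
```

The author and committer lines each carry a full name and an email address, which is exactly the information the users.txt mapping supplies for every Subversion username.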

The next step is to run the Perl script that we wrote to carry out the migration:

				
					./svn2git.pl --user users.txt --url http://svn.company.com/svn/project --dest git-repo
				
			

The Perl script automates the following steps:

  • Uses the git-svn command to get the commits from the Subversion repository.
  • Converts the remote branches into local branches.
  • Since tags are imported as branches, it creates a local branch for each one, makes a tag from it, and deletes the branch. This way, tags from Subversion are converted into proper Git tags.
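The steps above can be sketched in shell roughly as follows. This is not the svn2git.pl script itself, only a minimal sketch that assumes the standard trunk/branches/tags layout and an “origin/” prefix for the imported refs; the function names are hypothetical. It also takes a shortcut: tags are created straight from the imported refs instead of going through a temporary local branch.

```shell
# Step 1: fetch the Subversion history with git-svn.
# usage: fetch_from_svn <authors-file> <svn-url> <dest-dir>
fetch_from_svn() {
    git svn clone --stdlayout --prefix=origin/ --authors-file="$1" "$2" "$3"
}

# Step 2: turn every imported remote branch into a local branch.
# trunk is skipped because git-svn normally checks it out as the local default branch.
convert_branches() {
    git for-each-ref --format='%(refname)' refs/remotes/origin |
    while read -r ref; do
        name=${ref#refs/remotes/origin/}
        case "$name" in
            trunk|tags/*) continue ;;
        esac
        git branch --no-track "$name" "$ref"
    done
}

# Step 3: turn every imported tag ref into a real Git tag, then remove the ref.
convert_tags() {
    git for-each-ref --format='%(refname)' refs/remotes/origin/tags |
    while read -r ref; do
        git tag "${ref#refs/remotes/origin/tags/}" "$ref"
        git update-ref -d "$ref"
    done
}
```

A run would call fetch_from_svn with the users.txt file, the repository URL and a destination directory, change into that directory, and then call convert_branches followed by convert_tags.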

And there you have it! Once the script finishes, which can take a while depending on the size of the repository, we have a Git repository that contains the same commits and the same structure of branches and tags.

At Addteq we use Atlassian Stash to manage the Git repositories that team members share across projects. The final step after the migration is to set up a new Git repository in Stash, then add it as a remote and push the local branches and tags:

				
					git remote add origin https://stash.company.com/scm/project/repository.git
					git push origin --all
					git push origin --tags
				
			

Further questions?

Addteq specializes in migration from Subversion and other version control tools to Git. As official Atlassian Experts, we can also help you get started with Stash for managing Git repositories. Get started on your migration project today by contacting us for a free, no-obligation consultation!

References

[1] http://www.ericsink.com/vcbe/html/history_of_version_control.html

[2] http://www.catb.org/~esr/writings/version-control/version-control.html
