Moving from Visual SourceSafe to Subversion

There are other guides for setting up Subversion and for migrating from Visual SourceSafe to Subversion, but they weren't up to date. I suppose this is normal for something that is only done once, and this guide will itself become obsolete as Visual SourceSafe and Subversion move to new versions.

So, to start with these are the version numbers of the main tools that I've been using:

Visual SourceSafe — 6.0c (build 9447);
Subversion — 1.3.2;
TortoiseSVN — 1.3.5 build 6804;
AnkhSVN — 0.5.5.1653.

Just as I was finishing this article off, Subversion 1.4 came out. I've not upgraded either clients or server here yet. My expectation is that 1.4 is the right choice for a new installation, but it would be best to test compatibility with TortoiseSVN and AnkhSVN first.

Although it's clear that Visual SourceSafe isn't the best source control system, it is generally good enough for small programming shops like ours. Our ongoing efforts to open source FOST.3™ gave us the push to re-evaluate our source control as we want to be able to make our repository available to external developers.

I need to start with a disclaimer — I'm no expert on Subversion so there are probably better ways of doing much of this. This is just what I've done to get everything up and working for us.

Is Subversion better than Visual SourceSafe?

My motivation for wanting to switch to a different source control system was to allow the source control to be opened up properly over the Internet. Visual SourceSafe performs extremely badly over slow network connections. I think this is partly because the underlying file sharing is also very bad over high latency and low bandwidth networks. Visual SourceSafe also likes to keep open file handles to the database files and this is also pretty bad over the Internet.

A big problem with the database format is that it is not reliable in the face of unreliable network links. Data corruption is common and it's just not possible to use SourceSafe over any network that is prone to random failure.

All in all, Visual SourceSafe is OK if all of your developers are in one location, but is a complete nightmare if you use it remotely.

The thing that really sold me on Subversion though from my first few minutes of using it was the realisation that it didn't need to mark files as read-only in order to manage them. This has been causing problems with some tools that can't automate check-outs and it also causes problems with installers which copy the read-only attribute all the way to the target machine. A real pain for configuration files.

Probably the first thing that you will notice though isn't anything to do with the technical abilities of the program, but rather the aesthetics of it. Visual SourceSafe looks and feels awkward and clunky; Subversion feels slick. TortoiseSVN and AnkhSVN just seem to work so much better than SourceSafe and look so much nicer. The fact that they bring a slew of more sophisticated features too is an added bonus.

Features

The ability to manage files without using a read only attribute solves a lot of problems. If you forget to check out a file there are a few applications that will open it in a “read only” mode and not allow you to save to the same location even after you check the file out. This causes me untold woe every time I come across it, and I just can't get into the habit of always remembering to check out first.

TortoiseSVN's integration into the Explorer shell makes it much easier to use. So far I can't really tell if this slows down Explorer much or not. I guess if I can't really tell then the impact isn't all that bad.

The merge feature it has is also fantastic. It highlights most things in a similar way, but when showing changes on a line it also marks the parts of the line that have changed and shows you the changes above each other. It often takes me some time to find small changes in a line of code between versions, and TortoiseSVN makes this easy. It does take a little practice to work out how to do this properly though.

The only web access currently available is through Apache which we don't run here. As we want to get away from using file shares to access the repository we will only want to use the svn: protocol. I'm not too concerned about encryption at the moment so won't be running it over SSH.

Although Subversion will handle the security that we need it has a very simple security model. Users are assigned passwords and groups and the users or groups are then given either no, read or read/write access to a repository directory and its children. Anonymous access can also be configured.

Where Subversion really helps though is in working practices. Having written a tool to migrate from Visual SourceSafe to Subversion I now really understand why our build and release processes were so chaotic: SourceSafe just can't be used effectively programmatically. It is extremely time consuming to try to parse the output that it generates and none of it is aimed at making life easy for developers.

With Subversion on the other hand it is possible to write very sophisticated automated tools for handling the build processes including tagging and branching of the relevant parts of projects.
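As a rough illustration of the kind of automation meant here, a build script could create a release tag with a server-side svn copy. This is only a minimal sketch and not our actual tooling: the repository URL, the trunk and tags layout, and the function names are all assumptions made for the example.

```python
import subprocess


def tag_command(repo_url, project, version):
    """Build the `svn copy` command that tags a project's trunk.

    Assumes a conventional <project>/trunk and <project>/tags layout,
    which is an illustration and not the layout used in this article.
    """
    source = f"{repo_url}/{project}/trunk"
    target = f"{repo_url}/{project}/tags/{version}"
    message = f"Tagging {project} {version}"
    return ["svn", "copy", source, target, "-m", message]


def create_tag(repo_url, project, version, run=subprocess.run):
    """Run the tag command; raises CalledProcessError if svn reports failure."""
    return run(tag_command(repo_url, project, version), check=True)
```

Because the copy is made between two repository URLs it happens on the server in a single revision, so a build machine does not need a working copy to tag a release.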

What Subversion won't do

Subversion treats file copies in a substantially different way to SourceSafe. Whilst SourceSafe has a concept of a shared file between projects there is no such concept in Subversion. Symbolic links have been much discussed, but they're not going to be here any time soon.

Depending on how you make use of the feature you may be able to use svn:externals to achieve this effect.

So far we've not completely worked out how to deal with this issue for web sites we develop where directories contain files shared between all web sites and files that are unique to a given site. SourceSafe handles this very well with the shared file feature, but so far we're having to do this manually in Subversion meaning of course that some sites don't get the updates they should.
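For what it's worth, an svn:externals definition is just a property holding one "local directory, then URL" pair per line. The sketch below builds that property value and the matching svn propset command. The directory names and URL are hypothetical, and as far as I know externals in this version of Subversion pull in whole directories and not single files, so this only helps where the shared files can live in a directory of their own.

```python
def externals_value(mapping):
    """Format {local_dir: url} as the text of an svn:externals property."""
    lines = [f"{local_dir} {url}" for local_dir, url in sorted(mapping.items())]
    return "\n".join(lines) + "\n"


def propset_command(working_dir, mapping):
    """Build the command that sets svn:externals on a working copy directory."""
    return ["svn", "propset", "svn:externals", externals_value(mapping), working_dir]


# Hypothetical example: each web site pulls in a common "shared" directory.
command = propset_command("site-a", {"shared": "svn://svn.example.com/common/shared"})
```

The property change still has to be committed, and each working copy only picks up the shared directory on its next update.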

Migration planning

There are three of our servers that are relevant to how we want to run Subversion:

Angelo: A development server. Runs a test Subversion repository. This is important because you don't want to be learning Subversion on your production source control system.
Turner: The main domain controller and file server. It will run the production repository on a pair of mirrored disks.
Goes: The internet facing server connected to the development network. Originally this was going to run the Subversion service, but in the end I decided to set up port forwarding to allow external users to see the service running on Turner¹ [1 Due to an oddness with Windows 2003 server this means that we've had to fiddle our DNS configurations in order to be able to use the repository URLs in-house as well as externally.].

All of the remaining steps were done twice—firstly on Angelo to set up the test repository where I could work out how to do everything and then again on Goes and Turner to establish the final repository.

Installing Subversion

Setting up Subversion is pretty simple.

Use the Subversion installer.
Install TortoiseSVN² [2 The TortoiseSVN installer will want to reboot the computer. Some people say that you don't have to, but I've seen servers die because they've not been rebooted after installing software. I've also seen other installers kill earlier installations for the same reason. Personally I tend to schedule a reboot when installing software, especially on servers. So far I've not found a reason to install the .NET hack, but YMMV.].
Unzip the service files and copy them to the Subversion install location.

File system

When setting up the repository I first tried the newer native file system (FSFS). After putting through more than 100,000 changes I discovered that there were errors in the database. I talked to one of the developers, John Szakmeister (who was very helpful), and the initial assumption was that the cause lay in an interaction between the Subversion server software and Apache. It turned out though that a bad sector on the development server's disk had killed 12 KB of one of the data files, effectively killing the database.

I also tested the Berkeley database format which has a longer history and is widely regarded as more stable than FSFS. The disadvantage is that you cannot use it with the file: protocol³ [3 Just to be clear here, I'm only talking about the file: protocol over a network share. I'm not considering running the repository on a development machine as I'm not considering lone developers at all. Thanks to Stefan for pointing out I was being unclear.], but if you're like me then you probably consider this an advantage.

The Berkeley database seems to be much slower for directories that contain a lot of files than FSFS is. Upgrades may also be harder. Despite all of this I've gone with Berkeley for our main repository due to the problems I'd had with FSFS, even though that isn't really fair on FSFS.

The dump and load repository commands do mean that repositories can be taken offline and switched from one format to another fairly easily. I've not tested this yet.
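Untested in the same way as the paragraph above, a conversion along these lines would create an empty repository in the new format and pipe svnadmin dump from the old one into svnadmin load on the new one. This is a minimal sketch: the function names are made up, the paths are whatever you pass in, and it assumes the old repository has been taken offline first.

```python
import subprocess


def convert_commands(old_repo, new_repo, fs_type="fsfs"):
    """Return the three svnadmin commands needed to switch back-end format."""
    create = ["svnadmin", "create", "--fs-type", fs_type, new_repo]
    dump = ["svnadmin", "dump", old_repo]   # writes the dump stream to stdout
    load = ["svnadmin", "load", new_repo]   # reads the dump stream from stdin
    return create, dump, load


def convert(old_repo, new_repo, fs_type="fsfs"):
    """Create the new repository, then pipe dump output straight into load."""
    create, dump, load = convert_commands(old_repo, new_repo, fs_type)
    subprocess.run(create, check=True)
    dumper = subprocess.Popen(dump, stdout=subprocess.PIPE)
    loader = subprocess.run(load, stdin=dumper.stdout)
    dumper.stdout.close()
    if dumper.wait() != 0 or loader.returncode != 0:
        raise RuntimeError("svnadmin dump or load reported a failure")
```

Note that the dump stream only carries the versioned data, so the conf directory and any hook scripts would need to be copied across by hand.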

The repository is made by right-clicking in the directory you want to use in Explorer and choosing the "Create repository here" option from the TortoiseSVN menu.

Setting up the service

Setting up the service is pretty easy. First off test that everything works as you expect by running the server from a command prompt. This is the command that I've been using on my test server:

svnserve --daemon --root c:\Server\Subversion\Repository

Assuming that works the service can be set up by doing this:

SVNService -install --daemon --root c:\Server\Subversion\Repository

Start the service in the normal way and make sure it is set to Automatic.
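One quick way to confirm that the service really is answering is to try a TCP connection to it from another machine. The sketch below assumes svnserve is on its default port of 3690 (nothing above changes it) and the host name you pass is your own server. It only shows that something accepts connections on that port, not that the repository itself is healthy.

```python
import socket


def svnserve_is_listening(host, port=3690, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Running it once from inside the network and once from outside is a cheap way to check both the service and any port forwarding.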

Setting security

The security we'd used for SourceSafe was very simple. Each developer had a login whose password was simply their name. This meant that others could force a checkin of any files if the first developer forgot to do so. We never needed to worry about partitioning parts of the source database as all of the developers were internal and trusted.

For the use we need to put Subversion to, though, the requirements are a bit tighter. We're going to have some anonymous access and we're also going to have external developers with varying levels of access. We're also now starting to work more with external companies to co-develop systems so we will need to allow them access to some files on what we would normally consider internal projects. The security system that Subversion has will allow us to do this, but there are a couple of gotchas.

All of the configuration that you need to change can be found in the conf sub-directory where you created the repository files. Changes take effect immediately — there's no need to stop and start the service.

svnserve.conf

This file controls the overall repository behaviour in terms of security. Once all of the comments have been stripped out this is the effective contents of our file on the test server.

[general]
password-db=passwd
authz-db=authz
realm=svn.test.felspar.net

The first two settings enable the default password and access control configuration files. The final setting names the repository's authentication realm (only used if you are sharing password or access control configuration files between repositories).
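If you want to see the effective contents of a heavily commented configuration file without editing it by hand, Python's configparser happens to read this style of file. This is a minimal sketch and has nothing to do with how svnserve itself parses the file; the commented-out line in the sample is made up for the illustration.

```python
import configparser

# Sample text: the settings shown above plus an invented commented-out line.
SVNSERVE_CONF = """\
# Lines starting with '#' are comments and are ignored.
[general]
# anon-access = read
password-db = passwd
authz-db = authz
realm = svn.test.felspar.net
"""


def effective_settings(text):
    """Return the [general] settings with all comments stripped out."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["general"])


settings = effective_settings(SVNSERVE_CONF)
for key, value in settings.items():
    print(f"{key}={value}")
```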

passwd

The password file is very simple:

[users]
kirit=password

Note that the passwords really are in plain text! I understand that it is possible to use your normal Windows authentication by using the svn+ssh: protocol, but I haven't experimented with it. Configuration of Apache should also allow this, but again I've not tried it so can't comment.

authz

Getting this file right is a little tricky.

[groups]
internal=kirit,fred

[/]
*=r
kirit=rw

[/private]
*=
@internal=r
kirit=rw
vss_import=rw

[/private/test]
@internal=rw

Working through this from the top we set up a group called internal with members kirit and fred⁴ [4 I'm not sure how white-space is treated in this file so I've not been risking it.]. We provide anonymous read access to the root of the repository, but kirit gets full access. I then configure a private directory to which most users have no access, the internal group has read permission and vss_import has read and write permissions. I have to specify kirit again because the server will match against the longest path it can find in this file.

Finally I set up a test area that all members of internal have read and write access to in order to play around.
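The intended result of the file above can be modelled in a few lines. This is a simplified sketch of the rule as I've described it and not Subversion's own code: it assumes that the longest matching section wins, that the most permissive matching line within that section applies, and that a section which doesn't mention the user at all falls back to its parent. The data structure and function names are invented for the example.

```python
AUTHZ = {
    "groups": {"internal": {"kirit", "fred"}},
    "rules": {
        "/": [("*", "r"), ("kirit", "rw")],
        "/private": [("*", ""), ("@internal", "r"),
                     ("kirit", "rw"), ("vss_import", "rw")],
        "/private/test": [("@internal", "rw")],
    },
}


def matches(who, user, groups):
    """Does a rule's left-hand side apply to this user (None = anonymous)?"""
    if who == "*":
        return True
    if who.startswith("@"):
        return user in groups.get(who[1:], set())
    return who == user


def candidate_sections(path):
    """Yield the path and each of its parents, longest first."""
    parts = [part for part in path.split("/") if part]
    for count in range(len(parts), -1, -1):
        yield "/" + "/".join(parts[:count])


def access(user, path, authz=AUTHZ):
    """Return "rw", "r" or "" for the user on the given repository path."""
    for section in candidate_sections(path):
        rules = authz["rules"].get(section)
        if rules is None:
            continue
        granted = [perm for who, perm in rules
                   if matches(who, user, authz["groups"])]
        if granted:
            # Assumption: the most permissive matching line applies.
            if "rw" in granted:
                return "rw"
            return "r" if "r" in granted else ""
    return ""
```

Under these assumptions access("fred", "/private/notes.txt") gives "r", access("kirit", "/private/notes.txt") gives "rw", and an anonymous user (None) gets "r" at the root but nothing under /private.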

The problem with it is that it doesn't work. I can't find any reason in the documentation why it shouldn't, but it looks like there is a bug in the authentication process⁵ [5 This is one thing that I haven't tested against the new version. I now need to find time to plan an upgrade to 1.4 and re-test these things.]. The only way that I can see to get around this at the moment is to not allow anonymous access of any sort to the repository. This means that the configuration must be changed to:

[general]
anon-access=none
password-db=passwd
authz-db=authz
realm=svn.test.felspar.net

It also means that there is no need to use the wildcard when configuring the security settings, which is probably a bonus, but the cost is that there is no anonymous access to the repository.

Backing up the server files

In order to back up the server it is not recommended to just copy the repository directory structure to another location. This sort of backup strategy doesn't work with any database application and Subversion is no exception to this.

I use this to make a copy and clean out the logs:

svnadmin hotcopy c:\Server\Subversion\Repository F:\Backup\Subversion\Current --clean-logs

Note that the command is only able to create a single directory level. So if you want to put the backup into F:\Backup\Subversion\Current you will need to create the directory F:\Backup\Subversion before running the command.
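A small wrapper can take care of the missing parent directory before calling svnadmin. This is a minimal sketch for a scheduled backup script: the function name and the injectable run parameter are my own inventions, and it makes no attempt to clear out a previous backup at the destination.

```python
import os
import subprocess


def hotcopy(repository, backup, run=subprocess.run):
    """Create the parent of the backup directory, then run svnadmin hotcopy."""
    parent = os.path.dirname(os.path.abspath(backup))
    os.makedirs(parent, exist_ok=True)  # hotcopy creates only the last level
    command = ["svnadmin", "hotcopy", repository, backup, "--clean-logs"]
    run(command, check=True)  # raises CalledProcessError if svnadmin fails
    return command
```

Called as hotcopy(r"c:\Server\Subversion\Repository", r"F:\Backup\Subversion\Current") it would issue the same command as above once F:\Backup\Subversion exists.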

It isn't entirely clear to me whether or not it is safe to back up a Berkeley database repository to a networked location. I have been doing this and the backups seem to work.

Migrating the SourceSafe database to Subversion

There are four basic approaches to switching over to Subversion:

Just use Subversion for new projects and Visual SourceSafe for old projects. This is fine if you find yourself working on lots of short lived projects and don't transfer files between them.
Just import your current working files into Subversion. This means of course that you will need to go back to SourceSafe to review older versions.
Import the histories of all files separately. This means losing shared histories (where files have been shared between projects), but you do get most everything else.
Replay the full history of your SourceSafe databases into Subversion. This is extremely hard and may not be possible depending on how you use SourceSafe.

You should only consider the first option if each new project starts with a truly clean slate. If you maintain separate repositories for shared libraries then you may be able to get away with this and use Subversion for new things and SourceSafe for older things. It seems to me though that you won't get the benefits of making the switch and you'll get bogged down trying to maintain two systems. The approach probably works well for home projects and for test project repositories.

The second option is better in that you can at least switch everything from SourceSafe to Subversion, but you lose the history. For some sorts of work this loss probably isn't a problem. I expect that most software shops won't find this a viable option.

The third option is much more interesting. Depending on how you use SourceSafe this may be everything that you require. There are two tools available to you that will do this sort of migration (with some caveats). Brett Wooldridge has a Perl tool available on his site which Power Admin has re-implemented in Microsoft Visual C++™.

I've not used Brett's Perl tool as I don't speak Perl, but I have successfully imported a sub-project with Power Admin's tool. There are some extra details to watch out for.

When Power Admin ran his migration he came across a file example.exe;5 that caused a problem. I had a similar experience and his fix of renaming the file and doing a manual add seemed to work fine for me too.

The basic issue though is that the file was pinned. Before doing the migration you should go through and check that there are no pinned files.

A second thing to watch for is that although the SourceSafe database path is set in the configuration file this information is not used by the program. You will need to manually set the environment variable SSDIR to the directory that contains the srcsafe.ini file for your database.
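The SSDIR requirement for Power Admin's tool can be handled by a small launcher, which avoids changing the environment for the whole machine. This is a minimal sketch: the function names are invented, the command you pass is whatever starts the migration tool on your system, and it assumes SSDIR should name the directory that holds srcsafe.ini.

```python
import os
import subprocess


def environment_with_ssdir(database_dir, base=None):
    """Return a copy of the environment with SSDIR pointing at the database."""
    env = dict(os.environ if base is None else base)
    env["SSDIR"] = database_dir  # directory containing srcsafe.ini (assumed)
    return env


def run_migration(command, database_dir, run=subprocess.run):
    """Start the migration tool with SSDIR set for that process only."""
    return run(command, env=environment_with_ssdir(database_dir), check=True)
```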

sourcesafe2subversion

Because of the way that we use SourceSafe I really wanted a much better migration tool and so looked to enable the fourth option. This is much more complex than I'd initially thought due to the vagaries of how SourceSafe stores the history — in fact it took me several weeks to get it all working properly!

I've not come across any other tools that do this migration and of course mine has only been tested on a small test database and our main database. The main database contains nearly 13,000 files and is about five years old and has been well abused so the tool has at least been given a pretty thorough workout.

Using sourcesafe2subversion⁶ [6 I'd originally used the name vss2svn for this utility without realising that this was the name of an already extant utility on Tigris. Hopefully I've not caused any confusion… Apologies to anybody if I have.] is probably going to lead to a faster migration of data if you use file sharing in SourceSafe as it will reduce the number of versions of files that it reads in. It also manages to create a better history within the Subversion repository as it better matches what actually happened to the files. It clearly isn't perfect. Having studied the problem in great depth I can think of a few things that could be done slightly better, but I expect the overall approach the tool takes to be the best possible.

The description of the full horror of trying to do this is left for a slightly later time when we release the tool⁷ [7 I wanted to release sourcesafe2subversion at the same time that I published this article, but for a couple of reasons to do with the open sourcing process of FOST.3™ we've not been able to do that. If you want to try the software though feel free to contact me directly and we'll see what we can work out.].

Moving forwards

We still have a few things to work out with our change of processes, especially in how to efficiently manage branching and merging. We're now improving our process to tackle things that we weren't able to do with SourceSafe and of course there are teething problems associated with this learning process.

Give us another six months with the tool and we'll be a long way beyond where we could ever have gotten to with SourceSafe.