Centralized VS Distributed SCM

Distributed source control management (SCM) systems have become a trend in recent years: Bazaar, Git, Mercurial, SVK. Distributed SCM is a fascinating concept, but how well does it perform in practice? Some say that it leads to a phenomenon called branch proliferation.
I researched optimistic replication for a year at the beginning of my PhD. A distributed SCM, or more correctly a multi-primary approach, has the benefits of reduced latency and availability under network partitions (when you can’t reach the server). The downside is the difficulty of reconciliation: the further you let your replicas diverge, the higher the probability of conflicts (it grows practically exponentially). These conclusions don’t come only from theory: reconciliation-based systems have existed since the 80s and never had much success in industry, because their implementation was too complex and divergence soon made the systems unusable.
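To make the intuition about divergence concrete, here is a toy model in Python. It is only a sketch under assumptions that are not in the research mentioned above: every pair of changes made on two diverged replicas is assumed to conflict independently with the same small probability. The function name and the probability value are hypothetical.

```python
def clean_merge_probability(changes_a, changes_b, pair_conflict_probability):
    """Chance that two diverged replicas reconcile without any conflict.

    Toy assumption: each (change on A, change on B) pair conflicts
    independently with probability pair_conflict_probability.
    """
    pairs = changes_a * changes_b
    return (1.0 - pair_conflict_probability) ** pairs


# Hypothetical per-pair conflict probability of 0.1%.
for changes in (5, 10, 20, 40):
    at_least_one_conflict = 1.0 - clean_merge_probability(changes, changes, 0.001)
    print(changes, "changes per replica ->", round(at_least_one_conflict, 3))
```

Under this model the chance of a clean merge shrinks geometrically with the number of change pairs, so doubling the divergence on both sides quadruples the exponent.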
I can confirm all of this from my experience as a consultant at <censored>one of the major banks</censored>. Having even 3 active branches is enough to seriously decrease your productivity. We’re not actually using a distributed SCM, just Subversion, but we’ve adopted a branching policy (which someone called “early branching”, as if it were a common pattern) where a branch is created for a version as soon as something is planned for that version.
The consequence is that every change made to the current version has to be merged into all the future versions. When you are lucky there are no conflicts; when you are a little less lucky there are conflicts detected by SVN; but when you are unlucky you get “semantic” conflicts that are detected only when the overnight tests run, and it is not easy to figure out that tests are failing because of inconsistencies caused by merges.
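The merge workload this policy creates can be sketched with a few lines of Python. This is a rough illustration, not a measurement from the project: the function name and the change counts are hypothetical, and it assumes each change needs exactly one merge per later version branch.

```python
def merges_required(changes_per_version):
    """Count merges when every change must be forwarded to all later versions.

    changes_per_version[i] is the number of changes committed on the branch
    for version i, ordered from the current version (index 0) to the
    furthest future version.
    """
    total_versions = len(changes_per_version)
    return sum(
        changes * (total_versions - 1 - index)
        for index, changes in enumerate(changes_per_version)
    )


# Hypothetical numbers: 100 changes land on the current version.
print(merges_required([100]))        # 0: a single branch, nothing to forward
print(merges_required([100, 0, 0]))  # 200: three active branches
```

Every additional active branch adds one more merge for each change made on the earlier versions, and each of those merges is a new chance for a textual or semantic conflict.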
The patch for the problem was to assign one of the most senior developers to do the merging. Quickly enough this became his only job, and even so he couldn’t keep up with the number of conflicts generated by 100 developers. Just a couple of days ago I found out that he got sick of merging and is leaving. Now the developers are in charge of merging their own changes into the other branches as soon as possible. And finally the managers (who are apparently responsible for this error) are beginning to realize that branching doesn’t come for free.

With this I don’t want to say that distributed SCMs are not the way to go. On the contrary. They just need to put in place mechanisms that prevent branch proliferation while still taking advantage of a decentralized architecture and fault tolerance.

3 Responses to “Centralized VS Distributed SCM”

  1. Jakub Narebski Says:

    Actually you would want to either use some kind of Continuous Integration procedure, or have a dedicated maintainer who is responsible for merging the code contributions (which is, by the way, a position recommended in the classic “The Mythical Man-Month” by Fred Brooks).

    Besides, you probably want to reduce the number of long-lived branches (to maintenance, stable (master) and development (next)), but develop new features in isolation in separate _short lived_ topic branches.

  2. jaksa Says:

    That’s exactly what we have on another big project, which is older and has a much more mature process.
    The only exception is that merging is performed by every developer: I don’t think a full-time merger’s job is one of the most gratifying.

  3. Alainavc Says:

    Interesting post.., bro

