Monday, August 22, 2005

Distributed Revision Control No Threat, Pt. II

In yesterday's diary I laid out an explanation of why choosing a non-distributed revision control system is a power play. Later I realized that some people would probably say to themselves, "Yeah, but I'm not trying to make a power play. I'm just sticking with a tool that I know." Regardless of intent, the end result is the same. Any developer who chooses a non-distributed revision control system has also made development more difficult for non-sanctioned developers, which I'll dive into more deeply some other day.

The big scary spider in the minds of many developers is that some day their project might be forked by somebody else. This fear, which is partially justified, presents with a variety of unhealthy behaviors. Obfuscated code, poorly documented code, development plans made in private backroom discussions rather than in public, and centralization of control are only some of the tactics people use to hold onto a project.

There are at least three scenarios that result in a potential forking situation: temporarily abandoned projects, divergence of interests and poorly managed projects. Each of these driving forces results in a different type of fork that can have different long-term results for the project.

A temporarily abandoned project is one in which the developer has gone AWOL. Perhaps he's burned out, holding his code hostage for pay or overcommitted in his work. Whatever the reason, the lead developer of the otherwise healthy project is no longer actually leading development. Nature, who abhors a vacuum, sees to it that eventually a new lead developer springs out of the ether. The old project lead, realizing that he's no longer in charge, rushes back to the scene and attempts to regain control. In most cases, the new developer steps back and allows the old lead to regain control of the project. In less common cases, especially if the old lead had limited management skills or a history of abandonment, the new project lead will continue development against the wishes of the old developer, leading to two projects where one previously existed.

In a non-distributed revision control system, the new lead has to set up a new revision control system, get the code freshly imported, and set up accounts. Only then can he actually start pulling in the patches lying around from the various interests. This code is at risk of being lost when the old lead returns. To avoid this loss, the new lead will likely continue development parallel to the old lead, leading to an arms race.

Imagine if the old lead had been using a distributed RCS. The new lead would have branched from the old lead and continued development. The returning old lead would simply be able to merge the new lead's code and development would continue as if nothing had ever happened. No work is lost, no pride is at risk, and everyone eventually ends up happy. In the abandonment case a distributed revision control system actually reduces the chance of a long term fork.
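To make the abandonment case concrete, here is a toy sketch in Python. It is not how Bazaar or any real tool stores history; the `Repo` class, its methods and the revision names are all made up for illustration. It only assumes that a branch is a full copy of the revision graph and that a merge copies over whatever revisions are missing.

```python
class Repo:
    """Toy repository (hypothetical): a revision graph plus a current head."""

    def __init__(self, owner, parents=None, head=None):
        self.owner = owner
        self.parents = dict(parents or {})  # revision id -> tuple of parent ids
        self.head = head

    def commit(self, rev_id):
        self.parents[rev_id] = (self.head,) if self.head else ()
        self.head = rev_id

    def branch(self, new_owner):
        # A branch is a complete, independent copy of the history.
        return Repo(new_owner, self.parents, self.head)

    def history(self, rev_id=None):
        # Every revision reachable from rev_id (default: the current head).
        seen, stack = set(), [rev_id or self.head]
        while stack:
            rev = stack.pop()
            if rev is not None and rev not in seen:
                seen.add(rev)
                stack.extend(self.parents[rev])
        return seen

    def merge(self, other):
        # Pull in every revision we do not have yet.
        for rev, pars in other.parents.items():
            self.parents.setdefault(rev, pars)
        if self.head in self.history(other.head):
            self.head = other.head  # nothing diverged: just catch up
        elif other.head not in self.history(self.head):
            merge_id = f"merge({self.head}+{other.head})"
            self.parents[merge_id] = (self.head, other.head)
            self.head = merge_id


# The old lead goes AWOL after r2; the new lead branches and keeps working.
old = Repo("old lead")
old.commit("r1")
old.commit("r2")
new = old.branch("new lead")
new.commit("r3")
new.commit("r4")

# The old lead returns and merges: all of the new lead's work is kept.
old.merge(new)

# Variant: the old lead also committed something (r2b) before merging.
old2 = Repo("old lead")
old2.commit("r1")
old2.commit("r2")
new2 = old2.branch("new lead")
new2.commit("r3")
old2.commit("r2b")
old2.merge(new2)
```

In the first case the old lead's head simply moves forward to `r4`. In the second, a merge revision with two parents records both lines of work, so neither side's commits are thrown away.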

In a successful project the userbase typically grows substantially. Over time the project can realize that it is supporting two or more mutually exclusive goals. A project typically handles this by splitting off a subproject that focuses on the unique subgoals. The long-term relationship between these two projects is highly dependent upon the relationship between the two lead developers; in many cases the break-up is mutually agreeable and the two parties work together on the interests that are common.

In a non-distributed RCS setup the divergence of the two projects is encouraged because merging becomes increasingly difficult. Eventually the two projects get tired of passing deltas back and forth and give up working together. In a distributed revision control system, merging specific patches between two distinct but related trees is very simple. This encourages more merging between the projects, and encourages each project to pick up the other's fixes for the code they share. Thus: distributed revision control systems minimize divergence between two related projects.
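As a rough illustration of picking specific patches across related trees, here is a small Python sketch. The patch ids, the "area" labels and both function names are hypothetical, and real tools track far more than a list of tuples. It assumes only that each project can tell which patches it already has.

```python
def missing_patches(ours, theirs):
    """Patches present in `theirs` but not in `ours`, in their original order."""
    have = {patch_id for patch_id, _ in ours}
    return [(pid, area) for pid, area in theirs if pid not in have]


def pick_shared_fixes(ours, theirs, shared_areas):
    """Append only those missing patches that touch code both projects share."""
    picked = [p for p in missing_patches(ours, theirs) if p[1] in shared_areas]
    return ours + picked


# Two projects that split from a common ancestor (p1, p2), then diverged.
editor = [("p1", "core"), ("p2", "core"), ("p3", "gui")]
server = [("p1", "core"), ("p2", "core"), ("p4", "core"), ("p5", "daemon")]

# The editor project takes the core fix and ignores the daemon-only work.
editor_after = pick_shared_fixes(editor, server, {"core"})
```

The editor project ends up with `p4` but not `p5`: it benefits from the sibling's fix to shared code without taking on the work that only matters to the other project.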

Finally, we come to the case of the mismanaged project. I've often heard the following quote: "The internet routes around damage". In a mismanaged project, several members have gotten so fed up with the project that they have decided to take on a good deal of additional work just to ensure its success. Nobody takes this task on lightly, due to the high volume of non-development work that goes with project leadership. Consider the effort that typically needs to be expended to start a new/forked project:

  • A website to gather and disseminate information. The website needs to be well maintained in order to convince users to switch over to the new project.
  • An IRC channel needs to be manned for both developer discussions and user support. Again, this needs to be done well to encourage mindshare.
  • A bug tracker for the new project is a must for any project of significant size. Be ready to spend anywhere from five to twenty hours a week triaging bugs.
  • A mailing list needs to be set up and the public informed that it exists.
  • Third party announcements (release announcements, freshmeat announcements, etc)
  • Conferences, speaking engagements, trolling user groups, etc may become a requirement.

By the time a group of users is willing to launch a hostile fork of a project, the choice of RCS is moot. They have already taken on responsibility for a lot of work. The choice between reimporting the old project's codebase into a non-distributed RCS and running the slightly less painful branch command in a distributed RCS is irrelevant. Regardless, the two projects can still take advantage of the good work of the other project by easily merging the good portions while ignoring the bad.

Saturday, August 20, 2005

Distributed Revision Control No Threat, Pt. I

One of the things that Canonical is doing is providing Bazaar imports of CVS and Subversion archives. These imports (The SuperMirror & Bazaar Imports) give users a way to try out Bazaar 1.x with projects that already have a long history. Goodies from all walks of life are present: GNU, Gnome, KDE, independent projects. This really gives users a great opportunity to try out Bazaar with existing projects and see how Bazaar meets their needs.

Unfortunately some people are a little worried that these imports give people the power to fork projects. The logic goes something like this: with a distributed revision control system any user can branch a project and not offer merges back into the mainline. Most developers want to protect our software against questionable design, poor workmanship and dangerous thoughts. This goal is an admirable one that most people should subscribe to and one I fully support. However, as with most things, the means one uses to reach the ends are just as important as the ends themselves.

Locking people out of revision control is not the way to ensure the cohesiveness of a project. If you run certain revision control systems (CVS and Subversion come to mind) you can dictate which people can and can't use revision control (and in which ways). This barrier to entry is sufficient to discourage some projects from forking into two separate projects.

Using this sort of tactic is not the right way to ensure the cohesiveness of a project. This power play suffers the same consequences as all power plays: the best politicians win over the best meritocrats. The person who holds control over the RCS gets to decide who has the ability to develop with the benefits of an RCS. Those who toe the line get favored; those who disagree are held back.

Several years ago Eric Raymond wrote a paper comparing the differences between two styles of development: the cathedral style and the bazaar style. Whether or not one agrees with Raymond's paper, one thing is certainly true: A non-distributed revision control system is the establishment of a cathedral type system. With the power of deciding which people can commit and which people can read archives comes the ability to enforce decisions about which code will be developed when and by whom.

Everybody gets to play in a distributed revision control system. Any person can take the official codebase, study it, learn from it, locally branch it and experiment with it. Each person can develop at their own rate, knowing that they can relatively easily merge in the latest and greatest from the official mainline. When their code is ready to be pulled in and supported by mainline the new developer can send a merge request to the official developers with code that is current with today's codebase changes.

So far we've established that controlling access to revision control is a power play. Some developers are willing to accept this cost to prevent the forking of the community. Tomorrow I'll explain the real factors preventing forking (hint: it has little to do with your choice of revision control system).