Monday, August 22, 2005

Distributed Revision Control No Threat, Pt. II

In yesterday's diary I laid out an explanation of why choosing a non-distributed revision control system was a power play. Later I realized that some people would probably say to themselves, "Yeah, but I'm not trying to do a power play. I'm just sticking with a tool that I know." Regardless of intent, the end result is the same. Any developer who chooses a non-distributed revision control system has also made development more difficult for non-sanctioned developers, which I'll dive into more deeply some other day.

The big scary spider in the minds of many developers is that some day their project might be forked by somebody else. This fear, which is partially justified, manifests as a variety of unhealthy behaviors. Obfuscated code, poorly documented code, development plans made in private backroom discussions rather than in public, and centralization of control are only some of the tactics used to hold onto a project.

There are at least three scenarios that result in a potential forking situation: temporarily abandoned projects, divergence of interests, and poorly managed projects. Each of these driving forces results in a different type of fork, with different long-term results for the project.

A temporarily abandoned project is one in which the developer has gone AWOL. Perhaps he's burned out, holding his code hostage for pay, or overcommitted in his work. Whatever the reason, the lead developer of the otherwise healthy project is no longer actually leading development. Nature, who abhors a vacuum, sees to it that eventually a new lead developer springs out of the ether. The old project lead, realizing that he's no longer in charge, rushes back to the scene and attempts to regain control. In most cases, the new developer steps back and allows the old lead to regain control of the project. In less common cases, especially if the old lead had limited management skills or a history of abandonment, the new project lead will continue development against the wishes of the old developer, leading to two projects where one previously existed.

In a non-distributed revision control system, the new lead has to set up a new revision control system, get the code freshly imported, set up accounts... Only then can he actually start pulling in the patches lying around from the various interests. This code is at risk of being lost when the old lead returns. To avoid this loss, the new lead will likely continue development in parallel with the old lead, leading to an arms race.

Imagine if the old lead had been using a distributed RCS. The new lead would have branched from the old lead and continued development. The returning old lead would simply be able to merge the new lead's code, and development would continue as if nothing had ever happened. No work is lost, no pride is at risk, and everyone eventually ends up happy. In the abandonment case, a distributed revision control system actually reduces the chance of a long-term fork.
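The branch-then-merge scenario can be sketched with a toy model. This is only an illustration under stated assumptions, not how any particular distributed RCS is implemented: a repository is modeled as a plain dict mapping a commit id to its parent ids, and the names `ancestors`, `pull`, `old_lead` and `new_lead` are all made up for the example.

```python
# Toy model (hypothetical): a repository is a dict of commit id -> tuple of parent ids.

def ancestors(repo, head):
    """Return every commit reachable from head, including head itself."""
    seen = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(repo[commit])
    return seen


def pull(dst, dst_head, src, src_head):
    """Copy missing history from src into dst and return dst's new head."""
    for commit in ancestors(src, src_head):
        dst.setdefault(commit, src[commit])
    if dst_head in ancestors(dst, src_head):
        return src_head  # dst made no commits of its own: just move forward
    if src_head in ancestors(dst, dst_head):
        return dst_head  # src has nothing new
    merge_id = f"merge({dst_head},{src_head})"
    dst[merge_id] = (dst_head, src_head)  # both sides survive as parents
    return merge_id


# The old lead goes AWOL at commit "b"; the new lead branches and adds "c" and "d".
old_lead = {"a": (), "b": ("a",)}
new_lead = dict(old_lead)
new_lead["c"] = ("b",)
new_lead["d"] = ("c",)

# The old lead returns and pulls: no work is lost on either side.
returned_head = pull(old_lead, "b", new_lead, "d")

# Variant: the old lead had also committed "e" on top of "b" before pulling.
diverged = {"a": (), "b": ("a",), "e": ("b",)}
diverged_head = pull(diverged, "e", new_lead, "d")
```

In the first case the returning lead simply moves forward to the new lead's head. In the second, a merge commit records both lines of work as parents, so neither developer's history is thrown away.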

In a successful project, the userbase typically grows substantially. Over time the project can realize that it is supporting two or more mutually exclusive goals. A project typically handles this by splitting off a subproject that focuses on the unique subgoals. The long-term relationship between these two projects is highly dependent upon the relationship between the two lead developers; in many cases the break-up is mutually agreeable and the two parties work together on the interests they have in common.

In a non-distributed RCS setup, the divergence of the two projects is encouraged because merging becomes increasingly difficult. Eventually the two projects get tired of passing deltas back and forth and give up working together. In a distributed revision control system, merging specific patches between two distinct but related trees is very simple. This encourages more merging between the projects, and encourages each project to pick up the other's fixes for the code they share. Thus, distributed revision control systems minimize divergence between two related projects.

Finally, we come to the case of the mismanaged project. I've often heard the following quote: "The internet routes around damage". In a mismanaged project, several members have gotten so fed up with the project that they have decided to take on a good deal of additional work just to ensure the project's success. Nobody takes this task on lightly, due to the high volume of non-development work that goes with project leadership. Consider the effort that typically needs to be expended to start a new/forked project:

  • A website to gather and disseminate information. The website needs to be well maintained in order to convince users to switch over to the new project.
  • An IRC channel needs to be manned for both developer discussions and user support. Again, this needs to be done well to encourage mindshare.
  • A bug tracker for the new project is a must for any project of significant size. Be ready to spend anywhere from five to twenty hours a week triaging bugs.
  • A mailing list needs to be set up and the public informed that it exists.
  • Third party announcements (release announcements, freshmeat announcements, etc.) need to be made.
  • Conferences, speaking engagements, trolling user groups, etc. may become a requirement.

By the time a group of users is willing to make a hostile fork of a project, the choice of RCS is moot. They have already taken on responsibility for a lot of work. Whether they reimport the old project's codebase into a non-distributed RCS or use the slightly less painful branching command in a distributed RCS is irrelevant. Regardless, the two projects can still take advantage of each other's good work by easily merging the good portions while ignoring the bad.

