Wednesday, 22 June, 2005

Entropy in source code control

One definition of entropy is, "Inevitable and steady deterioration of a system or society."  We've all seen it: absent constant maintenance, systems move from a state of order to a state of disorder.  Weeds grow in the flower garden and your desk becomes cluttered and disorganized.  The same thing happens with a software system's code.  At the start of a project everything is clean.  The directory structure is nicely laid out and the source code repository exactly mirrors the code on the disk.  Formal, frequent (ideally daily) build procedures ensure that the source control system remains in sync with what the developers are doing.

But at some point the project goes into "crunch" mode.  Either the team gets behind schedule or has to rush to get a bug fix out the door for a critical client or magazine review.  Maybe the product ships and a year later a lone developer hacks in some changes quickly and doesn't follow all the formal procedures to maintain the fidelity of the source code control system.  At some point, the project gets out of sync.  Six years later another lone developer picks up the code and tries to puzzle it all out.

Source code control systems like Microsoft's Visual SourceSafe and the open source CVS (Concurrent Versions System) serve three primary purposes:

  1. They serve as a central repository for the project's source code.
  2. They maintain a revision history so that it's possible to retrieve all versions of the code and view every change made from the project's inception up to the most recent version.
  3. They control access to the source code, ensuring that only authorized users can read or update the code base, and that changes are recorded in the proper order (i.e. that older versions don't overwrite newer versions).

However, no version control system that I've used can prevent users from subverting it.  It's possible to check in files that aren't used in the project, and to use files without checking them in to the database.  Everything works fine until somebody decides to grab the latest version from the database and try to build the project.  There's no way for the system to enforce the rules, and no way short of trying to build the project to prove that the rules have been followed.  Maintaining a project's source integrity requires active thought by the team members all the time.

Microsoft Visual Studio, Visual Studio .NET, and some other development environments have varying degrees of integration with version control systems.  These integrated systems work well as long as everybody follows the rules.  The problem is that the rules aren't precisely defined, they're hard to follow, and they're absurdly easy to break unintentionally.

The only way to ensure that your project will build successfully at any time is to create and maintain a daily build procedure that gets the latest version of the code from the repository into a clean directory structure and builds the entire project.  Every programmer on the project is notified of the build status every day.  This technique has been proven many times over the years, and is recommended by any project management book or seminar produced in the last five years.  Martin Fowler calls it "Continuous Integration."  For a more friendly discussion of the topic and links to other resources, see Joel Spolsky's "Daily Builds Are Your Friend."
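
To make the shape of such a procedure concrete, here is a minimal sketch in Python.  It is only an illustration under some assumptions: the function name run_daily_build is mine, and the checkout and build commands are placeholders that you would replace with whatever your own version control and build tools require.  Notifying the team is left out entirely.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_daily_build(checkout_cmd, build_cmd):
    """Check out into a fresh, empty directory and build there.

    Both commands are argument lists supplied by the caller; they are
    placeholders for your own version control and build tools.
    Returns (succeeded, log_text).
    """
    # A brand-new directory guarantees that nothing left over from a
    # developer's machine can make the build pass by accident.
    workdir = Path(tempfile.mkdtemp(prefix="daily-build-"))
    log = []
    try:
        for step, cmd in (("checkout", checkout_cmd), ("build", build_cmd)):
            result = subprocess.run(
                cmd, cwd=workdir, capture_output=True, text=True
            )
            log.append(f"[{step}] exit code {result.returncode}")
            log.append(result.stdout + result.stderr)
            if result.returncode != 0:
                # Stop at the first failing step; the log says which one.
                return False, "\n".join(log)
        return True, "\n".join(log)
    finally:
        # Throw the directory away so tomorrow's build starts clean too.
        shutil.rmtree(workdir, ignore_errors=True)
```

The important design choice is the throwaway directory: the build sees only what the repository hands it.  The returned flag and log are what you would mail to the team each day.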

I'd be willing to bet that almost all successful large software systems use a similar technique.  I'd also be willing to bet that most unsuccessful large systems can point to the lack of a daily build process as a major contributor to the project's failure.

Here's the kicker, though.  A daily build process will ensure that you can build your project, and automated testing can ensure that the built version actually works.  But there does not appear to be a way to ensure that all the rules are followed and that the project file remains in sync with version control.  It's possible to add files to version control without adding them to the project file, and as long as your daily build procedure pulls down the entire source tree, you'll never know it.  The only way you can ensure that the project file and the source code control system stay in sync is to manually open the project from version control into a clean directory.  And that isn't going to happen every day.  Or even every month.  Instant entropy.
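
The mismatch itself is easy to state as two set differences, as in the small Python sketch below.  Everything in it is an assumption made for illustration: the function name is hypothetical, both lists of paths are taken as given (getting them out of a real project file or repository is tool-specific and not shown), and paths are compared without regard to case or slash direction, as on a Windows file system.  It only describes the drift.  Deciding which of the extra files are clutter and which are legitimately outside the project still takes a person.

```python
def compare_project_to_repository(project_files, repository_files):
    """Return (missing_from_repository, unreferenced_in_project).

    Both arguments are iterables of relative paths.  How they are
    obtained is tool-specific and deliberately left out of this sketch.
    """
    def normalize(path):
        # Assumption: treat backslashes and forward slashes, and upper
        # and lower case, as equivalent.
        return path.replace("\\", "/").lower()

    project = {normalize(p) for p in project_files}
    repository = {normalize(p) for p in repository_files}

    # Files the project needs but the repository lacks, and files the
    # repository holds but the project never mentions.
    return sorted(project - repository), sorted(repository - project)
```

A file in the first list breaks the clean build, so the daily build catches it.  A file in the second list never breaks anything, which is exactly why nobody notices it.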

Daily builds will keep your project on schedule.  Build early and build often.  But no automated tool is going to prevent entropy in your project's structure.  That's just the way it is.  It's a dirty little secret that most programmers either don't recognize or prefer not to discuss.