Principle #1 - Builds should be repeatable (Part 2)

In my previous blog entry, I presented the first of The Five Principles of Software Deployment, namely “Builds should be repeatable.”

In that post, I offered a list of practices for achieving repeatable builds. This post takes a closer look at the first three items on that list.

1. Use version control

To achieve a repeatable build, you essentially need to turn back the clock to the time when an application was previously built. It is impossible to do this without some kind of system that tracks changes to your files over time. Version control (or source code management) provides such a system.

If Maslow had created a hierarchy of needs for software developers, version control would be at the bottom of the pyramid, right there with food, water, air, and sleep! It should be considered essential to any software development effort.

I’ve used version control throughout my career, whether working solo or on a large team. In some places, I’ve used commercial version control systems like ClearCase and PVCS. In other places, I’ve used open source systems like Subversion and CVS. I haven’t yet used distributed version control systems such as Git, which are designed specifically for distributed development.

Given the range of options, and the fact that some of them are free (not counting the time needed to set them up), there is no excuse for a developer not to use version control.

I won’t get into details here on exactly how version control works. This visual guide does a great job of that. However, I will discuss how to use version control to achieve repeatable builds.

2. Version your tools and dependencies

Nearly every application depends on third-party libraries. In some cases, an application may also depend on a library produced by another group in the same organization. These third-party and in-house libraries will invariably change over the course of a project.

In order to repeat a build, you must capture the versions of the libraries used at the time of that build. The best way to do this is to add these libraries to version control, along with your source. As new versions of these libraries become available, they are added to version control alongside the previous versions. As a simple example, suppose your application depends on the Spring LDAP library. To move from version 1.2 to version 1.2.1 of this library, you could set up the following structure in your version control repository:

/ext/lib/spring-ldap/1.2/spring-ldap-1.2.jar
/ext/lib/spring-ldap/1.2.1/spring-ldap-1.2.1.jar

Within your build property files, you would change the reference to the spring-ldap library from 1.2 to 1.2.1.
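To make the idea concrete, here is a minimal Python sketch of how a build script might turn a version property into a path in the versioned library tree shown above. The property name `spring-ldap.version` and the simple `key=value` file format are assumptions for illustration; your build tool’s property files may look different.

```python
from pathlib import PurePosixPath

# Hypothetical contents of a build property file after the upgrade.
# The key "spring-ldap.version" is an illustrative assumption.
BUILD_PROPERTIES = """
# Third-party library versions used by this build
spring-ldap.version=1.2.1
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def library_path(props, name, root="/ext/lib"):
    """Map a library name to the versioned jar checked into the repository."""
    version = props[f"{name}.version"]
    return PurePosixPath(root) / name / version / f"{name}-{version}.jar"


props = parse_properties(BUILD_PROPERTIES)
print(library_path(props, "spring-ldap"))  # /ext/lib/spring-ldap/1.2.1/spring-ldap-1.2.1.jar
```

Because the upgrade changes only one property value, rebuilding an older release means checking out the older property file; the older jar is still sitting in the repository.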

If both the libraries and references to those libraries are versioned, you can easily go back and repeat a build with the correct versions of the libraries.

There are two important points that I’d like to make here.

First, this technique of versioning dependencies applies to libraries produced by a third party, whether that is an outside vendor or another group within your organization. It does not apply to libraries generated from your own source code. If a library can be built from source code, it should never be checked into version control. Let me reiterate: never check in something that can be built from source. It is very easy for a checked-in library (or other build artifact) to get out of sync with its source. I’ve worked on a project where this was done, and it caused major headaches at build time.
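One way to enforce this rule is a simple check, perhaps run before each release build, that flags any tracked jar whose name matches a module the project builds itself. Below is a minimal Python sketch; the file list and module names are hypothetical, and in practice the list of tracked files would come from your version control tool.

```python
from pathlib import PurePosixPath

# Hypothetical inputs: files tracked in version control, and the modules
# this project builds from its own source.
TRACKED_FILES = [
    "src/com/example/app/Main.java",
    "ext/lib/spring-ldap/1.2.1/spring-ldap-1.2.1.jar",
    "lib/example-core.jar",
]
MODULES_BUILT_FROM_SOURCE = {"example-core"}


def artifact_base_name(path):
    """Drop a trailing '-<version>' so 'example-core-2.0.jar' -> 'example-core'."""
    name, sep, tail = path.stem.rpartition("-")
    return name if sep and tail[:1].isdigit() else path.stem


def checked_in_build_outputs(tracked, built_modules):
    """Return tracked jars that could (and should) be built from source instead."""
    return [
        str(path)
        for path in map(PurePosixPath, tracked)
        if path.suffix == ".jar" and artifact_base_name(path) in built_modules
    ]


print(checked_in_build_outputs(TRACKED_FILES, MODULES_BUILT_FROM_SOURCE))
# ['lib/example-core.jar']
```

The third-party Spring LDAP jar passes the check; only the jar that duplicates an in-house module is flagged.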

Second, this versioning of libraries is an area where I have a fundamental disagreement with the philosophy of the Maven build system. By default, Maven downloads dependencies at build time into a local Maven repository, and dependencies declared as SNAPSHOT versions or version ranges can resolve to newer artifacts from one build to the next. Also by default, that local repository is not versioned. In fact, it’s usually stored in a hidden directory (.m2) within the user’s home directory. To me, library dependencies need to be managed more precisely in order to achieve reliable and repeatable builds.
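If you do use Maven, one way to narrow the gap is to make sure no dependency is declared with a version that can change underneath you. The Python sketch below flags version strings of that kind: SNAPSHOT versions, ranges such as `[1.2,)`, and the `LATEST` and `RELEASE` keywords. It works on a plain dictionary of illustrative names and versions rather than parsing a real POM, so treat it as a sketch of the rule, not a Maven tool.

```python
def is_floating(version):
    """True if a Maven-style version string can resolve to different artifacts over time."""
    v = version.strip()
    if v.endswith("-SNAPSHOT") or v in {"LATEST", "RELEASE"}:
        return True
    # "[1.2,)" or "(,2.0]" is a range; "[1.2.1]" (no comma) pins one exact version.
    return v.startswith(("[", "(")) and "," in v


def floating_dependencies(deps):
    """Return the names of dependencies whose versions are not pinned."""
    return sorted(name for name, version in deps.items() if is_floating(version))


# Illustrative dependency names and versions, not real project coordinates.
deps = {
    "spring-ldap": "1.2.1",
    "in-house-utils": "2.0-SNAPSHOT",
    "some-parser": "[1.0,2.0)",
    "exact-pin": "[3.1]",
}
print(floating_dependencies(deps))  # ['in-house-utils', 'some-parser']
```

This addresses only the floating-version half of the problem; the local .m2 repository itself still sits outside version control.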

To close the topic of putting dependencies into version control, I’ll mention the versioning of tools. It can be argued that a truly repeatable build requires versioning the tools themselves, including compilers and interpreters. In fact, we did exactly that on one project, where we checked in the version of Java we used to build a release. In that case, there were additional reasons for doing so (we distributed the JRE with our release, and we changed some settings in Java’s network properties). But in the end, we had everything we needed to build right there in version control.
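Checking in the tools themselves is the most thorough option. A lighter-weight complement, and not the approach we used on that project, is to record the exact tool and library versions alongside each release so that a later rebuild can be compared against them. A minimal Python sketch, with hypothetical version numbers:

```python
import json


def build_manifest(release, tools, libraries):
    """Serialize the versions used for a release so the manifest can be checked in."""
    return json.dumps(
        {"release": release, "tools": tools, "libraries": libraries},
        indent=2,
        sort_keys=True,
    )


def manifest_differences(recorded_json, tools, libraries):
    """List every tool or library whose version differs from the recorded manifest."""
    recorded = json.loads(recorded_json)
    diffs = []
    for section, current in (("tools", tools), ("libraries", libraries)):
        old = recorded[section]
        for name in sorted(set(old) | set(current)):
            if old.get(name) != current.get(name):
                diffs.append(f"{section}/{name}: {old.get(name)} -> {current.get(name)}")
    return diffs


# Hypothetical versions recorded at release time...
manifest = build_manifest("2.0", {"java": "1.6.0_07"}, {"spring-ldap": "1.2.1"})
# ...compared with what a later rebuild is about to use.
print(manifest_differences(manifest, {"java": "1.6.0_10"}, {"spring-ldap": "1.2.1"}))
# ['tools/java: 1.6.0_07 -> 1.6.0_10']
```

A manifest only tells you that something changed; unlike checked-in tools, it cannot give you the old compiler back.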

3. Label your releases

One feature common to all version control systems is the ability to label or tag a release. Effectively, this provides a virtual “snapshot” of the state of all files at the time a release is built. In some systems, like ClearCase, a label is attached to individual file versions, while in others, like Subversion, a label (or tag) is presented as a separate ‘directory’ in the repository. Either way, a label lets you turn the clock back to the moment of a release and rebuild things as they were at that time.
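In Subversion, creating a tag is just a cheap server-side copy, conventionally into a `tags` directory. The Python sketch below only assembles the `svn copy` command (it does not run it); the repository URL, the `trunk`/`tags` layout, and the message text are assumptions. Passing the revision that was actually built with `-r` ties the tag to that exact snapshot even if someone commits to trunk in the meantime.

```python
def svn_tag_command(repo_url, release, revision, trunk="trunk", tags="tags"):
    """Build (but do not run) an 'svn copy' that tags the built revision of trunk."""
    base = repo_url.rstrip("/")
    return [
        "svn", "copy",
        "-r", str(revision),  # tag the revision that was built, not HEAD
        f"{base}/{trunk}",
        f"{base}/{tags}/{release}",
        "-m", f"Tag release {release} from r{revision}",
    ]


# Hypothetical repository URL and build revision.
cmd = svn_tag_command("https://svn.example.com/repos/app/", "2.0", 1234)
print(" ".join(cmd))
# To actually create the tag: subprocess.run(cmd, check=True)
```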

Another technique for “labeling” that I’ve used in the past with some success is to literally create a new set of directories for a major release by exporting the latest version of the source and reimporting it under a separate directory (for example, /prod/2.0). This is more extreme than just applying a 2.0 label or tag to the files, but it has the advantage of making it easy to apply fixes to the 2.0 source without the need to create a branch.
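With Subversion, this export-and-reimport approach comes down to two commands: `svn export` to get a clean copy with no working-copy metadata, and `svn import` to add it under a new path. The Python sketch below just builds those commands; the URL, the local directory, and the `/prod/<release>` path are illustrative. Note that, unlike a tag made with `svn copy`, an imported tree has no history linking it back to the original source.

```python
def export_reimport_commands(repo_url, release, workdir, source="trunk"):
    """Build (but do not run) the commands that copy the source to /prod/<release>."""
    base = repo_url.rstrip("/")
    export_dir = f"{workdir.rstrip('/')}/{release}"
    return [
        # 1. Clean export of the latest source (no .svn metadata).
        ["svn", "export", f"{base}/{source}", export_dir],
        # 2. Reimport it as an independent tree for the release.
        ["svn", "import", export_dir, f"{base}/prod/{release}",
         "-m", f"Import source for release {release}"],
    ]


# Hypothetical repository URL and scratch directory.
for cmd in export_reimport_commands("https://svn.example.com/repos/app", "2.0", "/tmp/release"):
    print(" ".join(cmd))
```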

Proper labeling of releases requires discipline. It is very easy to forget to relabel when last-minute changes are made. To combat this, it is best to “freeze” all changes at the time of the build and designate a single person to handle the build process, including the final step of labeling. In the next post, I’ll discuss the idea of creating a separate build “user” for performing release builds.
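A build script can enforce part of this discipline by refusing to label when the freeze has been broken. The Python sketch below assumes you can obtain the revision the release build was made from and the latest revision that changed the release source (for Subversion, `svn info` reports a last-changed revision); the user names are hypothetical.

```python
def labeling_problems(built_revision, last_changed_revision, user, release_builder):
    """Return reasons not to label yet; an empty list means it is safe to label."""
    problems = []
    if user != release_builder:
        problems.append(f"{user} is not the designated release builder ({release_builder})")
    if last_changed_revision > built_revision:
        problems.append(
            f"source changed after the build (built r{built_revision}, "
            f"last change r{last_changed_revision}); rebuild before labeling"
        )
    return problems


# Hypothetical values: a late commit (r1240) landed after the r1234 build.
for problem in labeling_problems(1234, 1240, "alice", "buildmaster"):
    print("cannot label:", problem)
```

An empty result does not guarantee a good label; it only catches the two lapses described above.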
