April 19th, 2008 — BundleWorks, Deployment, Environments
After a bit of a break to meet some client deadlines, I’d like to continue the series The Five Principles of Software Deployment. This entry discusses the fourth principle in that series: Releases should be easy to do and undo.
Doing a software release can be as simple as installing a new version of a single component on a single machine, or it can involve upgrading multiple components across multiple machines along with configuration and database changes. Undoing a software release, i.e. rolling back if the newly released software doesn’t work properly, can be a challenging, if not impossible, task.
This blog entry provides tips on how to make your software releases easier to do and undo.
Know your environments
The most important part of a software release is knowing your environments. You should be able to answer the following questions:
- What versions of the software (if any) are running in the target environments?
- Are the correct versions of all dependencies installed in the target environments?
- What OS versions are running on the machines and is the release compatible with these versions?
If you keep your environments consistent, it is easier to answer these questions.
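One simple way to answer these questions is to have each machine write a small manifest of what it runs, and then compare the manifests between environments. The following is a minimal sketch, assuming a POSIX shell. The checks (OS, kernel release, architecture, Java version) are only examples, and `write_manifest` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: record key environment facts in a file so that two environments
# can be compared with diff. Extend the checks for your own components.

write_manifest() {
    out="$1"
    {
        echo "os: $(uname -s)"
        echo "kernel: $(uname -r)"
        echo "arch: $(uname -m)"
        # 'java -version' prints to stderr, so merge it into stdout.
        if command -v java >/dev/null 2>&1; then
            echo "java: $(java -version 2>&1 | head -n 1)"
        else
            echo "java: not installed"
        fi
    } > "$out"
}
```

Running `write_manifest uat.txt` on a UAT machine and `write_manifest prod.txt` on a production machine lets `diff uat.txt prod.txt` show exactly where the two differ.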
Have proper backups
This point is especially important for being able to undo a release. Make sure you have backups of the application itself, its configuration, any runtime files it produces, and any databases it uses. Be skeptical of routine scheduled backups. I’ve seen cases where these backups were not being done properly or could not be restored. It is better to take a snapshot of all components at the time of release.
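For file-based components, the snapshot can be as simple as a timestamped archive that is checked for readability right after it is written. This is a minimal sketch; `snapshot_release` and the paths are hypothetical, and databases still need their own dump tool.

```sh
#!/bin/sh
# Sketch: archive application files just before a release and check that
# the archive can be read back. Databases need their own dump tool.

snapshot_release() {
    backup_dir="$1"; shift    # remaining arguments: paths to capture
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$backup_dir/pre-release-$stamp.tar.gz"
    mkdir -p "$backup_dir" || return 1
    tar -czf "$archive" "$@" || return 1
    # Listing the archive catches a truncated or unreadable backup now,
    # rather than during a rollback.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}
```

For example, `cd /opt && snapshot_release /var/backups/myapp myapp` archives /opt/myapp with relative member names and prints the path of the new archive.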
Document all release steps
Good documentation is essential to a good software release. Even if a release is completely automated, it is important to document the changes that will take place. This documentation must also include rollback steps. In a perfect world, rollback steps would not be necessary, but the world of software development is far from perfect.
Here are some of the things to include in release documentation:
- List of machines included in the release.
- List of components, with versions, to be released on each machine.
- All steps necessary to do the release.
- For larger releases, an hour-by-hour plan with time estimates and checkpoints.
- All steps necessary to undo the release.
- List of tests to be performed to verify the release.
- List of people who need to sign off on the release.
- List of emergency contact names and phone numbers.
Automate the release to an appropriate level
Automation can help avoid human error during a release. I’m a big fan of automation. But there are some important points to remember about automation:
- Any code that automates a release is still code. This automation code must be tested thoroughly, including every use case: installation, upgrade, and rollback.
- Automation is not a substitute for good documentation or for understanding exactly what will happen during a release. Blind reliance on automation can make diagnosing problems more difficult during a release.
- There is always a trade-off between the benefits gained from automation and the time spent to automate. An example would be writing and testing sophisticated database rollback code when a simple database dump and restore would do as well.
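As an illustration of keeping rollback simple, here is a minimal sketch of one common technique (not one prescribed above): each version is installed in its own directory, and a `current` symbolic link selects the active one, so undoing a release means pointing the link back. The `releases/<version>` layout and the function names are assumptions.

```sh
#!/bin/sh
# Sketch: side-by-side releases with a 'current' symlink.
# Assumed layout: $APP_ROOT/releases/<version> and $APP_ROOT/current.

activate_release() {
    app_root="$1"; version="$2"
    target="$app_root/releases/$version"
    [ -d "$target" ] || { echo "no such release: $version" >&2; return 1; }
    # Remember what is running now so the release can be undone.
    if [ -L "$app_root/current" ]; then
        readlink "$app_root/current" > "$app_root/previous" || return 1
    fi
    # Replace the link (rm -f removes the link itself, not the old release).
    rm -f "$app_root/current"
    ln -s "$target" "$app_root/current"
}

rollback_release() {
    app_root="$1"
    [ -f "$app_root/previous" ] || { echo "nothing to roll back to" >&2; return 1; }
    prev=$(cat "$app_root/previous")
    rm -f "$app_root/current"
    ln -s "$prev" "$app_root/current" && rm -f "$app_root/previous"
}
```

`activate_release /opt/myapp 1.1` switches to a new version and `rollback_release /opt/myapp` returns to the one that was active before. The switch is not atomic (there is a brief moment with no link), which is usually acceptable when the application is stopped during the release.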
Test the release
You should have a staging environment where release steps can be tested before the final production release. Also, you should get someone else to test the release. Having another person follow your release steps is a great way to make sure that the release is documented properly.
Use BundleWorks
This tip is a shameless plug for our BundleWorks product. BundleWorks has features that help to make software releases easier to do and undo, including:
- Allowing multiple versions of an application to co-exist. This allows you to keep older versions around while you upgrade to newer versions, making it easier to roll back.
- Allowing multiple environments to co-exist on the same machine. This allows you to easily set up a staging environment for testing a release.
- Application management functions which allow you to see what versions are installed where.
- Standard hooks and a library of functions to help automate installing, upgrading, and rolling back applications.
If you want to give BundleWorks a try, you can download the latest version.
The last of The Five Principles of Software Deployment, Failures should be obvious, will be covered in the next blog entry. Stay tuned.
February 22nd, 2008 — Tips
One of the requirements on a current project is to handle an FTP feed of XML data. Yes, for a protocol invented in the 70s, FTP is still a popular way of transporting data!
In processing an FTP feed, there’s always the question of how to check for files (usually with cron) and when to know that the FTP transfer is complete (using FTP rename is one approach to this).
Today, I learned about a Linux feature that makes it easier to process FTP feeds. In later versions of the Linux kernel (2.6.13 or later), a file change notification system named inotify is available. People have written a number of utilities to take advantage of inotify. One of these is the inotify-tools package.
The inotify-tools package includes a library and two command line programs, inotifywait and inotifywatch.
The inotifywait utility looked like it would work for me, so I downloaded the inotify-tools source, compiled it, and installed it on my Linux system.
I then wrote the following shell script to monitor for new files within the feed directory, and execute a program to process the file.
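#!/bin/sh
FEED_DIR="/var/feeds/news/incoming"
# Wait for the next completed file, then process it and keep watching.
while file=$(inotifywait -q -e close_write --format "%f" "$FEED_DIR"); do
    # Run the processor as a child process so the loop continues afterwards.
    /path/to/myapp "$FEED_DIR/$file"
done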
This script quietly (-q) waits for the “close_write” event (-e) from inotifywait and outputs the name of the file (--format "%f"). In my testing, the “close_write” event is triggered when the FTP transfer is completed. For feeds that use FTP rename, the event would be ‘moved_to’ instead of ‘close_write’, because renaming a file into or within a watched directory is reported as ‘moved_to’ rather than ‘create’. See the man page for inotifywait for more information about these and other events.
This approach to processing incoming FTP files has the benefit of not requiring cron to periodically check for new files. It also solves the problem of knowing when the FTP transfer is complete.
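One caveat: in the loop above, inotifywait exits after reporting each event, so nothing is watching the directory while a file is being processed, and a file that completes in that window can be missed. A variant using monitor mode (`-m`), where a single inotifywait process keeps reporting events, avoids that gap. This is a sketch; `FEED_HANDLER` is a hypothetical variable standing in for /path/to/myapp.

```sh
#!/bin/sh
# Sketch: one long-running inotifywait (-m) feeds a read loop, so events
# that occur while a file is being processed wait in the pipe instead of
# being lost.

FEED_DIR="${FEED_DIR:-/var/feeds/news/incoming}"
FEED_HANDLER="${FEED_HANDLER:-/path/to/myapp}"   # hypothetical processor

watch_feed() {
    inotifywait -m -q -e close_write --format '%f' "$FEED_DIR" |
    while IFS= read -r file; do
        # Process each completed file; log failures and keep watching.
        "$FEED_HANDLER" "$FEED_DIR/$file" ||
            echo "failed to process $file" >&2
    done
}
```

Calling `watch_feed` at the end of the script starts the watcher.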
February 15th, 2008 — Deployment
This post discusses the third of The Five Principles of Software Deployment: Packages should be autonomous.
The meaning of autonomous is “existing or capable of existing independently.” In my experience, the more autonomous, or independent, a software package is, the easier it is to deploy and maintain.
What makes a software package autonomous? Here are some characteristics:
- You can run multiple instances of the software package independently.
- All runtime paths, like configuration and log files, are relative to the package or to a specified root.
- All logic for installing, uninstalling, starting, and stopping the software is contained within the package.
- The package includes as many dependencies as possible.
Let’s take a look at each of these in turn.
Running multiple instances
Running multiple instances of a software package on the same machine means two things. First, it means that you can install and run multiple versions of the package. Second, it means that you can run multiple instances of a particular version at the same time.
The ability to install and run multiple versions is important for testing, but also for easily rolling back to a previous version during a failed rollout. One example of a package that achieves this measure of independence is the Java Development Kit (or JDK). You can install multiple versions of the JDK side by side, and run each version separately (by adjusting your PATH).
To run multiple instances of the same version of an application at the same time, you need to make sure that the software does not rely on fixed ports, paths, or external resources. It is best to make these options configurable or relative to a specified location. In my previous post, I offered some tips for making sure that an application can run independently.
Relative runtime paths
Any files needed to run a package should be relative to the package or relative to a specified root. Files that do not change across multiple running instances can be kept relative to the package. For example, a properties file with localization information can be kept in a directory within the package. Files that vary on a “per instance” basis should be stored relative to a specified location.
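In a shell start script, this split might look like the following minimal sketch. It assumes the script lives in a `bin` directory inside the package and uses a hypothetical `APP_BASE` variable for the per-instance root, similar in spirit to the Tomcat variables described next.

```sh
#!/bin/sh
# Sketch: shared files resolve relative to the package; per-instance files
# resolve relative to APP_BASE (a hypothetical variable).

# Package root, assuming this script sits in <package>/bin.
PKG_HOME=$(cd "$(dirname "$0")/.." && pwd)
# Per-instance root; defaults to the package itself for a single instance.
APP_BASE="${APP_BASE:-$PKG_HOME}"

shared_file()   { echo "$PKG_HOME/share/$1"; }   # same for every instance
instance_file() { echo "$APP_BASE/$1"; }         # logs, tmp, pid files
```

A second instance then needs only a different base, for example `APP_BASE=/srv/myapp/instance2 bin/start.sh` (where start.sh is a hypothetical script built on these helpers).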
One software package that does a particularly good job of this is the Apache Tomcat servlet container. This software package relies on two environment variables, CATALINA_HOME and CATALINA_BASE, to find the directories and files that are shared by all instances (CATALINA_HOME) versus the directories and files (tmp, logs, etc.) that vary on a per-instance basis (CATALINA_BASE). In this way, multiple instances of the same version of Tomcat can easily be run on the same machine (provided that they are configured to use different ports).
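Setting up an additional Tomcat instance can be scripted. The sketch below creates a CATALINA_BASE directory with the subdirectories Tomcat expects (conf, logs, temp, webapps, work), copies the shared configuration, and shifts the ports by an offset. It assumes the stock server.xml ports (8005 shutdown, 8080 HTTP, 8009 AJP); `make_tomcat_base` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: create a new CATALINA_BASE from an existing CATALINA_HOME.
# Assumes the stock server.xml ports; adjust the sed patterns otherwise.

make_tomcat_base() {
    home="$1"; base="$2"; offset="$3"   # offset=100 gives 8105, 8180, 8109
    mkdir -p "$base/conf" "$base/logs" "$base/temp" \
             "$base/webapps" "$base/work" || return 1
    cp -R "$home/conf/." "$base/conf/" || return 1
    # Rewrite only the port attributes so two instances do not collide.
    sed -e "s/port=\"8005\"/port=\"$((8005 + offset))\"/" \
        -e "s/port=\"8080\"/port=\"$((8080 + offset))\"/" \
        -e "s/port=\"8009\"/port=\"$((8009 + offset))\"/" \
        "$home/conf/server.xml" > "$base/conf/server.xml"
}
```

For example, `make_tomcat_base /opt/tomcat /srv/tomcat/instance2 100` followed by `CATALINA_HOME=/opt/tomcat CATALINA_BASE=/srv/tomcat/instance2 /opt/tomcat/bin/startup.sh` would start a second instance listening on port 8180.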
Logic within package
In many cases, a software package requires scripts to perform operations like starting, stopping, or checking status. These scripts should be included within the package itself, otherwise they end up being created as one-offs when the package is deployed to a particular machine.
In the past, I have used various conventions for storing these scripts. One option is to create a ‘bin’ directory inside the package to contain the scripts. This is the convention used by the Apache Tomcat servlet container distribution.
Currently, I use our own BundleWorks product for packaging and deploying my software. BundleWorks stores all package scripts inside a ‘bundle/actions’ directory. Regardless of what convention is used, the important thing is to keep all package ‘logic’ bundled with the software itself.
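Whatever the convention, the scripts themselves can stay small. Here is a minimal sketch of a start/stop/status script shipped inside a package. `APP_CMD`, the `run` and `logs` directories, and the PID-file approach are assumptions for illustration, not a BundleWorks or Tomcat convention.

```sh
#!/bin/sh
# Sketch: start/stop/status logic kept inside the package (e.g. bin/ctl.sh).
# APP_CMD and the run/ and logs/ locations are hypothetical.

APP_BASE="${APP_BASE:-$(pwd)}"
APP_CMD="${APP_CMD:-./bin/myapp}"

pid_file() { echo "$APP_BASE/run/app.pid"; }

app_status() {
    # Running if the PID file exists and that process is still alive.
    [ -f "$(pid_file)" ] && kill -0 "$(cat "$(pid_file)")" 2>/dev/null
}

app_start() {
    if app_status; then echo "already running" >&2; return 1; fi
    mkdir -p "$APP_BASE/run" "$APP_BASE/logs" || return 1
    # APP_CMD is left unquoted on purpose so it can carry arguments.
    $APP_CMD >> "$APP_BASE/logs/app.log" 2>&1 &
    echo $! > "$(pid_file)"
}

app_stop() {
    app_status || { echo "not running" >&2; return 1; }
    kill "$(cat "$(pid_file)")" && rm -f "$(pid_file)"
}

case "${1:-}" in
    start)  app_start ;;
    stop)   app_stop ;;
    status) if app_status; then echo running; else echo stopped; fi ;;
esac
```

Because every path is derived from `APP_BASE`, the same script serves every instance of the package.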
Including dependencies
To make a software package as autonomous as possible, it is best to include as many dependencies in the package as possible. Some examples of this are:
- Including third-party jar files inside a web application’s war file.
- Including third-party or custom libraries (.dll or .so files) in a package ‘lib’ directory.
- Linking binaries statically to avoid a dependency on an external library.
- Including an entire runtime like the JRE or Ruby interpreter within the package itself.
These options can cost disk space and memory, but they help to make packages easier to deploy and maintain. Packaging software with its dependencies provides a measure of control over the components that an application uses.
Not all options are appropriate for all situations. However, as an example, in a previous consulting role, we chose to bundle the Java JRE with our software since we found that the installed version of Java varied drastically across the hundreds of boxes where our software needed to run. In some cases, the version of Java was so old, it wasn’t even sufficient to run the application!
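A launcher script can then prefer the bundled runtime and libraries over whatever happens to be installed on the box. This sketch assumes a package layout with `jre/` and `lib/` directories; the function names are hypothetical.

```sh
#!/bin/sh
# Sketch: prefer the JRE and native libraries shipped inside the package.
# The jre/ and lib/ directory names are assumptions about the layout.

find_java() {
    pkg_home="$1"
    if [ -x "$pkg_home/jre/bin/java" ]; then
        echo "$pkg_home/jre/bin/java"            # bundled runtime wins
    elif [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        echo "$JAVA_HOME/bin/java"
    else
        echo java                                # last resort: system PATH
    fi
}

# Make bundled .so files visible to the JVM and any native code.
# On AIX the equivalent variable is LIBPATH.
add_package_libs() {
    pkg_home="$1"
    LD_LIBRARY_PATH="$pkg_home/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```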
The next post in the series on The Five Principles of Software Deployment will take a look at principle #4: Releases should be easy to do and undo.
January 27th, 2008 — Deployment, Environments
In my previous post, I discussed the second of The Five Principles of Software Deployment, namely that Environments should be consistent. In that post, I looked at how developers could achieve consistency in their build environments. In this post, I’ll take a look at achieving consistency in other environments.
As I mentioned in the previous post, an environment includes all the components needed to build and run your application.
As an application is built, tested, and deployed, it generally passes through several environments. These environments can include:
- Developer Test environment - This is the place where a developer builds and tests an application. Typically this is an environment running on the developer’s workstation and is not shared by other team members.
- System Integration Test (SIT) environment - This is a more formal test environment for developers. This environment is used to test an application before it is released for others to use. On a large project, this environment is often integrated with environments from other projects.
- User Acceptance Test (UAT) environment - This environment allows end users to test an application in an environment that closely resembles the production environment. Often snapshots of production data are used to populate this environment to simulate actual usage.
- Production environment - This is the environment where the application is ultimately used. In some cases, this environment can span multiple machines for load balancing and redundancy.
- Disaster recovery (DR) environment - This is an environment that is ready to use if the production environment becomes unavailable.
Larger projects may contain additional environments, while smaller projects may only contain some of these. At a minimum, there should be one environment between the developer’s build environment and the ultimate production environment. Unfortunately, as I wrote about earlier, this is often not the case.
The goal is to have consistency between environments. The closer an environment is to the final production environment, the more important this consistency is. Achieving this consistency is not easy, as it involves resources, both human and financial, which are often in short supply on a project.
Here are some things that can help in achieving consistency throughout your deployment environments.
- Document your application dependencies - This was already mentioned in the previous post as a tip for achieving consistency in build environments, but it also applies to deployment environments. Armed with a list of the various third-party components (OS, databases, services, libraries) that an application needs, you can more easily set up a new environment or new machine to run the application.
- Package your applications for deployment - Creating complete packages for your application is an important method of achieving consistency across your environments. These packages can contain any libraries and frameworks needed to run the application. By including them along with the application, you are ensuring that the same version is used wherever the application runs. These packages can also include various scripts to install, upgrade, start, and stop your application. These scripts allow for greater control over the deploying and running of an application. It can also be very useful to package third-party applications to achieve the same level of control.
- Build your applications to run independently - It is often desirable or necessary to run multiple environments on a single machine. You need to consider this when building your applications. Some things to check for:
- Does your application open any network sockets? If so, you need to make port numbers configurable to allow multiple instances of your application to run on the same machine.
- Does your application use fixed paths? If these paths are absolute, every instance will try to use the same files (unless you use chroot under UNIX). It is better to make your paths relative to the application or based on an environment variable.
- Does your application rely on a well-known directory like ‘/tmp’ on UNIX boxes? This can often cause subtle problems. In Java applications, you might need to change the Java property, java.io.tmpdir, to avoid problems.
- Consider virtualization - The use of virtualization tools like VMware, Microsoft Virtual Server, and others can help in a number of ways. First, they allow you to run multiple independent environments on the same server. Second, they allow you to create a standard virtual machine and run this standard configuration on multiple servers. There are other benefits to virtualization which I’ll cover in a future post, when I discuss my own uses of virtualization.
- Track environment changes - Once you have an environment set up, it is very important to track any changes to that environment, and propagate these changes to other environments. For example, if you move to a newer version of Java on your production servers, you need to upgrade Java on any servers used for disaster recovery (DR). A good change management process is important.
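The port and temp-directory checks from the list above can be handled at launch time. This is a minimal sketch: `java.io.tmpdir` is a standard JVM system property, while `app.port`, the jar path, and `run_instance` are hypothetical names standing in for your application's own settings.

```sh
#!/bin/sh
# Sketch: give each instance its own port and temp directory at launch.
# app.port, the jar argument, and the directory names are hypothetical.

run_instance() {
    base="$1"; port="$2"; jar="$3"
    mkdir -p "$base/temp" "$base/logs" || return 1
    # java.io.tmpdir keeps this instance out of the shared /tmp.
    java -Djava.io.tmpdir="$base/temp" -Dapp.port="$port" \
         -jar "$jar" >> "$base/logs/app.log" 2>&1
}
```

With this, `run_instance /srv/myapp/sit 8181 /opt/myapp/myapp.jar` and `run_instance /srv/myapp/uat 8182 /opt/myapp/myapp.jar` could run side by side from one installation.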
In my next post, I’ll continue with the third of the Five Principles of Software Deployment, which is Packages should be autonomous. In the meantime, if you want to share your experiences with achieving consistent environments, feel free to post a comment here.
January 12th, 2008 — Build, Deployment, Environments
This post continues the series The Five Principles of Software Deployment. It covers Principle #2, Environments should be consistent.
An environment includes all the components needed to build and run your application. As an example, take a typical database-driven Java web application, which relies on the following components on the server side:
- Operating system (e.g. Solaris)
- System libraries (e.g. OpenSSL)
- Service providers (e.g. Apache and Oracle)
- Runtime environments (e.g. JDK or JRE)
- Application containers (e.g. Tomcat)
- Application frameworks and libraries (e.g. Struts, Spring)
One of the keys to successful deployment is keeping components like these consistent throughout the environments used to build and run your application.
This post discusses consistency in build environments. The next post will look at maintaining consistency across deployment environments.
Having a consistent build environment means two things. First, a build environment should use the same components used to run the application in production. Second, a build environment needs to be consistent between developers.
In an ideal world, each developer would use the same versions of every component involved in building and running an application. Looking at the list above, this means running the same versions of everything from the underlying operating system to the application libraries.
This is not always possible or practical. For example, a developer may use a Windows-based workstation for development, while deploying to a Linux or UNIX-based environment. In this case, consistency can still be maintained further up the technology stack, by running the same version of the JDK, application server, and application libraries.
The following are some practical tips for achieving consistent build environments:
- Document your application dependencies - The first step to consistency is knowing exactly what is needed to build and run your application. This may seem obvious, but good project documentation is often overlooked. This documentation is especially important for new members that join the team.
- Version control as many components as possible - This tip is related to the advice presented in previous blog entries about achieving repeatable builds. At a minimum, check in application frameworks and libraries like Struts and Spring and update your build scripts when the versions of these components change. This will ensure that all developers are building with the same libraries.
- Provide a repository for component downloads - For those components that are not checked into version control, it is useful to have a shared drive or web server where team members can retrieve standard component versions. For example, this repository could contain the standard version of Eclipse used by the team.
- Develop a strategy for running multiple versions of tools - It is sometimes necessary to switch versions of tools during a project. One example is moving to a newer version of Java - say from 1.5 to 1.6. To make this change cleanly, it is best to have both versions of the component available to allow new development to occur with the new version, while keeping the old version available for running and debugging an older version of the application. Some components, like the Java JDK, allow side-by-side installation, while others are not as easy to set up this way. Our BundleWorks product is designed to allow multiple versions of components to run side-by-side.
- Build open-source components from source - The thought of building a component from source may make some people cringe, and although it can sometimes be painful, there are benefits to doing this. The first is that you know exactly what version of a component you are running across all environments, since you are not relying on the version provided by the OS vendor. An example of this is the Apache web server. Different operating system versions include different versions of Apache. Building from source gives you a known, consistent version. Second, building from source allows you to install multiple versions of a component side-by-side. This is done by passing a --prefix argument to the configure script when you build the component (e.g. --prefix=/opt/apache/2.2.6). If you decide to do this, check the original source and the build script (with configure options) into your source code repository.
- Provide common scripts for setting up a build environment - This tip applies more to UNIX-based operating systems than to Windows. When a user logs into a UNIX-based machine, startup scripts are run based on the user’s choice of shell. In a team environment, these startup scripts should rely on a common script to set up the build environment. This common script should set the standard versions of any tools needed (compilers, runtime environments, etc.). These shared scripts should also be checked into version control, as they can change over time.
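A common setup script of this kind might look like the sketch below. The `/opt/<tool>/<version>` layout follows the side-by-side, versioned install directories described above, but the tool list and version numbers are only examples. A developer who needs a different version can set `JAVA_VERSION` before sourcing the file.

```sh
# team-env.sh: sourced by each developer's shell startup file,
# for example with '. /opt/team/team-env.sh' in ~/.profile.
# Tool names, versions, and the /opt layout are examples.

TOOLS_ROOT="${TOOLS_ROOT:-/opt}"
JAVA_VERSION="${JAVA_VERSION:-1.6.0}"
ANT_VERSION="${ANT_VERSION:-1.7.0}"

# Each tool version lives in its own directory, so versions can coexist.
JAVA_HOME="$TOOLS_ROOT/java/$JAVA_VERSION"
ANT_HOME="$TOOLS_ROOT/ant/$ANT_VERSION"

# Put the team's standard versions ahead of anything else on the PATH.
PATH="$JAVA_HOME/bin:$ANT_HOME/bin:$PATH"
export JAVA_HOME ANT_HOME PATH
```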
I’d like to make one final point about consistent build environments. This point involves Integrated Development Environments, or IDEs. I’ve used a number of IDEs over the course of my career and have found them to be helpful. Developers have differing opinions about which IDE, if any, is best. Deciding on a standard IDE for a team is not as important as standardizing versions of other build components. As long as the underlying components are consistent, a developer using vi and Ant can achieve the same end result as a developer using the latest and greatest version of IDEA.
January 4th, 2008 — Build
There are days when working with technology feels like getting your fingers caught in a closing car door. Yesterday was one of those days.
The goal seemed simple: re-compile PHP on a 64-bit CentOS 5.1 Linux host to add support for IMAP. This blog post shows exactly how simple it was[n’t].
As a rule, I like to compile my development tools from source. This allows me to standardize on a particular version of the tool and not be tied to the version available in the Linux distribution. Additionally, it lets me run multiple versions of a tool side-by-side (with the help of our BundleWorks product). For system-level libraries, I prefer to install them using the packaging system provided with the OS (apt-get, yum/rpm, etc.).
Anyway, I had already built Apache, MySQL, and PHP from source, when I discovered that I needed IMAP support in PHP. According to the PHP website, I needed to install the c-client IMAP library first.
Since I was running CentOS, I used
yum install libc-client-devel libc-client
to install the c-client library and the necessary headers. I then added --with-imap to the list of configure options.
This resulted in the following configure error:
This c-client library is built with Kerberos support.
Add --with-kerberos to your configure line. Check config.log for details
Fair enough. I added --with-kerberos to the configure command line.
This resulted in a new error:
error: Kerberos libraries not found.
Check the path given to --with-kerberos (if no path is given, searches in /usr/kerberos, /usr/local and /usr )
I checked my system, and found that the Kerberos libraries were installed in /usr/lib64. So I passed --with-kerberos=/usr/lib64 to the configure script, but the script still reported that the Kerberos libraries could not be found.
So I “googled the error message”, like any good programmer would do. After trying a couple of suggestions without any luck, I decided to do something which has helped me diagnose compile errors in the past:
sh -x ./configure ...configure options...
This provides debugging output for the configure script by using the -x switch for the sh command (the configure script is a shell script, after all). It also provides more useful information than you would normally find in the config.log file.
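Since the trace can run to thousands of lines, it helps to capture it in a file and search it rather than scroll the terminal. A minimal sketch, where `trace_configure` is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: save the 'sh -x' trace of a configure run so it can be searched.
# The trace goes to stderr, so stdout and stderr are kept together.

trace_configure() {
    log="$1"; shift              # remaining arguments: configure options
    sh -x ./configure "$@" > "$log" 2>&1
}
```

For example, `trace_configure configure-trace.log --with-imap --with-kerberos=/usr/lib64` followed by `grep -n kerberos configure-trace.log` shows every traced line where the Kerberos path is computed.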
From the pages of output that filled my terminal, I found that the configure script was appending “lib” to the --with-kerberos path that I provided, so it was looking inside a non-existent “/usr/lib64/lib” directory. However, I found that I could change “lib” to “lib64” by passing --with-libdir=lib64 to the configure script. Victory seemed imminent.
It was not imminent enough. The configure script now complained that it couldn’t find the MySQL client libraries in the location where I had previously installed them. It was using the libdir value of “lib64” to look inside the MySQL installation, but the libraries were actually in a directory named “lib”. Exasperated, I ended up creating a softlink named “lib64” inside the MySQL installation that points to the “lib” directory.
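The workaround amounts to a single `ln -s`. Wrapped in a small function that refuses to run without an existing `lib` directory and does nothing if `lib64` already exists, it might look like this (`link_lib64` is a hypothetical name; its argument is the MySQL installation prefix).

```sh
#!/bin/sh
# Sketch: make <prefix>/lib64 resolve to the existing <prefix>/lib.

link_lib64() {
    prefix="$1"
    [ -d "$prefix/lib" ] || { echo "no lib directory in $prefix" >&2; return 1; }
    # Relative link target, so the link survives moving the prefix.
    # Only create it if nothing named lib64 exists yet.
    [ -e "$prefix/lib64" ] || ln -s lib "$prefix/lib64"
}
```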
The car door finally closed without catching my fingers. The configure and compile then ran successfully.
December 22nd, 2007 — Build, BundleWorks, Deployment
The old saying goes “Necessity is the mother of invention.” This saying was proved true yesterday.
I was contacted by a developer eager to use our BundleWorks product to help manage their build and deployment process. There was one problem - they built their application on Windows and deployed to AIX. The deployment part wasn’t a problem since BundleWorks supports a variety of UNIX-based platforms, including AIX. But the Windows version of BundleWorks is not ready yet.
A thought came to me. A BundleWorks bundle is simply a set of directories and files in well-defined locations, very similar to a Java jar or war archive. With the proper changes to their current ant build script, this developer could produce a BundleWorks bundle without needing BundleWorks on their Windows machine.
The developer needed to make the following changes to accomplish this:
- Create BundleWorks action scripts to install and run their java application. These scripts ended up being a handful of lines long since they were able to rely on BundleWorks built-in functions.
- Change their ant build.xml file to arrange their build output to match the layout of a BundleWorks bundle.
- Use the ant echo task to write out a BundleWorks info file.
- Use the ant zip task to create an archive for deployment.
Note that these steps are normally performed by the BundleWorks bundle command, so when BundleWorks is available for Windows, the developer’s ant script becomes simpler.
The developer made these changes, built a BundleWorks bundle for their application, and successfully deployed it to their AIX staging environment. Next week, they will deploy the application into their production environment.
The developer said they were eager to use BundleWorks to replace an error-prone upgrade process. Previously, they would deploy upgrades by manually copying new jar files to replace old ones, and they sometimes ran into trouble doing this. Using BundleWorks, the upgrade process became clean and safe, with an option to roll back if necessary.
The net result: one happy developer.
December 19th, 2007 — Build, Version Control
This is the last in a series of posts discussing repeatable builds, which is one of The Five Principles of Software Deployment. This post picks up where the previous one left off, presenting more advice on achieving repeatable builds.
4. Automate your build
According to the Joel Test, from respected author Joel Spolsky, one of the 12 steps to better code is being able to answer “yes” to question #2 “Can you make a build in one step?”.
I agree with this wholeheartedly, and I work toward achieving this goal on all my projects - large or small.
Depending on the languages and technologies used to build an application, there are a number of tools available to automate builds.
For most languages, the venerable make tool can be used to automate a build. General purpose build tools like scons are also available, which provide improvements over make.
There are also tools that are more specific to a particular language or technology, like Ant, Maven, and Buildr (for Java), NAnt (for .NET), and rake (for Ruby).
These tools can be used to provide automated, “one step” builds.
An automated build system should provide the following:
- A command line method for building. Building from an IDE is not recommended for achieving repeatable builds.
- A way to build dependencies as part of the build. Some tools, like Maven, handle dependencies themselves, while other tools have inspired third-party approaches to dependency management (Ivy for Ant).
- A consistent method for building applications on a project. In one project that I worked on, we ended up using Ant to call “make” for the C/C++ portions of the project, so that every project could be built with a single ‘ant’ command.
- A way to use Continuous Integration servers. In a future post, I’ll discuss Continuous Integration, but having an automated build is a necessary first step to using a CI server.
For repeatability, having an automated build system is key and worth the effort it takes to set up.
5. Create a separate build user
This is one of the most important practices for achieving a repeatable build.
As developers work on a project, their local environments invariably end up collecting files and settings that allow them, but no one else, to build an application properly. Sometimes these local differences are subtle, and an application will build fine for other team members, but will not work in exactly the same way. These local differences make it impossible to achieve a truly repeatable build.
Creating a separate build user and applying strict control over this user provides a neutral third party to settle these “build differences”. The approach is to create a separate user both on the host operating system and in the version control system. On UNIX, this can be accomplished with a shared user that team members can ‘su’ into.
Building a release is then done as follows:
1. Login or switch to the build user.
2. Delete all files from the previous build.
3. Checkout all files from version control that are needed for the build.
4. Build the release.
By removing all previous build files and checking out a fresh set of files, you ensure that all files have been checked in and that no local differences will impact the build. After a successful build, this “build user” would then label or tag the files in version control.
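For a Subversion repository and an Ant build, the steps above might be scripted as in this sketch, run as the build user (for example via `su - build -c ...`, where `build` is a hypothetical account name). The work directory and the repository and tag URLs are placeholders.

```sh
#!/bin/sh
# Sketch of a clean release build: wipe, fresh checkout, build, tag.
# Assumes Subversion and Ant; all paths and URLs are placeholders.

release_build() {
    work_dir="$1"; repo_url="$2"; tag_url="$3"
    # Delete everything left over from the previous build.
    rm -rf "$work_dir" || return 1
    # Check out a fresh copy, so uncommitted files cannot affect the build.
    svn checkout -q "$repo_url" "$work_dir" || return 1
    # Build the release in one step.
    (cd "$work_dir" && ant) || return 1
    # Tag from the working copy so the tag records the revision that was
    # actually checked out, even if someone commits during the build.
    svn copy -m "Release build" "$work_dir" "$tag_url"
}
```

For example: `release_build /home/build/work svn://server/myapp/trunk svn://server/myapp/tags/1.0`.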
On one of my previous consulting jobs, this approach was taken a step further. In this organization, a separate team was created to perform builds that were released into production. As developers, we would make sure that everything was checked into version control and would then notify this group that we were ready for a release build. At times, this process seemed like overkill, but in the end, I think the advantages outweighed the disadvantages.
6. Document your build process
No matter how simple or complex your build process is, the steps need to be documented. It is a fact of life that people come and go over the course of a project. Without good project documentation, it is difficult for new team members to learn the correct way to build and deploy applications. On a project with a complex build process, this can be a serious problem.
Over the years, I’ve encountered a variety of approaches to documenting the build process. These include:
- Storytelling (project elders passing on the “lore of the build”)
- Document management systems
- Team or organization-wide wikis
- Project documents in version control
Each of these has particular advantages and disadvantages (although I would be hard pressed to find too many advantages to the Storytelling approach).
I tend to favor an approach that stores documentation inside the version control system. This documentation can then be labeled or tagged when a release is done. If the build process and its documentation change later, the state of the documentation at the time of the release is still preserved.
I also tend to favor plain text documentation over binary formats like Word. Plain text files can be opened by any text editor, from vi to Eclipse. Text files are easier to diff and easier to search. Even for larger documents, I prefer text-based formats like Markdown, XHTML/CSS, and even DocBook.
Regardless of the tools used to document the build process, having good documentation for it is essential to achieving repeatable builds over time.
Next up: Principle #2 of The Five Principles of Software Deployment, Environments should be consistent.
December 13th, 2007 — Build, Version Control
In my previous blog entry, I presented the first of The Five Principles of Software Deployment, namely that Builds should be repeatable.
In that post, I provided a list of practices for achieving repeatable builds. This post takes a closer look at the first three items in that list.
1. Use version control
To achieve a repeatable build, you essentially need to turn back the clock to the time when an application was previously built. It is impossible to do this without some kind of system that tracks changes to your files over time. Version control (or source code management) provides such a system.
If Maslow had created a hierarchy of needs for software developers, version control would be at the bottom of the pyramid, right there with food, water, air, and sleep! It should be considered essential to any software development effort.
I’ve used version control throughout my career, whether working solo or on a large team. In some places, I’ve used commercial version control systems like ClearCase and PVCS. In other places, I’ve used open source systems like Subversion or CVS. I haven’t yet used version control systems like Git, which are designed specifically for distributed development.
Given the range of options and the fact that some of them are free (not counting the time needed to set them up), there is no excuse for a developer not to use version control.
I won’t get into details here on exactly how version control works. This visual guide does a great job of that. However, I will discuss how to use version control to achieve repeatable builds.
2. Version your tools and dependencies
Nearly every application has dependencies on third-party libraries. In some cases, an application may be dependent on a library produced by another group. These third-party and in-house libraries will invariably change over the course of a project.
In order to repeat a build, you must capture the versions of the libraries used at the time of a build. The best way to do this is to add these libraries to version control, along with your source. As new versions of these libraries become available, they are added to version control alongside the previous versions. As a simple example, suppose your application depends on the Spring LDAP library. To upgrade this library from version 1.2 to 1.2.1, you could set up the following structure in your version control repository:
/ext/lib/spring-ldap/1.2/spring-ldap-1.2.jar
/ext/lib/spring-ldap/1.2.1/spring-ldap-1.2.1.jar
Within your build property files, you would change the reference to the spring-ldap library from 1.2 to 1.2.1.
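To make the link between the property file and the directory layout concrete, here is a minimal Python sketch. It assumes a build property named `spring-ldap.version` (a hypothetical key) and reads only simple `key=value` lines; it is not a full Java `.properties` parser. The point is that the jar path is derived entirely from a versioned value, so changing that one line is the whole upgrade.

```python
from pathlib import PurePosixPath


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def library_jar(props, name, root="/ext/lib"):
    """Map a pinned version in the build properties to its versioned jar path."""
    version = props[f"{name}.version"]
    return PurePosixPath(root) / name / version / f"{name}-{version}.jar"


build_properties = """
# Upgrading is a one-line change, and this file is itself under version control.
spring-ldap.version=1.2.1
"""

print(library_jar(parse_properties(build_properties), "spring-ldap"))
# prints /ext/lib/spring-ldap/1.2.1/spring-ldap-1.2.1.jar
```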
If both the libraries and references to those libraries are versioned, you can easily go back and repeat a build with the correct versions of the libraries.
There are two important points that I’d like to make here.
First, this technique of versioning dependencies applies to libraries produced by some third party, either external or internal to your team. This does not include libraries that are generated from your own source code. If a library can be built from source code, it should never be checked into version control. Let me reiterate: never check in something that can be built from source. It is very easy for a checked-in library (or other build artifact) to get out of sync with the source. I’ve worked on a project where this was done, and it caused major headaches at build time.
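One way to enforce the “never check in something that can be built from source” rule is a small check run before a release build. The following Python sketch assumes a hypothetical policy: `.class`, `.o`, and `.pyc` files are always build output, and jars are allowed only under the versioned `ext/lib` tree shown earlier. How you obtain the list of tracked paths (for example, from the output of `svn list -R`) is left out.

```python
from pathlib import PurePosixPath

# Hypothetical policy: files with these extensions are always build output.
GENERATED_SUFFIXES = {".class", ".o", ".pyc"}
# Third-party libraries are allowed only under the versioned ext/lib tree.
THIRD_PARTY_ROOT = PurePosixPath("ext/lib")


def suspicious_artifacts(tracked_paths):
    """Return tracked files that look like artifacts built from our own source."""
    flagged = []
    for raw in tracked_paths:
        path = PurePosixPath(raw.lstrip("/"))
        if path.suffix in GENERATED_SUFFIXES:
            flagged.append(raw)
        elif path.suffix == ".jar" and THIRD_PARTY_ROOT not in path.parents:
            flagged.append(raw)
    return flagged


tracked = [
    "/ext/lib/spring-ldap/1.2.1/spring-ldap-1.2.1.jar",  # versioned third-party jar: OK
    "/src/com/example/App.java",                        # source: OK
    "/build/myapp.jar",                                 # our own jar: flagged
    "/src/com/example/App.class",                       # compiled class: flagged
]
print(suspicious_artifacts(tracked))
# prints ['/build/myapp.jar', '/src/com/example/App.class']
```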
Second, this versioning of libraries is an area where I have a fundamental disagreement with the philosophy of the Maven build system. Maven, by default, is designed to download newer versions of dependencies at build time into a local Maven repository. Also by default, that local repository of libraries is not versioned. In fact, it’s usually stored in a hidden directory (.m2) within the user’s home directory. To me, library dependencies need to be managed more precisely in order to achieve reliable and repeatable builds.
To close the topic of putting dependencies into version control, I’ll mention the versioning of tools. It can be argued that to achieve a truly repeatable build, the tools themselves should be versioned, including compilers and interpreters. In fact, this was exactly what we did on one project, where we checked in the version of Java that we used to build a release. In this case, there were additional reasons for doing this (we distributed the JRE with our release, and we changed some settings in Java’s network properties). But in the end, we had everything we needed to build right there in version control.
3. Label your releases
One feature common to all version control systems is the ability to label or tag a release. Effectively, this provides a virtual “snapshot” of the state of all files at the time a release is built. In some systems, like ClearCase, a label is attached to files, while in others, like Subversion, a label (or tag) is presented as a separate ‘directory’ in the repository. Either way, you can turn the clock back to the moment of a release and rebuild things as they were at that time.
Another “labeling” technique that I’ve used with some success is to literally create a new set of directories for a major release by exporting the latest version of the source and reimporting it under a separate directory (for example, /prod/2.0). This is more extreme than just applying a 2.0 label or tag to the files, but it has the advantage of easily allowing fixes to be applied to the 2.0 source without the need to create a branch.
Proper labeling of releases requires discipline. It is very easy to forget to relabel when last-minute changes are made. To combat this, it is best to “freeze” all changes at the time of the build and designate a single person to handle the build process, including the final step of labeling. In the next post, I’ll discuss the idea of creating a separate build “user” for performing release builds.
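In Subversion, both labeling approaches come down to a few commands. The following Python sketch only builds the argument lists for `svn copy`, `svn export`, and `svn import`; it assumes the conventional `trunk`/`tags` layout plus a hypothetical `prod` directory, and the repository URL and function names are placeholders. Each list can be passed to `subprocess.run(cmd, check=True)` by the build user.

```python
def tag_command(repo_url, release):
    """Label a release the usual Subversion way: a cheap copy of trunk under tags/."""
    repo_url = repo_url.rstrip("/")
    return ["svn", "copy", f"{repo_url}/trunk", f"{repo_url}/tags/{release}",
            "-m", f"Tag release {release}"]


def snapshot_commands(repo_url, release, export_dir):
    """Export trunk, then re-import it as an independent prod/<release> tree."""
    repo_url = repo_url.rstrip("/")
    return [
        # Export produces a clean copy of the files with no working-copy metadata.
        ["svn", "export", f"{repo_url}/trunk", export_dir],
        # Import adds that copy as brand-new files with no history link to trunk.
        ["svn", "import", export_dir, f"{repo_url}/prod/{release}",
         "-m", f"Import release {release} source"],
    ]
```

Building argument lists rather than shell strings avoids quoting problems with commit messages. Note the trade-off of the export/import route: the reimported tree starts with no history of its own, which is exactly what makes it independent of trunk.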
December 11th, 2007 — Build
The first of The Five Principles of Software Deployment is that Builds should be repeatable.
This means that for any application that you build and release, you should be able to rebuild that application exactly as it was built at the time of release at any time in the future.
If you keep copies of all applications that you release, why would you need to repeat the build process for one of them? The answer lies in the fact that all software has bugs. And if someone discovers a nasty bug in a version of your application running in production, you may be called upon not only to reproduce that bug, but more importantly to release a version of the application whose only change is the bug fix. The ability to go back and repeat the conditions of the original build is essential in this case.
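One way to confirm that a rebuild really reproduces a released build is to compare the two outputs file by file. The following is a minimal Python sketch, assuming the original release and the rebuild are both available as directories; `tree_digest` and `same_build` are hypothetical helper names. It computes one SHA-256 digest over every file’s relative path and contents.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """One SHA-256 digest over every file's relative path and contents."""
    root = Path(root)
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    digest = hashlib.sha256()
    for path in files:
        name = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so different trees cannot yield the same
        # byte stream; a renamed file therefore changes the digest too.
        for field in (name, data):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()


def same_build(original_dir, rebuilt_dir):
    """True if a rebuild produced byte-for-byte the same files as the original."""
    return tree_digest(original_dir) == tree_digest(rebuilt_dir)
```

Be aware that many archive formats, including jar files, record timestamps, so two builds from identical sources may still differ byte for byte. In that case, comparing the unpacked contents is the more useful check.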
This is not a hypothetical scenario in my experience. It has happened on a number of systems that I’ve worked on over the years. A good manager will want to minimize risk whenever possible, and applying a spot fix to solve a production problem is one way to do this.
So how do you achieve repeatable builds? The following list provides some guidance on how to add repeatability to your build process.
1. Use version control.
2. Version your tools and dependencies.
3. Label your releases.
4. Automate your build.
5. Create a separate build user.
6. Document your build process.
The items in this list will be covered in greater detail in upcoming posts. In the meantime, let me know if you think anything should be added to this list.