Entries from December 2007

Necessity and Invention

The old saying goes “Necessity is the mother of invention.” This saying was proved true yesterday.

I was contacted by a developer eager to use our BundleWorks product to help manage their build and deployment process. There was one problem - they built their application on Windows and deployed to AIX. The deployment part wasn’t a problem since BundleWorks supports a variety of UNIX-based platforms, including AIX. But the Windows version of BundleWorks is not ready yet.

A thought came to me. A BundleWorks bundle is simply a set of directories and files in well-defined locations, very similar to a Java jar or war archive. With the proper changes to their current ant build script, this developer could produce a BundleWorks bundle without needing BundleWorks on their Windows machine.

The developer needed to make the following changes to accomplish this:

  1. Create BundleWorks action scripts to install and run their Java application. These scripts ended up being only a handful of lines long, since they could rely on BundleWorks’ built-in functions.
  2. Change their ant build.xml file to arrange their build output to match the layout of a BundleWorks bundle.
  3. Use the ant echo task to write out a BundleWorks info file.
  4. Use the ant zip task to create an archive for deployment.

Note that these steps are normally performed by the BundleWorks bundle command, so when BundleWorks is available for Windows, the developer’s ant script becomes simpler.
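The three packaging steps (arrange the output, write an info file, zip the result) are not specific to Ant. Here is a minimal sketch of the same idea in Python. Everything about the layout is a stand-in: the `lib` directory, the `bundle.info` file name, and its `name`/`version` keys are hypothetical, since the real BundleWorks bundle layout is not spelled out in this post.

```python
import shutil
import zipfile
from pathlib import Path


def make_bundle(build_output: Path, staging: Path, name: str, version: str) -> Path:
    """Arrange build output, write an info file, and zip the staged tree."""
    # Hypothetical layout: jars go under lib/ (not the real BundleWorks layout).
    lib_dir = staging / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    for jar in sorted(build_output.glob("*.jar")):
        shutil.copy2(jar, lib_dir / jar.name)

    # Hypothetical info file; the Ant build used the echo task for this step.
    (staging / "bundle.info").write_text(f"name={name}\nversion={version}\n")

    # Archive the staged tree; the Ant build used the zip task for this step.
    archive = staging.parent / f"{name}-{version}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(staging.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(staging).as_posix())
    return archive
```

The archive is written next to the staging directory rather than inside it, so it never ends up zipping itself.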

The developer made these changes, built a BundleWorks bundle for their application, and successfully deployed it to their AIX staging environment. Next week, they will deploy the application into their production environment.

The developer said they were eager to use BundleWorks to replace an error-prone upgrade process. Previously, they would deploy upgrades by manually copying new jar files to replace old ones, and they sometimes ran into trouble doing this. Using BundleWorks, the upgrade process became clean and safe, with an option to roll back if necessary.

The net result: one happy developer.

Principle #1 - Builds should be repeatable (Part 3)

This is the last in a series of posts discussing repeatable builds, which is one of The Five Principles of Software Deployment. This post picks up where the previous one left off, presenting more advice on achieving repeatable builds.

4. Automate your build

According to the Joel Test, from respected author Joel Spolsky, one of the 12 steps to better code is being able to answer “yes” to question #2 “Can you make a build in one step?”.

I agree with this wholeheartedly, and I work toward achieving this goal on all my projects - large or small.

Depending on the languages and technologies used to build an application, there are a number of tools available to automate builds.

For most languages, the venerable make tool can be used to automate a build. General purpose build tools like scons are also available, which provide improvements over make.

There are also tools that are more specific to a particular language or technology, like Ant, Maven, and Buildr (for Java), NAnt (for .NET), and rake (for Ruby).

These tools can be used to provide automated, “one step” builds.

An automated build system should provide the following:

  • A command line method for building. Building from an IDE is not recommended for achieving repeatable builds.
  • A way to build dependencies as part of the build. Some tools, like Maven, handle dependencies themselves, while other tools have inspired third-party approaches to dependency management (for example, Ivy for Ant).
  • A consistent method for building applications on a project. In one project that I worked on, we ended up using Ant to call “make” for the C/C++ portions of the project, so that every project could be built with a single ‘ant’ command.
  • A way to use Continuous Integration servers. In a future post, I’ll discuss Continuous Integration, but having an automated build is a necessary first step to using a CI server.

For repeatability, having an automated build system is key and worth the effort it takes to set up.
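To make the “one step” idea concrete, here is a minimal sketch of a single entry point that runs every build step in order and stops at the first failure. The function name `run_build` and the step names are my own illustrations, not part of any of the tools mentioned above.

```python
import subprocess


def run_build(steps):
    """Run each (name, command) pair in order; stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, command in steps:
        # Each command is an argument list, e.g. ["make"] or ["ant"].
        result = subprocess.run(command)
        if result.returncode != 0:
            raise RuntimeError(
                f"build step '{name}' failed with exit code {result.returncode}"
            )
        completed.append(name)
    return completed
```

A mixed project like the one described above might call `run_build([("native", ["make"]), ("java", ["ant"])])`, assuming both tools are on the path. The point is only that a developer and a CI server invoke exactly the same command.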

5. Create a separate build user

This is one of the most important practices for achieving a repeatable build.

As developers work on a project, their local environments invariably end up collecting files and settings that allow them, but no one else, to build an application properly. Sometimes these local differences are subtle, and an application will build fine for other team members, but will not work in exactly the same way. These local differences make it impossible to achieve a truly repeatable build.

Creating a separate build user and applying strict control over this user provides a neutral third party to settle these “build differences”. The approach is to create a separate user both on the host operating system and in the version control system. On UNIX, this can be accomplished with a shared user that team members can ‘su’ into.

Building a release is then done as follows:

1. Log in or switch to the build user.
2. Delete all files from the previous build.
3. Check out all files from version control that are needed for the build.
4. Build the release.

By removing all previous build files and checking out a fresh set of files, you ensure that all files have been checked in and that no local differences will impact the build. After a successful build, this “build user” would then label or tag the files in version control.
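Steps 2 through 4 are easy to script so that nobody skips the cleanup. The following is a minimal sketch, assuming the checkout and build commands are passed in as argument lists; the function name `release_build` is hypothetical, and the script is meant to be run while logged in as the build user.

```python
import shutil
import subprocess
from pathlib import Path


def release_build(workdir: Path, checkout_cmd: list, build_cmd: list) -> None:
    """Wipe the work directory, check out fresh sources, then build."""
    # Step 2: delete everything left over from the previous build.
    if workdir.exists():
        shutil.rmtree(workdir)
    workdir.mkdir(parents=True)

    # Step 3: check out a fresh copy into the now-empty directory.
    subprocess.run(checkout_cmd, cwd=workdir, check=True)

    # Step 4: build the release; check=True raises if the build fails.
    subprocess.run(build_cmd, cwd=workdir, check=True)
```

With Subversion and Ant, for example, the checkout command could be `["svn", "checkout", repository_url, "."]` and the build command `["ant"]`, where `repository_url` is whatever your project uses.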

On one of my previous consulting jobs, this approach was taken a step further. In this organization, a separate team was created to perform builds that were released into production. As developers, we would make sure that everything was checked into version control and would then notify this group that we were ready for a release build. At times, this process seemed like overkill, but in the end, I think the advantages outweighed the disadvantages.

6. Document your build process

No matter how simple or complex your build process is, the steps need to be documented. It is a fact of life that people come and go over the course of a project. Without good project documentation, it is difficult for new team members to learn the correct way to build and deploy applications. On a project with a complex build process, this can be a serious problem.

Over the years, I’ve encountered a variety of approaches to documenting the build process. These include:

  • Storytelling (project elders passing on the “lore of the build”)
  • Document management systems
  • Team or organization-wide wikis
  • Project documents in version control

Each of these has particular advantages and disadvantages (although I would be hard pressed to find too many advantages to the Storytelling approach).

I tend to favor an approach that stores documentation inside the version control system. This documentation can then be labeled or tagged when a release is done. If the build process and its documentation change later, the state of the documentation at the time of the release is still preserved.

I also tend to favor plain text documentation over binary formats like Word. Plain text files can be opened by any text editor, from vi to Eclipse. Text files are easier to diff and easier to search. Even for larger documents, I prefer text-based formats like Markdown, XHTML/CSS, and even Docbook.

Regardless of the tools used to document the build process, having good documentation for it is essential to achieving repeatable builds over time.

Next up: Principle #2 of the Five Principles of Software Deployment - Environments should be consistent.

Principle #1 - Builds should be repeatable (Part 2)

In my previous blog entry, I presented the first of The Five Principles of Software Deployment, that is Builds should be repeatable.

In that post, I provided some advice on how to achieve repeatable builds. This post takes a look at the first 3 items in that list.

1. Use version control

To achieve a repeatable build, you essentially need to turn back the clock to the time when an application was previously built. It is impossible to do this without some kind of system that tracks changes to your files over time. Version control (or source code management) provides such a system.

If Maslow had created a hierarchy of needs for software developers, version control would be at the bottom of the pyramid, right there with food, water, air, and sleep! It should be considered essential to any software development efforts.

I’ve used version control throughout my career, whether working solo or on a large team. In some places, I’ve used commercial version control systems like Clearcase and PVCS. In other places, I’ve used open source systems like Subversion or CVS. I haven’t yet used version control systems like git, which are specifically designed for distributed projects.

Given the range of options and the fact that some of these options are free (not counting the time needed to set them up), there is no excuse for a developer not to use version control.

I won’t get into details here on exactly how version control works. This visual guide does a great job of that. However, I will discuss how to use version control to achieve repeatable builds.

2. Version your tools and dependencies

Nearly every application has dependencies on third-party libraries. In some cases, an application may be dependent on a library produced by another group. These third-party and in-house libraries will invariably change over the course of a project.

In order to repeat a build, you must capture the versions of the libraries used at the time of a build. The best way to do this is to add these libraries to version control, along with your source. As new versions of these libraries become available, they are added to version control alongside the previous versions. As a simple example, suppose your application depends on the Spring LDAP library. To move from 1.2 to 1.2.1 of this library, you could set up the following structure in your version control repository:

/ext/lib/spring-ldap/1.2/spring-ldap-1.2.jar
/ext/lib/spring-ldap/1.2.1/spring-ldap-1.2.1.jar

Within your build property files, you would change the reference to the spring-ldap library from 1.2 to 1.2.1.

If both the libraries and references to those libraries are versioned, you can easily go back and repeat a build with the correct versions of the libraries.
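As a minimal sketch of how a versioned reference resolves to a versioned file, the snippet below reads a version from a properties file and builds the matching path under /ext/lib. The property key `spring-ldap.version` and both function names are assumptions for illustration; your build tool will have its own way of expressing this.

```python
from pathlib import PurePosixPath


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def library_path(name: str, props: dict) -> str:
    """Build the repository path for a library from its pinned version."""
    version = props[f"{name}.version"]  # hypothetical key naming scheme
    path = PurePosixPath("/ext/lib") / name / version / f"{name}-{version}.jar"
    return str(path)
```

Because the properties file is itself under version control, checking out an old release brings back the old version number, and with it the old jar.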

There are two important points that I’d like to make here.

First, this technique of versioning dependencies applies to libraries produced by some third party, either external or internal to your team. This does not include libraries that are generated from your own source code. If a library can be built from source code, it should never be checked into version control. Let me reiterate: never check in something that can be built from source. It is very easy for a checked-in library (or other build artifact) to get out of sync with the source. I’ve worked on a project where this was done, and it caused major headaches at build time.

Second, this versioning of libraries is an area where I have a fundamental disagreement with the philosophy of the Maven build system. Maven, by default, is designed to download newer versions of dependencies at build time into a local Maven repository. Also, by default, that Maven repository of libraries is not versioned. In fact, it’s usually stored in an invisible directory (.m2) within the user’s home directory. To me, library dependencies need to be managed more precisely in order to achieve reliable and repeatable builds.

To close the topic of putting dependencies into version control, I’ll mention the versioning of tools. It can be argued that to achieve a truly repeatable build, the tools themselves should be versioned, including compilers and interpreters. In fact, this was exactly what we did on one project, where we checked in the version of Java that we used to build a release. In this case, there were additional reasons for doing this (we distributed the JRE with our release, and we changed some settings in Java’s network properties). But in the end, we had everything we needed to build right there in version control.

3. Label your releases

One feature common to all version control systems is the ability to label or tag a release. Effectively, this provides a virtual “snapshot” of the state of all files at the time a release is built. In some systems like Clearcase, a label is attached to files, while in others, like Subversion, a label (or tag) is presented as a separate ‘directory’ in the repository. In either case, both allow you to turn the clock back to the moment of a release and rebuild things as they were at that time.
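In Subversion, creating that tag “directory” is done with `svn copy`. Here is a minimal sketch that assembles the command for a release, assuming the conventional trunk/tags repository layout; the function name and the commit message wording are hypothetical.

```python
def svn_tag_command(repo_url: str, release: str) -> list:
    """Build the 'svn copy' command that tags trunk as the given release."""
    base = repo_url.rstrip("/")
    return [
        "svn", "copy",
        f"{base}/trunk",            # source: current state of trunk
        f"{base}/tags/{release}",   # destination: the new tag directory
        "-m", f"Tag release {release}",
    ]
```

The returned list can be handed to `subprocess.run` as the last step of a release build, so that labeling is never left to memory.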

One other technique for “labeling” that I’ve used in the past with some success is to literally create a new set of directories for a major release by exporting the latest version of the source and reimporting it under a separate directory (for example /prod/2.0). This is more extreme than just applying a 2.0 label or tag to the files, but it does have the advantage of easily allowing fixes to be applied to the 2.0 source without the need to create a branch.

Proper labeling of releases requires discipline. It is very easy to forget to relabel when last-minute changes are made. To combat this, it is best to “freeze” all changes at the time of the build and designate a single person to handle the build process, including the final step of labeling. In the next post, I’ll discuss the idea of creating a separate build “user” for performing release builds.

Principle #1 - Builds should be repeatable

The first of The Five Principles of Software Deployment is that Builds should be repeatable.

This means that for any application that you build and release, you should be able to rebuild that application exactly as it was built at the time of release at any time in the future.

If you keep copies of all applications that you release, why would you need to repeat the build process for one of them? The answer lies in the fact that all software has bugs. And if someone discovers a nasty bug in a version of your application running in production, you may be called upon not only to reproduce that bug, but more importantly to release a version of the application whose only change is the bug fix. The ability to go back and repeat the conditions of the original build is essential in this case.

This is not a hypothetical scenario in my experience. It has happened on a number of systems that I’ve worked on over the years. A good manager will want to minimize risk whenever possible, and applying a spot fix to solve a production problem is one way to do this.

So how do you achieve repeatable builds? The following list provides some guidance on how to add repeatability to your build process.

1. Use version control.
2. Version your tools and dependencies.
3. Label your releases.
4. Automate your build.
5. Create a separate build user.
6. Document your build process.

The items in this list will be covered in greater detail in upcoming posts. In the meantime, let me know if you think anything should be added to this list.

Setting up Drupal

Yesterday I needed to set up a test installation of the Drupal content management platform for an upcoming project. It was only natural that I used our BundleWorks product to set up this Drupal test environment.

One of the main features of BundleWorks is that it allows you to create multiple independent environments on a single server and run multiple versions of an application in these environments. For now, I was only running one version of Drupal within a single environment, but I will eventually need to have Drupal (perhaps different versions) running in multiple environments for development and staging.

Drupal is written in PHP and is usually run with the Apache web server and MySQL database server. These three technologies make up the A-M-P in the LAMP technology stack (with Linux being the L).

BundleWorks supports LAMP technologies out of the box. It has “bundlers” specifically for creating Apache and MySQL bundles. It also has a bundler that can create bundles from source, which I used to create a PHP bundle. BundleWorks supports a variety of Linux versions, including the Ubuntu server that I use here.

I’ve posted the complete experience inside the BundleWorks forums for anyone interested in the details, but by the time I was done, I was able to run my Drupal test environment using the following commands:

use cms
mysqlserver start
apache start

I was also in a position to very easily create additional, independent Drupal environments on the same server. In the end, the effort to set up Drupal with BundleWorks was well worth it.

Five Principles of Software Deployment

In yesterday’s post, 2 out of 3 ain’t bad?!?, I wrote about a survey which suggested that 2 out of 3 developers don’t have a staging environment.

In my opinion, this is a violation of one of the basic principles of software deployment - Environments should be consistent.

In this post, I present my Five Principles of Software Deployment. Over the next couple of weeks, I’ll take a closer look at each one and provide advice on how they can be achieved.

The Five Principles of Software Deployment

1. Builds should be repeatable
2. Environments should be consistent
3. Packages should be autonomous
4. Releases should be easy to do and undo
5. Failures should be obvious

Stay tuned for more details.

2 out of 3 ain’t bad?!?

I saw a statistic today that really surprised me.

In a survey taken earlier this year on software testing, 66% of respondents stated that they have no staging environment (emphasis mine).

A staging environment is designed to mimic a production environment as closely as possible. It’s the place where an application is (or should be) tested before it goes into production. Unless people misunderstood the question, the results of the survey indicate that two in three developers do not test their applications in an environment that resembles production!

I guess I shouldn’t be surprised. I’ve worked in places where production applications ran from machines in unusual places. In one case, a machine running in the developer area was turned off during a move, only to have some production users scream that their application was down! In another case, a vital system was running under a reception desk!

I’m sure my experiences are not unique.

These days, with cheaper hardware and technologies like virtualization, there are fewer excuses for not having a staging environment.

One excuse is time. It takes time to set up and manage a separate environment. But it’s well worth the time if it makes production rollouts smoother.

Another excuse is that it is often not easy to set up multiple independent environments on the same machine. For some applications (web servers, databases, etc.) it is difficult to run multiple instances on a single server. (This, by the way, is one of the reasons I decided to develop BundleWorks.)

I would be interested in hearing from developers about the importance they place on a staging environment, and what tools and techniques people have developed to set up and maintain such an environment.

Welcome

Welcome to the Sauers Technologies blog.

At Sauers Technologies, we are passionate about software development. We have a particular interest in tools to help build, test, deploy, and maintain software. In fact, we have written one of these tools ourselves. It is called BundleWorks, and we are just now releasing it as a public beta.

This blog will not just be about BundleWorks, however. It will cover a wide range of software development topics. We have experienced a lot in our years in the business, and we hope to be able to share some of this experience.

Over the next couple of weeks, we will kick off the blog with a set of entries entitled The Five Principles of Software Deployment. These entries will present some basic principles of software deployment learned over the years, sometimes the hard way.