One of the requirements on a current project is to handle an FTP feed of XML data. Yes, for a protocol invented in the 70s, FTP is still a popular way of transporting data!
In processing an FTP feed, there’s always the question of how to check for new files (usually with cron) and how to know when an FTP transfer is complete (using FTP rename is one approach to this).
Today, I learned about a Linux feature that makes it easier to process FTP feeds. Since version 2.6.13, the Linux kernel has included a file change notification system named inotify. People have written a number of utilities to take advantage of inotify. One of these is the inotify-tools package.
The inotify-tools package includes a library and two command line programs, inotifywait and inotifywatch.
The inotifywait utility looked like it would work for me, so I downloaded the inotify-tools source, compiled it, and installed it on my Linux system.
I then wrote the following shell script to monitor for new files within the feed directory, and execute a program to process the file.
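#!/bin/sh
# Directory where the FTP server writes incoming feed files
FEED_DIR="/var/feeds/news/incoming"
# Block until a file in FEED_DIR is closed after writing, then print its name
while file=$(inotifywait -q -e close_write --format "%f" "$FEED_DIR"); do
    # No exec here: exec would replace the shell and end the loop after one file
    /path/to/myapp "$FEED_DIR/$file"
done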
This script quietly (-q) waits for the “close_write” event (-e) from the inotifywait call and outputs the name of the file (--format “%f”). In my testing, the “close_write” event is triggered when the FTP transfer is completed. For feeds that use FTP rename, the event to watch would be ‘moved_to’ instead of ‘close_write’, because inotify reports a rename as a move rather than as a write (‘create’ fires as soon as the upload starts, which is too early). See the man page for inotifywait for more information about these and other events.
This approach to processing incoming FTP files has the benefit of not requiring cron to periodically check for new files. It also solves the problem of knowing when the FTP transfer is complete.
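One caveat with the loop above: inotifywait exits after each event, so a file that finishes uploading while the previous one is still being processed can be missed. Below is a minimal sketch of an alternative, assuming inotifywait's monitor mode (-m) is used to feed file names into a reader function. The process_feed_events function and the PROCESSOR variable are placeholder names of my own, not part of inotify-tools.

```shell
#!/bin/sh
# Sketch only: FEED_DIR and PROCESSOR are placeholders, overridable from the environment.
FEED_DIR="${FEED_DIR:-/var/feeds/news/incoming}"
PROCESSOR="${PROCESSOR:-/path/to/myapp}"

# Read one file name per line from stdin and hand each file to the processor.
process_feed_events() {
    while IFS= read -r file; do
        [ -n "$file" ] || continue      # skip blank lines
        # Redirect stdin so the processor cannot swallow pending file names.
        "$PROCESSOR" "$FEED_DIR/$file" </dev/null
    done
}
```

To run it against a live directory, pipe monitor-mode output into the function, for example: inotifywait -m -q -e close_write -e moved_to --format "%f" "$FEED_DIR" | process_feed_events. Watching both events covers plain uploads and feeds that use FTP rename.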
This post discusses the third of The Five Principles of Software Deployment: Packages should be autonomous.
The meaning of autonomous is “existing or capable of existing independently.” In my experience, the more autonomous, or independent, a software package is, the easier it is to deploy and maintain.
What makes a software package autonomous? Here are some characteristics:
- You can run multiple instances of the software package independently.
- All runtime paths, like configuration and log files, are relative to the package or to a specified root.
- All logic for installing, uninstalling, starting, and stopping the software is contained within the package.
- The package includes as many dependencies as possible.
Let’s take a look at each of these in turn.
Running multiple instances
Running multiple instances of a software package on the same machine means two things. First, it means that you can install and run multiple versions of the package. Second, it means that you can run multiple instances of a particular version at the same time.
The ability to install and run multiple versions is important for testing, but also for easily rolling back to a previous version during a failed rollout. One example of a package that achieves this measure of independence is the Java Development Kit (or JDK). You can install multiple versions of the JDK side by side, and run each version separately (by adjusting your PATH).
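As a rough illustration of switching between side-by-side installs by adjusting PATH, here is a small sketch. The use_jdk function name and the install directories mentioned in the note after it are hypothetical.

```shell
#!/bin/sh
# Sketch: point the current shell at one of several side-by-side JDK installs.
use_jdk() {
    JAVA_HOME="$1"                  # root directory of the chosen JDK
    PATH="$JAVA_HOME/bin:$PATH"     # its bin directory now wins the lookup
    export JAVA_HOME PATH
}
```

Calling use_jdk /opt/jdk-new selects one version for the current shell, and calling use_jdk /opt/jdk-old afterwards puts the older one back in front, which is essentially what a rollback looks like when versions are installed side by side.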
To run multiple instances of the same version of an application at the same time, you need to make sure that the software does not rely on fixed ports, paths, or external resources. It is best to make these options configurable or relative to a specified location. In my previous post, I offer some tips for making sure that an application can run independently.
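One way to keep ports and paths out of the code is to derive them from an instance root and a port passed in at startup. The sketch below only prints the derived settings. The instance_settings name, the logs/ and run/ layout, and the variable names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: derive per-instance settings from an instance root and a port
# instead of hard-coding them.
instance_settings() {
    root="$1"      # directory that holds this instance's files
    port="$2"      # port this instance should listen on
    printf 'LOG_DIR=%s\n'  "$root/logs"
    printf 'PID_FILE=%s\n' "$root/run/app.pid"
    printf 'PORT=%s\n'     "$port"
}
```

Two instances of the same version can then be started with different arguments, such as instance_settings /srv/app/one 8081 and instance_settings /srv/app/two 8082, without sharing a log directory, PID file, or port.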
Relative runtime paths
Any files needed to run a package should be relative to the package or relative to a specified root. Files that do not change across multiple running instances can be kept relative to the package. For example, a properties file with localization information can be kept in a directory within the package. Files that vary on a “per instance” basis should be stored relative to a specified location.
One software package that does a particularly good job of this is the Apache Tomcat servlet container. Tomcat relies on two environment variables, CATALINA_HOME and CATALINA_BASE, to separate the directories and files that are shared by every instance (CATALINA_HOME) from the directories and files (conf, logs, temp, etc.) that vary on a per-instance basis (CATALINA_BASE). In this way, multiple instances of the same version of Tomcat can easily be run on the same machine (provided that they are configured to use different ports).
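To make the split concrete, here is a minimal sketch of preparing a per-instance CATALINA_BASE that shares one CATALINA_HOME. The make_catalina_base function is my own helper, not something Tomcat ships, and the paths passed to it are placeholders.

```shell
#!/bin/sh
# Sketch: create a per-instance CATALINA_BASE next to a shared CATALINA_HOME.
make_catalina_base() {
    home="$1"   # shared Tomcat installation (CATALINA_HOME)
    base="$2"   # per-instance directory (CATALINA_BASE)
    mkdir -p "$base/conf" "$base/logs" "$base/temp" "$base/webapps" "$base/work" || return 1
    # Each instance gets its own copy of the configuration, so ports can differ.
    if [ -d "$home/conf" ]; then
        cp -R "$home/conf/." "$base/conf/" || return 1
    fi
}
```

After changing the ports in the instance's conf/server.xml, the instance would be started with both variables set, along the lines of: CATALINA_HOME=/opt/tomcat CATALINA_BASE=/srv/tomcat/one /opt/tomcat/bin/catalina.sh start (the /opt and /srv paths are hypothetical).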
Logic within package
In many cases, a software package requires scripts to perform operations like starting, stopping, or checking status. These scripts should be included within the package itself; otherwise, they end up being created as one-offs when the package is deployed to a particular machine.
In the past, I have used various conventions for storing these scripts. One option is to create a ‘bin’ directory inside the package to contain the scripts. This is the convention used by the Apache Tomcat servlet container distribution.
Currently, I use our own BundleWorks product for packaging and deploying my software. BundleWorks stores all package scripts inside a ‘bundle/actions’ directory. Regardless of what convention is used, the important thing is to keep all package ‘logic’ bundled with the software itself.
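As an illustration of keeping this logic inside the package, here is a sketch of the functions a control script under a package's ‘bin’ directory might contain. It is not how Tomcat or BundleWorks implement their scripts. The run/ directory, the PID file name, and the function names are assumptions.

```shell
#!/bin/sh
# Sketch of a control script that would live at <package>/bin/ctl.sh.

# Resolve the package root from the path of a script inside its bin/ directory.
package_root() {
    (cd "$(dirname "$1")/.." && pwd)
}

# Start a command in the background and record its PID under the package.
start_app() {
    root="$1"; shift
    mkdir -p "$root/run"
    "$@" >/dev/null 2>&1 &
    echo $! > "$root/run/app.pid"
}

# Report whether the recorded PID is still alive.
app_status() {
    pid_file="$1/run/app.pid"
    if [ -f "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null; then
        echo running
    else
        echo stopped
    fi
}

# Stop the application and remove the PID file.
stop_app() {
    pid_file="$1/run/app.pid"
    [ -f "$pid_file" ] || return 0
    kill "$(cat "$pid_file")" 2>/dev/null
    rm -f "$pid_file"
}
```

Because every path is derived from the script's own location, the same script works wherever the package is unpacked, and two copies of the package keep separate PID files.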
Including dependencies
To make a software package as autonomous as possible, it is best to include as many dependencies in the package as possible. Some examples of this are:
- Including third-party jar files inside a web application’s war file.
- Including third-party or custom libraries (.dll or .so files) in a package ‘lib’ directory.
- Compiling binaries as static to avoid a dependency on an external library.
- Including an entire runtime like the JRE or Ruby interpreter within the package itself.
These options can cost disk space and memory, but they help to make packages easier to deploy and maintain. Packaging software with its dependencies provides a measure of control over the components that an application uses.
Not all options are appropriate for all situations. However, as an example, in a previous consulting role, we chose to bundle the JRE with our software because the installed version of Java varied drastically across the hundreds of boxes where our software needed to run. In some cases, the version of Java was so old that it wasn’t even sufficient to run the application!
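A launcher script can make the bundled runtime the default while still tolerating a package built without one. The sketch below assumes the runtime is unpacked into a jre/ directory inside the package. That directory name and the find_java function are illustrative choices, not a standard.

```shell
#!/bin/sh
# Sketch: prefer a runtime shipped inside the package, fall back to the system one.
find_java() {
    pkg_root="$1"
    if [ -x "$pkg_root/jre/bin/java" ]; then
        echo "$pkg_root/jre/bin/java"      # bundled runtime wins
    elif command -v java >/dev/null 2>&1; then
        command -v java                    # whatever is installed on the box
    else
        return 1                           # no Java available at all
    fi
}
```

A start script would then use the result instead of a bare java command, for example: JAVA=$(find_java "$PKG_ROOT") && "$JAVA" -jar "$PKG_ROOT/lib/app.jar" (PKG_ROOT and app.jar are hypothetical).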
The next post in the series on The Five Principles of Software Deployment will take a look at principle #4: Releases should be easy to do and undo.