Sunday, April 8, 2012

Linux Long Term Support and why enterprises think they need it

Disclosure: I have worked in various enterprises (banking and telcos) for the last 10 years or so as a UNIX admin, the last five in a bank.

The guys at the Food Fight Show recently had a discussion called "Distro Dancing" covering their opinions on various Linux distros: which one they used first, which ones they like now, and why they find them useful. It was interesting to hear how people's needs evolved as Linux grew from a "play thing" into an OS that powers many of today's data centres, and how that shift changed their requirements.

The main gist was that if you are using tools like chef or puppet, you end up caring less about the distro, because all of the configuration functionality is abstracted behind the configuration tool and you are not tied to a particular distro's administrative interfaces. While I agree with this at a high level, there are some added extras that particular distros bring to the table, such as their associated software repositories. The ability to "yum install" or "apt-get install" a particular application or library, without the need for customized repos and packaging, is quite useful.

John Vincent (@lusis) made a very insightful series of comments about the side effects of distros such as RHEL and Debian/Ubuntu and their Long Term Support implementations. Long story short, the Red Hat guys lock down their major software versions about three years before the OS release actually ships, which means that by the time it reaches general availability it is already behind the state of the art in terms of updated packages. The example given was RHEL locking the system version of Python at 2.4 for its system tools, while 2.7 is current and 2.6 is still very popular, and the impact that this has on people wanting a more up to date Python to work with.
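To make that concrete, here is a tiny hypothetical snippet that runs happily on Python 2.6 but dies on a system locked at 2.4, because both language features it uses only arrived in Python 2.5:

```python
# Hypothetical snippet: fine on Python 2.6, a SyntaxError on the
# Python 2.4 that ships as RHEL's system interpreter.

# The 'with' statement became available in Python 2.5 (behind a
# __future__ import) and by default in 2.6 -- on 2.4 this line
# kills the script before it even starts running.
with open("/etc/redhat-release") as f:
    release = f.read().strip()

# Conditional expressions ("x if cond else y") also arrived in 2.5.
label = "enterprise" if "Red Hat" in release else "something else"
print(label)
```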

The question I asked myself was: why do enterprises require operating systems that are Long Term Supported? By that I am not only referring to the length of the support contract, although that does play a part, but also to some of the restrictions around major version lockdowns. Here are some reasons that came off the top of my head (in no particular order):


  • Enterprises use software from major software vendors (Oracle, Sybase, WebLogic, etc.). Version lockdowns allow those vendors to release software that will work with a known OS configuration. By keeping the versions of the underlying system libraries (glibc etc.) stable, the distro gives software vendors less to worry about when it comes to certifying releases of their products. This is a feedback loop where the demand for software compounds the demand for locked-down OS releases. Long story short, if you want one of the big DB players, you will end up on Red Hat or Solaris.
  • Enterprises have usually been around for a while and have collected some serious technical debt. You can argue whether or not this is a valid excuse, but it exists and it is reality for many admins. Developers move on, projects are abandoned, but unless someone is keeping a very keen eye on the software inventory, that code is still running on an OS somewhere and its support details can be somewhat sketchy. Because of this technical debt, people can become very tied to specific server configurations, and any attempt to "mess with" said configurations is met with fear and trembling. Keeping the binary compatibility guarantees of, say, Red Hat or Solaris means that these legacy apps can keep on running without intervention.
  • Enterprises do not like surprises. One requirement that a lot of enterprises have is regular patching. Were patching to introduce major version changes to parts of the system, the anxiety associated with patching would be much higher. Imagine if Apache were to suddenly deprecate single-file httpd.conf configurations and force everyone over to httpd.d-style configurations; this would break multitudes of applications and stop any progress in keeping systems up to date. In an ironic twist, locking down versions of system software actually helps keep them current (at least in terms of security fixes), as there is higher confidence in the patching process.
There are probably many more reasons that enterprises need Long Term Supported OSes; however, I am more interested in what admins can do to avoid getting locked into certain OSes or specific server configurations in the first place, to allow easier moves to more up to date releases.


In summary

  • STOP DOING THINGS BY HAND and get a configuration management tool
  • Use said configuration management tool for defining your applications' requirements
  • Test your applications' portability to flush out hidden dependencies
  • Support all of the above with proper policies around life cycle management and configuration management. 

1. STOP DOING THINGS BY HAND. Seriously, 1999 wants its administration techniques back. With tools like chef or puppet available for free or with support contracts, there is no excuse to be hand-crafting configurations on servers anymore. That's all very nice to say, but what are the consequences of hand-crafting a server? (A minimal sketch of the alternative follows after this list.)
  • There is no reasonable way to replicate the environment. This means that when you need to move from one OS version to another for support reasons, the process is based on the administrator's ability to document (or remember, if it is the same person) the steps required to configure the server and install the application.
  • Hand-crafting a server limits your ability to create production-like development, test or staging environments for moving your applications to a new server.
  • People become very attached to their hand-crafted servers, because those servers were built specifically to run their application. When this happens, they become complacent about maintaining the documentation and configuration of the servers, because the servers will always be there, right? Wrong! You will need to upgrade the hardware or the OS soon enough, and that "one-off hack adding a symlink to X" will come back and bite you when you move over to the new machine.
Side note: I remember one particularly experienced application administrator telling me to always remember the three P's of things that can go wrong in a migration: Passwords, Profiles and Permissions. This has stayed with me and influenced much of my thinking around configuration management.
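To illustrate the difference, here is a minimal Python sketch of the core idea behind tools like chef and puppet: declare the desired state, then converge the system towards it on every run. The paths, names and modes are invented for illustration; in real life you would write a chef recipe or puppet manifest rather than rolling your own:

```python
import os
import pwd
import shutil

# Desired state declared as data, rather than as one-off commands
# typed at a shell prompt. All paths, names and modes here are
# invented for illustration.
DESIRED_STATE = [
    {"path": "/etc/myapp/app.conf",
     "source": "/srv/config-repo/myapp/app.conf",
     "owner": "appuser",
     "mode": 0o640},
]

def converge(resource):
    """Bring one file resource to its desired state, idempotently."""
    path, source = resource["path"], resource["source"]

    # Install the managed file only if it is missing or has drifted.
    if not os.path.exists(path) or \
            open(path, "rb").read() != open(source, "rb").read():
        shutil.copy(source, path)

    # Re-assert ownership and permissions on every run, so a change
    # made by hand gets changed back on the next converge.
    uid = pwd.getpwnam(resource["owner"]).pw_uid
    os.chown(path, uid, -1)           # -1 leaves the group untouched
    os.chmod(path, resource["mode"])

for resource in DESIRED_STATE:
    converge(resource)
```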

2. Explicitly define *all* of your specific dependencies. If your application requires a particular version of a library, a particular user to be defined, or a particular set of directory permissions, then this should be explicitly called out. This can be done in documentation, but we all know that documentation suffers from bit rot just like software; it becomes outdated or downright inaccurate over time. The best way to enforce these dependencies is with a configuration management system like chef or puppet, because not only are they set at install time, they are also actively maintained as part of the run-time configuration of the server, so if a permission is changed by hand it will be changed back. Defining all of your configuration explicitly in a configuration management tool allows you to recreate the application environment (think of the three P's) on a new system without worrying that particular pieces are missing. (A hypothetical check along these lines follows the side note below.)

Side note: This does force people to work within the confines of the configuration management tool, but that is more of an organizational issue than a technical one.
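As a sketch of what "explicitly called out" might look like, here is a hypothetical Python check that declares an application's packages, users and directory permissions as data and then verifies them. All names and paths are made up, and a real chef or puppet setup would both declare and enforce these rather than just report on them:

```python
import grp
import os
import pwd
import stat
import subprocess

# A hypothetical, explicit declaration of one application's dependencies.
# In chef or puppet these would be package/user/directory resources; here
# they are plain data so the checks can be read in one place.
REQUIREMENTS = {
    "packages": ["zlib", "openssl"],
    "users": ["appuser"],
    "directories": {"/var/lib/myapp": ("appuser", "appgroup", 0o750)},
}

def verify():
    problems = []

    # Packages: ask rpm whether each one is installed (RHEL-flavoured).
    for name in REQUIREMENTS["packages"]:
        if subprocess.run(["rpm", "-q", name],
                          capture_output=True).returncode != 0:
            problems.append("package %s is not installed" % name)

    # Users -- one of the three P's (profiles).
    for user in REQUIREMENTS["users"]:
        try:
            pwd.getpwnam(user)
        except KeyError:
            problems.append("user %s does not exist" % user)

    # Directories -- another P (permissions).
    for path, (owner, group, mode) in REQUIREMENTS["directories"].items():
        try:
            st = os.stat(path)
        except FileNotFoundError:
            problems.append("directory %s is missing" % path)
            continue
        if (stat.S_IMODE(st.st_mode) != mode
                or st.st_uid != pwd.getpwnam(owner).pw_uid
                or st.st_gid != grp.getgrnam(group).gr_gid):
            problems.append("wrong ownership/permissions on %s" % path)

    return problems

if __name__ == "__main__":
    for problem in verify():
        print(problem)
```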

3. Keep your applications portable. I made a previous attempt at discussing this, but basically, if your application can be *easily* moved between servers, then the odds are high that you do not have underlying undocumented system dependencies, and this will allow for an easy migration. If your application runs on a single server and was deemed important enough to have a DR backup, then you should be moving the application between servers on a semi-regular basis to ensure portability. This could be done during official DR testing, or more often if it is tied to events like regular patching and maintenance. If your application runs across multiple machines, then adding new machines to the cluster will also test the configuration and ensure that there are no hidden system dependencies.
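A cheap first-pass check, before attempting a real move, is to scan the application's configuration for values tied to the current host. This hypothetical sketch (the directories and markers are invented for illustration) is no substitute for actually moving the application, but it can surface the obvious offenders:

```python
import os
import socket

# Directories to scan and host-specific markers to look for.
# Both lists are illustrative; adjust for your own application.
CONFIG_DIRS = ["/etc/myapp", "/opt/myapp/conf"]
MARKERS = [socket.gethostname(),   # hardcoded current hostname
           "/home/"]               # paths tied to a local user's home

def scan():
    hits = []
    for root_dir in CONFIG_DIRS:
        for dirpath, _dirs, files in os.walk(root_dir):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    text = open(path, errors="replace").read()
                except OSError:
                    continue  # unreadable file; skip it
                for marker in MARKERS:
                    if marker in text:
                        hits.append((path, marker))
    return hits

for path, marker in scan():
    print("%s references %s -- will it survive a move?" % (path, marker))
```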

4. Don't let things become stale in the first place. The longer you allow configurations to drift or OS releases to age in your environment, the harder it will be to move off them. The people who originally installed the servers and the applications may no longer work for the company, finding support for the operating system may become much harder (note: RHEL 4 and Solaris 8 both hit end of life recently), and the inevitable fear and trembling will set in: "You can't patch server X, no one knows how it works!". The only way to avoid this is to have strong policies in place around OS support (and subsequent upgrades) and configuration management. If people know that they will have to move their application in X years (defining X is an exercise for the reader), then they will be less complacent about sloppy configuration management practices, but for this to work properly they need the right tools to record and maintain the configuration.
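As a sketch of the kind of policy tooling I mean, the following hypothetical report flags servers whose OS release is approaching or past its support cutoff, so the conversation about moving starts before the fear and trembling does. The inventory and dates are invented for illustration:

```python
from datetime import date

# Hypothetical inventory: hostname -> (OS release, end-of-support date).
# In practice this would come from your CMDB or configuration
# management tool, not a hardcoded dict.
INVENTORY = {
    "db01":  ("RHEL 4",    date(2012, 2, 29)),
    "web01": ("RHEL 5",    date(2017, 3, 31)),
    "app01": ("Solaris 8", date(2012, 3, 31)),
}

# Start planning a migration this long before support actually ends.
WARNING_DAYS = 365

today = date.today()
for host, (release, eol) in sorted(INVENTORY.items()):
    days_left = (eol - today).days
    if days_left < 0:
        print("%s: %s is PAST end of support" % (host, release))
    elif days_left < WARNING_DAYS:
        print("%s: %s has %d days of support left" % (host, release, days_left))
```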

Am I saying that if you do all of those things, your enterprise will move from RHEL 5 to Fedora overnight? No, but it will make the move from RHEL 5 to RHEL 6 a lot easier than it otherwise would have been. If people have confidence in your ability to move from known configuration to known configuration, then maybe there will be a more relaxed attitude towards, say, moving from RHEL to a distro that is slightly more up to date. But for that to happen, you have to put in the hard yards first of collecting and maintaining all of that configuration out there.

There is an alternative to relying on system-defined requirements: bundle all application requirements beyond the basic system libraries into an application filesystem that is maintained by the developers. For example, if your application requires zlib-X, bundle it with your application. This works very well for maintaining independence from the underlying server OS, but it places a very high burden on the application support teams, because they need to track and update versions of software as patches are released. It is much easier to let the OS vendor track and maintain this software; the cost is that the application is no longer isolated from OS changes. I personally do not recommend the bundling approach, as developers should be spending their time developing rather than tracking and maintaining dependencies.
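For completeness, here is a hypothetical sketch of what that bundling approach looks like in practice: a small launcher that points the dynamic linker at the libraries shipped inside the application's own filesystem before starting the real binary. Every path here is made up:

```python
import os
import sys

# Hypothetical application filesystem maintained by the developers:
#   /opt/myapp/bin/myapp   the real binary
#   /opt/myapp/lib/        bundled copies of zlib, openssl, etc.
APP_ROOT = "/opt/myapp"
BUNDLED_LIBS = os.path.join(APP_ROOT, "lib")

# Prepend the bundled libraries so they win over the OS-provided
# versions -- this is the isolation, and also the maintenance burden,
# since the app team now owns patching everything in lib/.
env = dict(os.environ)
env["LD_LIBRARY_PATH"] = (BUNDLED_LIBS + os.pathsep
                          + env.get("LD_LIBRARY_PATH", ""))

# Replace this launcher process with the real application.
os.execve(os.path.join(APP_ROOT, "bin", "myapp"),
          ["myapp"] + sys.argv[1:], env)
```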



