Cover V11, I11

Cross-Platform UNIX Software Packaging with OpenPKG

Ralf S. Engelschall, Thomas Lotterer, Michael Schloh von Bennewitz, and Christoph Schug

Many of us prefer open source software for its well-known advantages, but sometimes regret the associated disadvantages when manually applying it to a heterogeneous environment. To keep a work environment stable and secure, it's often necessary to search for the latest version of an application and collect the most recent patches. After that, systems administrators must build and install the new binaries on every UNIX box in the network. Then, after a laborious round of build manipulation, it might not be clear that the application will run as intended on each of the different platforms. If the application is a daemon, even more work awaits because most UNIX flavors have their own method of starting and stopping daemons.

In this article, we will explore OpenPKG, a software development and packaging project initiated by Cable & Wireless, an international Internet Service Provider. The OpenPKG project began in November 2000 and has grown into a collaborative software development effort managed and maintained by many. The project aims to create a modular and flexible UNIX subsystem for cross-platform software packaging and installation.

More specifically, the goals of OpenPKG stem from the historical problem often faced in the daily operation of an ISP. The major UNIX platforms in operation at ISPs include FreeBSD, Linux, and Solaris. OpenPKG, however, is not limited to the three major platforms mentioned (see Table 1). To achieve cross-platform portability, OpenPKG provides a subsystem on top of the underlying UNIX system as shown in Figure 1. It covers every essential server software component from shells, editors, and compilers, to network daemons and add-on applications. Hence, the intended target community consists of systems administrators faced with a large and diverse set of UNIX servers.

Internally, OpenPKG leverages the existing packaging technology of the Red Hat Package Manager (RPM). However, the RPM software included with OpenPKG has been customized and extended to be self-contained. The more than 400 available OpenPKG packages are really just RPM packages under the hood, but are developed according to a standard OpenPKG approach. The packages are clean and robust, because they follow strict style guidelines and environment requirements.

To meet OpenPKG guidelines and standards, a package must be built from pristine vendor sources in a non-root temporary environment. It must work in an arbitrary file system location, follow a strict file system layout, and must be self-contained within its OpenPKG instance. Furthermore, the package must be independent from external UNIX facilities, install with a reasonable configuration, and use log file rotations and other such administrative wonders.

These package-building guidelines yield several benefits to OpenPKG users. OpenPKG users can install an instance (the OpenPKG subsystem and user-chosen packages) under any file system location, and even install multiple such instances on a single UNIX system. The main OpenPKG project environment is hosted on a machine with six other ongoing software projects, each with their own dedicated OpenPKG instance. To separately satisfy each project's needs, the associated OpenPKG instance serves each required software component from Postfix to BIND, and INN to Apache. Each project can therefore run in its own isolated environment, much like on a virtual machine.

The OpenPKG Package Lifecycle

OpenPKG follows an approach of minimum OS intrusion and maximum standalone presence. It tries hard to smooth out the differences between the underlying vendor solutions. We are often asked why OpenPKG uses RPM as the underlying packaging technology when other alternatives exist. There are indeed other packaging technologies available to projects like OpenPKG, such as the Debian dpkg/apt combination, FreeBSD ports, and System V pkgadd. However, RPM along with its OpenPKG extensions is the only solution that covers the whole package lifecycle in a fully consistent way.

The OpenPKG package lifecycle starts with fetching, unpacking, patching, and building the source package from pristine vendor sources. It builds the binary package in an unprivileged environment and finishes its life term with the installation, upgrade, and de-installation of the binary package on the target OpenPKG instance. This all works in a self-contained environment and is driven by complete package specifications (RPM .spec files). So, to finally answer the question, OpenPKG adopts RPM as its underlying packaging technology because no other fulfills these requirements.

Note that OpenPKG is primarily about packaging, not porting. One requirement of the OpenPKG packaging philosophy is that the vendor software be portable to begin with. Minor platform porting issues are fixed by the OpenPKG packagers, but fundamental changes are not considered. In fact, the main reason some platforms lack full OpenPKG support is that the amount of overhead in building software on them is not within reason.

OpenPKG also officially discourages the use of binary packages, and only provides them for bootstrapping (development tools not available) and emergency (tight time constraints) purposes. In our experience, installing binary packages built from source packages on the target machine outperforms other binary methods with respect to security and robustness.

There are simply too many subtle differences between most build and install systems that can influence the binary at run time and cause trouble. Some important run-time parameters, such as the maximum size of shared memory segments, are compiled into the binary on the build machine. Among many examples of such a run-time build dependency is a situation in which an Apache package is built with mod_ssl and MM. The dependency details of such a combination are overwhelming when sorting out the run-time parameters. To avoid such trouble, we believe the best solution is to always start with source packages.

Bootstrapping OpenPKG for the First Time

The OpenPKG bootstrapping process is wrapped into a shell script that, when run, will create a new instance of OpenPKG. This process is as self-contained as possible, and requires a minimum amount of operating system support and tools to unpack and compile itself. In the best case, the script will search the $PATH for the development tools tar, make, and cc and use them in its processing. If any of these tools are missing, an alternative approach exists in which a shell script containing binaries provides the missing tools.

The first step in bootstrapping involves dedicating a unique file system prefix to the instance along with user and group ids. The generic bootstrap building script called openpkg-version-release.src.sh requires these arguments and creates a platform-specific bootstrap installation script named openpkg-version-release.arch-os-id.sh. When run, this script installs the OpenPKG instance under the specified prefix with all files owned by the user and group (Figure 2). This bootstrapping process links the OpenPKG instance with the underlying UNIX system with only a few anchor points. Subsequent package installations do not touch the system at all, and if OpenPKG itself is un-installed, the anchor points vanish.

After creating a self-contained hierarchy, the bootstrap process registers itself as openpkg, and can thus be upgraded or treated like any other package. To make upgrading an already bootstrapped OpenPKG instance easier, .rpm versions of the bootstrap package are also available. A step-by-step example of a complete installation and de-installation of an OpenPKG instance with an RSYNC server package is given in Listing 1. To understand the RPM commands used, see the quick reference in Table 2.

OpenPKG File System Layout

Every file system standard sucks. OpenPKG's file system aims to suck less (Figure 2). Basically, its package area resembles the traditional layout found under /usr on most popular UNIX systems. Additionally, it contains its own RPM package management information in a sub-area for purposes of self-containment and a local area for adding unpackaged components.

OpenPKG breaks with tradition in one aspect of its file system layout. It unconventionally uses a separate subdirectory of prefix/etc/, prefix/share/ and prefix/var/ for each installed package. These subdirectories are easy to manage, because each is named after its associated package. This provides for a better structure than the usual mess of files, and every OpenPKG package adheres to this layout scheme (even when requiring a lot of effort to override the different vendor package intentions).

Looking again at the RSYNC example in Listing 1, note that the RSYNC configuration is in prefix/etc/rsync/, and it logs to somewhere in prefix/var/rsync/. This layout eases maintenance: backups become simpler, whole instances can be moved without hassle, and more.
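The per-package layout can be sketched in a few lines of shell. The helper name pkg_dirs is our own invention for illustration and is not an OpenPKG tool; only the etc/, share/, and var/ convention and the /usr/opkg prefix come from this article.

```shell
#!/bin/sh
# Print the per-package directories that the layout described above
# reserves for one package. pkg_dirs is a hypothetical helper, not an
# OpenPKG command.
pkg_dirs() {
    prefix=$1
    pkg=$2
    for area in etc share var; do
        printf '%s/%s/%s\n' "$prefix" "$area" "$pkg"
    done
}

# Example: the directories belonging to the rsync package.
pkg_dirs /usr/opkg rsync
```

Because everything a package owns outside the shared bin/ and lib/ areas sits in these three directories, a list like this could be handed directly to a backup tool.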

Managing OpenPKG Packages

When building packages, the temporary files are placed into subdirectories of prefix/RPM/ by default. A package builder can obtain the necessary subdirectory access by either being a member of the associated OpenPKG group, logging in under the user id of the OpenPKG instance, or logging in as root. A carefully written ~/.rpmmacros file can alternatively redirect the paths to a specified location (see the default macros %_sourcedir, %_specdir, %_builddir, %_tmppath, %_rpmdir, %_srcrpmdir in prefix/etc/openpkg/rpmmacros) and allow even an arbitrary user to build packages.
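As a rough sketch of such a redirection, the following script writes a private macro file that points the six macros named above at a user-owned work area. The directory names (SRC, TMP, PKG under opkg-build) are assumptions chosen for this example, and a temporary directory stands in for the real home directory so that nothing on the system is touched.

```shell
#!/bin/sh
# Write a private macro file that redirects the build areas named in
# the text to a user-owned directory. The directory names under $work
# are assumptions made for this sketch.
home_dir=$(mktemp -d)          # stand-in for $HOME
work="$home_dir/opkg-build"

mkdir -p "$work/SRC" "$work/TMP" "$work/PKG"

cat > "$home_dir/.rpmmacros" <<EOF
%_sourcedir  $work/SRC
%_specdir    $work/SRC
%_builddir   $work/TMP
%_tmppath    $work/TMP
%_rpmdir     $work/PKG
%_srcrpmdir  $work/PKG
EOF

cat "$home_dir/.rpmmacros"
```

For real use, the same lines would go into ~/.rpmmacros, with the values adapted from the defaults in prefix/etc/openpkg/rpmmacros.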

To build a binary package pkg-bin from a source package pkg-src, use rpm --rebuild pkg-src. OpenPKG's RPM will read the .spec information of the pkg-src, build the package based on the information, and place the resulting binary package in prefix/RPM/PKG/pkg-bin. To finally install the binary package so that it becomes part of the OpenPKG instance, use rpm -Uvh pkg-bin. Strictly speaking, this upgrades the package. To RPM, installation is nothing more than the special case of upgrading from nothing.

As a side note, some packages provide alternative build variants through boolean variables named with_name. To determine which variables are available (if any), run "rpm -qpi pkg-src | grep with_". To build a binary package using such variables, add --define "with_name value" to the rpm --rebuild command to override the default value.
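To illustrate how variant settings end up on the command line, here is a small dry-run helper. It only prints the command it would run. The helper name rebuild_cmd, the name=value argument convention, and the value "yes" are assumptions for this sketch; the --rebuild and --define options are the ones described above.

```shell
#!/bin/sh
# Compose (but do not run) the rebuild command for a source package,
# adding one --define per requested build variant.
# rebuild_cmd is a hypothetical helper; pkg-src and the variant name
# are placeholders, as in the text.
rebuild_cmd() {
    src=$1
    shift
    cmd="rpm --rebuild"
    for setting in "$@"; do
        name=${setting%%=*}      # part before the first '='
        value=${setting#*=}      # part after the first '='
        cmd="$cmd --define \"with_$name $value\""
    done
    printf '%s %s\n' "$cmd" "$src"
}

rebuild_cmd pkg-src name=yes
# prints: rpm --rebuild --define "with_name yes" pkg-src
```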

RPM is very clever when it comes to keeping configuration files during an upgrade, as shown in Table 3. An old configuration file is kept if the systems administrator stuck to default configuration, or if the configuration was changed but coincidentally matches the default configuration of the new package. In practice, an administrator-changed configuration must be reapplied in only a few cases of package upgrade. In any case, if a configuration file is not kept, RPM will save the old configuration file with the extension .rpmsave before saving a new default in its place. This ensures that changes to a default configuration can be recovered and reapplied so that an upgraded package will run correctly. If a new default configuration file replaces an old one that retains its original (but old) RPM default, RPM will rename it with the extension .rpmorig.

To make this delightful mechanism work properly, the configuration files of each package must be explicitly tagged. OpenPKG packages all follow this principle, further contributing to OpenPKG's robust nature. OpenPKG's RPM does the intuitive right thing by making sure that a changed configuration file is kept in place if possible and, if not, preserves it for manual consideration and application.

Finally, after the installation of a package, you can query a lot of its information. The command rpm -qi pkg-name summarizes a single installed package, while rpm -qa lists the names of all installed packages. rpm -qlv pkg-name lists all the files associated with a package, and rpm -qf prefix/path/to/file states to which package the given file belongs. You can even check a package's integrity using rpm -V pkg-name to verify which files have been tampered with or somehow munged. For more details on this, see Table 2.

The OpenPKG Run-Command Facility

You might have noticed that in the previous example installation of RSYNC, the server was started using the command /usr/opkg/etc/rc rsync start. The workhorse behind this simple statement is the powerful OpenPKG run-command facility, executed with prefix/etc/rc. Run-commands for every package are conveniently named prefix/etc/rc.d/rc.pkg-name. Each offers the functionality of several shell script segments encapsulated in a single file. The sections of a run-command file are identified by left-aligned labels prefixed with '%'. Listing 2 shows rc.rsync as an example.

The rc command takes pkg-name as the first argument and one or more section labels as additional arguments. The run command segments corresponding with the desired section labels are then extracted from the rc.pkg-name file and executed in the order given on the command line. The reserved package name all serves as a wildcard and refers to all installed OpenPKG packages, causing the processing of all run-command files in a specified order. In this case, the run-command facility will order the run-command processing according to the priority field (-p number) of the given section label in each run-command file. Another popular field in a section label is -u user, which directs the script code to execute with the privileges of user.
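The section mechanism can be approximated with a few lines of awk and shell. This is only a sketch of the idea, not the real run-command processor: it ignores the -p and -u fields, handles a single file, and uses a made-up run-command file rather than the rc.rsync of Listing 2. The function names rc_section and rc_run are our own.

```shell
#!/bin/sh
# Extract the body of one labeled section from a run-command file.
# A section starts at a left-aligned line whose first word is %label
# and ends at the next line starting with '%'.
rc_section() {
    awk -v want="%$2" '
        /^%/   { inside = ($1 == want); next }
        inside { print }
    ' "$1"
}

# A made-up run-command file for a hypothetical "demo" package.
rc_file=$(mktemp)
cat > "$rc_file" <<'EOF'
%common
    greeting="demo daemon"
%start -p 100 -u root
    echo "starting $greeting"
%stop -p 100 -u root
    echo "stopping $greeting"
EOF

# Run the requested sections in command-line order, after %common.
# The -p and -u fields are parsed past but otherwise ignored here.
rc_run() {
    file=$1
    shift
    script=$(rc_section "$file" common)
    for label in "$@"; do
        script="$script
$(rc_section "$file" "$label")"
    done
    sh -c "$script"
}

rc_run "$rc_file" stop start
```

Calling rc_run with "stop start" runs the stop segment first and the start segment second, mirroring the command-line ordering described above.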

Most sections in a run-command file have arbitrary labels intended for use as command-line arguments to the run-command facility. However, some sections have special meaning. The section labels of these are reserved names used internally by the run-command facility. For example, the %common section functions as a library and contains script code useful to some or all of the other sections. Its script code is run before any other script code.

Just like its cousin, the %common section, the %config section can appear only one time in each run-command file. It contains variables used to configure the behavior of the other sections residing in the same run-command file. This means that logging and enabling variables in a %config section will only affect the associated package, for example. Such variables can be overridden in prefix/etc/rc.conf in a per-hierarchy scope, however. Technically, the run-command facility assembles a large script file from the %config section, the prefix/etc/rc.conf file, the %common section, and finally the user-defined section given as an argument (in that order). The fat script is then executed.
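The assembly order explains why an rc.conf entry wins over a package default: it is simply read later. The sketch below reproduces that order with a made-up run-command file and a stand-in rc.conf. The file contents, the demo package, and the function names rc_section and rc_assemble are all assumptions for illustration.

```shell
#!/bin/sh
# Assemble one script in the order described above:
# %config, then rc.conf, then %common, then the requested section.
rc_section() {
    awk -v want="%$2" '
        /^%/   { inside = ($1 == want); next }
        inside { print }
    ' "$1"
}

work=$(mktemp -d)

# Made-up run-command file for a hypothetical "demo" package.
cat > "$work/rc.demo" <<'EOF'
%config
    demo_enable="yes"
    demo_log="default.log"
%common
    demo_status="enable=$demo_enable log=$demo_log"
%status
    echo "$demo_status"
EOF

# Instance-wide overrides, standing in for prefix/etc/rc.conf.
cat > "$work/rc.conf" <<'EOF'
demo_log="custom.log"
EOF

rc_assemble() {
    rc_section "$1" config     # package defaults
    cat "$2"                   # administrator overrides
    rc_section "$1" common     # shared library code
    rc_section "$1" "$3"       # the section asked for
}

rc_assemble "$work/rc.demo" "$work/rc.conf" status | sh
# prints: enable=yes log=custom.log
```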

The sections %monthly, %weekly, %daily, %hourly, and %quarterly also have special meaning, as the OpenPKG bootstrap process sets up cron jobs to execute them accordingly. Another label often seen is %env, which is intended to be used with the --eval option explained later.

Regarding configuration through variables, note that the rc.pkg-name file is intentionally not tagged as a configuration file and will be overwritten on updates with no questions asked. The prefix/etc/rc.conf file is tagged as a configuration file and is intended for overriding variables.

With OpenPKG, all daemon packages are released with scripts that recognize the value of a variable pkg-name_enable (default value "yes"). Setting this variable to "no" disables all run commands of the daemon in question. As seen with the RSYNC server example, this can be quite useful when installing a package just to get a client piece. If the server piece is not of interest, then a simple variable shuts it off completely. Similarly, to disable the automatic startup of all daemons in a hierarchy, just add openpkg_runall="no" to prefix/etc/rc.conf. In this case, daemons can still be started manually. This feature may be useful to admins wanting control over daemons with finer granularity.

The OpenPKG run-command facility has many other interesting features. Use rc --query variable to see the effective value of any configured variable, or use rc --config to see a complete list of all available variables with their default and effective values. The run-command facility also offers a handy feature to allow packages to extend the user shell environment. For example, the bootstrap package openpkg uses this to add the OpenPKG instance into your PATH, MANPATH, INFOPATH, etc. Just execute eval `prefix/etc/rc --eval openpkg env` to perform this environment extension for your current shell session. Note the backquotes: the shell must first run the rc command and then evaluate the text it prints.
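The reason for wrapping the call in eval and a command substitution can be shown with a stand-in. A child process cannot change the environment of the shell that started it, so the command prints shell assignments and the calling shell executes them. The function fake_rc_env and the exact text it prints are assumptions for this sketch; the output format of the real rc --eval is not shown in this article.

```shell
#!/bin/sh
# Stand-in for "prefix/etc/rc --eval openpkg env": it only prints
# shell assignments. The printed format is an assumption.
fake_rc_env() {
    prefix=$1
    printf 'PATH="%s/bin:%s/sbin:$PATH"; export PATH\n' "$prefix" "$prefix"
    printf 'MANPATH="%s/man:$MANPATH"; export MANPATH\n' "$prefix"
}

# Command substitution captures the printed assignments; eval then
# executes them in the current shell, so the changes persist.
# $(...) is equivalent to the traditional backquote form.
eval "$(fake_rc_env /usr/opkg)"

echo "$PATH"
```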

OpenPKG RPM vs. Red Hat RPM

As mentioned, OpenPKG is based on a uniquely adjusted and extended RPM-based packaging facility that allows for very concise and clean package specifications and building of every package in an unprivileged environment. To understand the added value of the OpenPKG implementation, let's take as an example the OpenPKG packaging of the RSYNC program. The OpenPKG packaging consists of three files: the RPM specification (rsync.spec, Listing 3), the run-commands (rc.rsync, Listing 2), and the default daemon configuration (rsync.conf, Listing 4). Compared with the RPM-based RSYNC package of other vendors, the OpenPKG RPM-based package is full featured yet very concise and clean. This is because of OpenPKG's RPM extensions and strict style guidelines.

To offer more portable and concise shell scripting, OpenPKG's RPM implementation uses GNU shtool. All manual installation and patching tasks are done with the shtool command. A companion tool, rpmtool, complements shtool with RPM and OS-specific features. The rpmtool allows all OpenPKG packages to generate their file list (%files) on the fly and makes the packaging information smaller. It reduces the required maintenance when vendor version updates occur as well.

OpenPKG's RPM additionally provides a set of local macros (%{l_xxx}) to abstract system specifics and help remove redundancy from packaging specifications. For example, the %{l_prefix} is the file system prefix of the associated OpenPKG instance. Using OpenPKG's local macros offers a clear advantage, because packages do not need hard-coded path prefixes and can be built for arbitrary OpenPKG instances.

Macros exist for the most often used build variables. The %{l_cc} macro expands to either prefix/bin/cc (in case the OpenPKG gcc package is installed) or defaults to cc. The same goes for %{l_cflags -O}: it expands to the optimized C compiler flags. If gcc is installed, it expands to "-O2 -pipe". Otherwise, it expands to just -O by default. The variables %{l_make} and %{l_mflags} work together in a similar way. If %{l_make} points to a known make that supports parallel building, and the underlying system has more than one CPU, then %{l_mflags -O} expands to the necessary flags to leverage the system's multiple processing power. For example, on a 2-CPU FreeBSD 4.x machine with BSD make, %{l_mflags -O} expands to -j4 -B, while on a 4-CPU Linux machine with GNU make, %{l_mflags -O} expands to --no-print-directory -j8.
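The two %{l_mflags -O} expansions quoted above can be mimicked with a short shell function. This is a guess at the shape of the logic, not the macro's actual definition: the rule that the job count is twice the CPU count fits both examples but is our assumption, as is returning nothing on a single-CPU machine. The name mflags_sketch is hypothetical.

```shell
#!/bin/sh
# Mimic the two %{l_mflags -O} expansions quoted above.
# Assumption: the job count is twice the CPU count, which fits both
# examples but is not stated as a rule in the text.
mflags_sketch() {
    flavor=$1    # "bsd" or "gnu"
    ncpu=$2
    if [ "$ncpu" -le 1 ]; then
        # Single CPU: no parallel flags (assumed behavior).
        echo ""
        return 0
    fi
    njobs=$((ncpu * 2))
    case $flavor in
        bsd) echo "-j$njobs -B" ;;
        gnu) echo "--no-print-directory -j$njobs" ;;
        *)   echo "" ;;          # unknown make: no parallel flags
    esac
}

mflags_sketch bsd 2    # 2-CPU machine with BSD make
mflags_sketch gnu 4    # 4-CPU machine with GNU make
```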

Additionally, all OpenPKG packages follow exactly the same style as the RSYNC example (see Listings 2, 3, and 4). The header order, indentation, etc. are standardized, and allow developers to easily query and even semi-automatically edit package information directly from the source. Incidentally, the indices on the OpenPKG FTP server and the OpenPKG release engineering procedures are auto-generated by exploiting this standard scheme.

Every OpenPKG package is able to build in an unprivileged (non-root user) environment and with read-only access to an OpenPKG instance. This allows safe (no development system intrusion) and precise (no trashed or missing files) packaging. Such security and precision is achieved by consistently using the BuildRoot feature of RPM for all packages. In short, this means that when rolling a binary package, the software is redirected to install into a shadow area (prefix/RPM/TMP/pkg-name-root/prefix). The package is then made from the shadow area just as if it were located in the real file system location (prefix). This improvement to the standard RPM behavior may sound trivial and easy to achieve, but is actually one of the trickiest steps in packaging software for OpenPKG. Sometimes (as with RSYNC), it is just a matter of overriding variables (prefix in the example) on the make install step. Other times, the solution is more involved. For some OpenPKG packages, it takes a lot of effort to find a reasonable way to redirect the vendor installation to the BuildRoot location, but the extra effort is always worthwhile and results in safer and more precise packaging.
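The shadow-area idea can be demonstrated without RPM at all. In the sketch below, a stand-in for the vendor's install step writes into a temporary shadow tree, and the package file list is then derived from that tree with the shadow root stripped off. The function names, the demo files, and the temporary directory are placeholders; the real mechanism is RPM's BuildRoot feature as described above.

```shell
#!/bin/sh
# Simulate a BuildRoot-style install: files land in a shadow tree, and
# the file list is derived from it with the shadow root removed.
prefix=/usr/opkg
buildroot=$(mktemp -d)    # stands in for prefix/RPM/TMP/pkg-name-root

# Stand-in for a vendor "make install" whose prefix has been
# overridden to point into the shadow area.
vendor_install() {
    dest=$1
    mkdir -p "$dest/bin" "$dest/man/man1"
    echo '#!/bin/sh' > "$dest/bin/demo"
    echo '.TH DEMO 1' > "$dest/man/man1/demo.1"
}

vendor_install "$buildroot$prefix"

# File list as it should appear in the final package: real paths
# under $prefix, not paths inside the shadow area.
file_list() {
    (cd "$1" && find . -type f | sed 's|^\.||' | sort)
}

file_list "$buildroot"
```

Nothing is ever written below the real /usr/opkg, which is why this style of build needs neither root privileges nor write access to the instance.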

Finally, OpenPKG's RPM implementation provides proxy packages, an appealing mechanism for reusing the packages of a master OpenPKG instance. Proxy packages can reside in multiple slave OpenPKG instances, and allow the systems administrator to avoid redundant building and maintaining of the same software package in multiple OpenPKG instances. For example, gcc is typically required by many packages at build time. A gcc OpenPKG package is usually needed in every OpenPKG instance. A savvy systems administrator will install a single gcc package in a master OpenPKG instance and then install only proxy packages (pointing to the real gcc) in the other OpenPKG instances by running slave-prefix/bin/rpm --makeproxy on the gcc binary RPM of the master instance. OpenPKG's RPM will then produce a binary RPM package for the slave instance containing a shadow tree resembling the contents of the master instance. The shadow tree is technically nothing more than symbolic links to the master (non-proxy) package's files and directories. This mechanism can save a lot of time and storage. However, it should be applied only to packages that have global configuration dependencies or no configuration dependencies at all.

Integrating Unpackaged Software

No matter how many packages OpenPKG provides, the world will always have other appealing yet unpackaged software. Ambitious systems administrators can package the software themselves for local purposes and even contribute new packages to the OpenPKG community. Alternatively, the local subdirectory of an OpenPKG instance exists for the purpose of containing unpackaged software, and can be instrumental in integrating a base of OpenPKG packages with other unpackaged software in an easy to maintain way. OpenPKG also provides a corresponding lsync tool to aid such integration.

To integrate unpackaged software into an OpenPKG instance, each unpackaged software component can be installed into the bin, sbin, man, info, include, and lib subdirectories of prefix/local/PKG/pkg-name/ and then virtually linked into the corresponding top-level directories under prefix/local/ by running prefix/sbin/lsync. This strategy leads to a very clean and maintainable OpenPKG instance, even with its new coexisting unpackaged software in prefix/local/. This especially makes it easy to un-install a package. Just remove prefix/local/PKG/pkg-name/ with all its contents and run lsync again.

This strategy even allows for installation of different versions of the same software. Just install into prefix/local/PKG/pkg-name-version/ and add a symbolic link pointing from prefix/local/PKG/pkg-name to this directory. This works because lsync skips subdirectories of prefix/local/PKG/ with version numbers attached. To upgrade an older foo-0.7.41 to foo-0.7.42, just repeat the installation in the same way, altering the symlink prefix/local/PKG/foo to point to foo-0.7.42 instead and running lsync again. lsync will automatically update symlinks, creating new links if required and removing outdated dangling ones (see Listing 5). As might be guessed, it is just as easy to go back to the old version if the new one keeps dumping core or something. For an example of such multiple unpackaged software installation, see Figure 3.
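The upgrade procedure can be tried out safely in a scratch directory. The function mini_lsync below is a toy stand-in for lsync, limited to bin/ entries; in particular, the way it recognizes versioned directories (a name matching *-[0-9]*) is our assumption, not a description of how the real tool decides. The temporary directory $root plays the role of prefix/local.

```shell
#!/bin/sh
# mini_lsync: toy stand-in for lsync, handling bin/ entries only.
mini_lsync() {
    local_dir=$1
    mkdir -p "$local_dir/bin"
    # Remove links whose targets no longer exist.
    for link in "$local_dir"/bin/*; do
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            rm -f "$link"
        fi
    done
    # Link programs from every unversioned package directory.
    for pkg in "$local_dir"/PKG/*; do
        case ${pkg##*/} in
            *-[0-9]*) continue ;;    # versioned directory: skipped
        esac
        for f in "$pkg"/bin/*; do
            if [ -e "$f" ]; then
                ln -sf "$f" "$local_dir/bin/${f##*/}"
            fi
        done
    done
}

root=$(mktemp -d)                    # plays the role of prefix/local
for v in 0.7.41 0.7.42; do
    mkdir -p "$root/PKG/foo-$v/bin"
    echo "echo foo $v" > "$root/PKG/foo-$v/bin/foo"
done
# A helper program that exists only in the old version.
echo "echo helper" > "$root/PKG/foo-0.7.41/bin/foohelper"

ln -s foo-0.7.41 "$root/PKG/foo"
mini_lsync "$root"
before=$(sh "$root/bin/foo")

# Upgrade: repoint the symlink and resync.
rm "$root/PKG/foo"
ln -s foo-0.7.42 "$root/PKG/foo"
mini_lsync "$root"
after=$(sh "$root/bin/foo")

echo "$before -> $after"
# prints: foo 0.7.41 -> foo 0.7.42
```

After the second run, the link for foohelper is gone because its target vanished with the old version. Going back is the same two steps with the symlink pointing at foo-0.7.41 again.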

OpenPKG Release Engineering

A carefully crafted release process is part of the OpenPKG project, and the fruits of the whole project are available to the public according to open source standards. All sources (package specifications, source patches, Web site sources, the handbook, this article, etc.) are located in a publicly readable central CVS repository, which can be browsed anonymously by conventional cvs commands or through the Web site for added convenience. Additionally, all developer commits to this repository are tracked and summarized with postings to public mailing lists and public newsgroups. Participants can easily follow all developments by subscribing to the list or reading the newsgroup.

For stability and to reduce conflicts between development milestones, OpenPKG has three release branches (which map directly to CVS branches). These are OpenPKG-SOLID, OpenPKG-STABLE, and OpenPKG-CURRENT. OpenPKG-SOLID is the security update branch of the last public OpenPKG release. OpenPKG-STABLE is the stable branch from whose contents the next public release is made. OpenPKG-CURRENT is the current state of the development branch and contains packages of beta-grade stability. In any case, the branch from which a package was built can easily be determined by the OpenPKG RPM file name, because the file names follow a consistent naming scheme: pkg-name-version-YYYYMMDD (for CURRENT), pkg-name-version-N.YYYYMMDD (for N-STABLE), pkg-name-version-N.M.X (for N.M.X-SOLID). Once such a source RPM file is built, the new binary RPM file name contains additional information, such as operating system, hardware, and the OpenPKG instance prefix.
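Because the naming scheme is so regular, a script can guess the branch from a source RPM file name. The sketch below uses only the three release-field patterns quoted above. The function name opkg_branch and the sample file names are invented for illustration, and the function does not handle binary RPM names, which carry additional fields.

```shell
#!/bin/sh
# Guess the release branch from a source RPM file name, using only the
# three release-field patterns quoted above. Sample names are invented.
opkg_branch() {
    rel=${1%.src.rpm}     # drop the source RPM suffix
    rel=${rel##*-}        # keep the field after the last '-'
    if printf '%s\n' "$rel" | grep -Eq '^[0-9]{8}$'; then
        echo CURRENT      # YYYYMMDD
    elif printf '%s\n' "$rel" | grep -Eq '^[0-9]+\.[0-9]{8}$'; then
        echo STABLE       # N.YYYYMMDD
    elif printf '%s\n' "$rel" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo SOLID        # N.M.X
    else
        echo unknown
    fi
}

opkg_branch rsync-2.5.5-20021001.src.rpm      # prints: CURRENT
opkg_branch rsync-2.5.5-1.20021001.src.rpm    # prints: STABLE
opkg_branch rsync-2.5.5-1.0.1.src.rpm         # prints: SOLID
```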

The OpenPKG developer team is very fast in keeping OpenPKG-CURRENT packages up to date and in sync with the latest vendor versions. This is possible because the versions of all externally available vendor sources are automatically tracked on a daily basis. An OpenPKG package for a new vendor software version is often available before the software is even announced on Freshmeat.net.

Finally, OpenPKG takes security very seriously. Experience has shown that "security through obscurity" does not work, and that public disclosure leads to quicker and better solutions to security problems. Thus, OpenPKG tries to release fixed packages as quickly as possible when a vulnerability is discovered. The OpenPKG security release and advisory process publishes official security advisories in the security section of the Web site and on the mailing lists.

Conclusion

OpenPKG is an open source software project founded by Cable & Wireless Germany in November 2000. The implementation relies on RPM 4 for its basic packaging mechanism, but offers more than RPM alone. To meet its goal of becoming a modular and flexible UNIX subsystem for cross-platform software packaging and installation, OpenPKG includes tricky bootstrapping logic that installs a customized implementation of RPM 4 on any of the supported target platforms.

OpenPKG has been in production use at Cable & Wireless Germany since April 2001. Since its public release in January 2002, OpenPKG users have profited from an increase from 220 to more than 400 software packages. The project is continuously improved by a team of developers who also update and add packages daily. The base of OpenPKG software packages is expected to increase even more, partly because of the ease of writing specifications and building packages. Most OpenPKG users find it deceptively simple to build a basic package. New users interested in such packaging can use the RSYNC example in this article as a blueprint. Accordingly, package contributions are always appreciated by the members of the OpenPKG project.

To make OpenPKG even more attractive, work is under way on a front end, which will simplify and control the installation process according to build and install dependencies. OpenPKG is also fulfilling plans to satisfy the desktop user by offering X11-dependent packages for Gtk, Qt, Gimp, Mozilla, and many others. For faster execution and even more flexibility, a further enhanced run-command processor is also under development. Shared library support is under investigation, too. Lastly, we are looking forward to upgrading OpenPKG to use the forthcoming RPM 4.1 version.

References

OpenPKG: http://www.openpkg.org/ ftp://ftp.openpkg.org/

OpenPKG Community Forums:
mailto:openpkg-users@openpkg.org
mailto:openpkg-dev@openpkg.org
nntp://news.openpkg.org/openpkg.users
nntp://news.openpkg.org/openpkg.dev

RPM: http://www.rpm.org/ ftp://ftp.rpm.org/pub/rpm/

Ralf S. Engelschall is a computer scientist and Open Source software hacker, leading the software development department at Cable & Wireless Germany. He is the author of well-known software like Apache mod_ssl, GNU Pth, and GNU Shtool and the founder of Open Source software projects like OpenSSL, OpenPKG, and OSSP. He can be contacted at: rse@engelschall.com.

Thomas Lotterer is a network professional and consultant working as a UNIX software developer at Cable & Wireless Germany. He gained experience in cross-platform system integration and software distribution by working previously as a systems administrator and technical trainer. Today, Thomas works actively on the OpenPKG and OSSP projects. He can be contacted at: thomas@lotterer.net.

Michael Schloh von Bennewitz is a software engineer at Cable & Wireless Germany, where he works on the network and user interface logic of ISP tools and technologies. He is an active contributor to both the OpenPKG and OSSP projects. With fingers blazing, Michael listened to classical music while writing parts of this article in order to go even faster. He can be contacted at: michael@schloh.com.

Christoph Schug is a senior UNIX systems administrator at Cable & Wireless Germany. He leads the hosting department and is responsible for all managed servers at the Munich data center. His revolutionary ideas and visions often result in additional lines in Ralf's TODO list. When not in the office, Christoph might be found in the Alps steering the screaming and smoking tires of his Miata MX-5 roadster. He can be contacted at: chris@schug.net.