Cross-Platform
UNIX Software Packaging with OpenPKG
Ralf S. Engelschall, Thomas Lotterer, Michael Schloh von Bennewitz,
and Christoph Schug
Many of us prefer open source software for its well-known advantages,
but sometimes regret the associated disadvantages when manually
applying it to a heterogeneous environment. To keep a work environment
stable and secure, it's often necessary to search for the latest
version of an application and collect the most recent patches. After
that, systems administrators must build and install the new binaries
on every UNIX box in the network. Then, after a laborious round
of build manipulation, it might still not be clear whether the application
will run as intended on each of the different platforms. If the
application is a daemon, even more work awaits because most UNIX
flavors have their own method of starting and stopping daemons.
In this article, we will explore OpenPKG, a software development
and packaging project initiated by Cable & Wireless, an international
Internet Service Provider. The OpenPKG project began in November
2000 and has grown into a collaborative software development effort
managed and maintained by many. The project aims to create a modular
and flexible UNIX subsystem for cross-platform software packaging
and installation.
More specifically, the goals of OpenPKG stem from the historical
problem often faced in the daily operation of an ISP. The major
UNIX platforms in operation at ISPs include FreeBSD, Linux, and
Solaris. OpenPKG, however, is not limited to the three major platforms
mentioned (see Table 1). To achieve cross-platform portability,
OpenPKG provides a subsystem on top of the underlying UNIX system
as shown in Figure 1. It covers every essential server software
component from shells, editors, and compilers, to network daemons
and add-on applications. Hence, the intended target community consists
of systems administrators faced with a large and diverse set of
UNIX servers.
Internally, OpenPKG leverages the existing packaging technology
of the Red Hat Package Manager (RPM). However, the RPM software
included with OpenPKG is customized and extended to be self-contained.
The more than 400 available OpenPKG packages are really just RPM
packages under the hood, but were developed in an OpenPKG standard
approach. The packages are clean and robust, because they follow
strict style guidelines and environment requirements.
To meet OpenPKG guidelines and standards, a package must be built
from pristine vendor sources in a non-root temporary environment.
It must work in an arbitrary file system location, follow a strict
file system layout, and must be self-contained within its OpenPKG
instance. Furthermore, the package must be independent from external
UNIX facilities, install with a reasonable configuration, and use
log file rotations and other such administrative wonders.
These package-building guidelines yield several benefits to OpenPKG
users. OpenPKG users can install an instance (the OpenPKG subsystem
and user-chosen packages) under any file system location, and even
install multiple such instances on a single UNIX system. The main
OpenPKG project environment is hosted on a machine with six other
ongoing software projects, each with its own dedicated OpenPKG
instance. To separately satisfy each project's needs, the associated
OpenPKG instance serves each required software component from Postfix
to BIND, and INN to Apache. Each project can therefore run in its
own isolated environment, much like on a virtual machine.
The OpenPKG Package Lifecycle
OpenPKG follows an approach of minimum OS intrusion and maximum
standalone presence. It tries hard to smooth out the differences
between the underlying vendor solutions. We are often asked why
OpenPKG uses RPM as the underlying packaging technology when other
alternatives exist. There are indeed other packaging technologies
available to projects like OpenPKG, such as the Debian dpkg/apt
combination, FreeBSD ports, and System V pkgadd. However,
RPM along with its OpenPKG extensions is the only solution that
covers the whole package lifecycle in a fully consistent
way.
The OpenPKG package lifecycle starts with fetching, unpacking,
patching, and building the source package from pristine vendor sources.
It builds the binary package in an unprivileged environment and
finishes its life term with the installation, upgrade, and de-installation
of the binary package on the target OpenPKG instance. This all works
in a self-contained environment and is driven by complete package
specifications (RPM .spec files). So, to finally answer the
question, OpenPKG adopts RPM as its underlying packaging technology
because no other fulfills these requirements.
Note that OpenPKG is primarily about packaging, not porting. One
requirement of the OpenPKG packaging philosophy is that the vendor
software be portable to begin with. Minor platform porting issues
are fixed by the OpenPKG packagers, but fundamental changes are
not considered. In fact, the main reason some platforms lack full
OpenPKG support is that the amount of overhead in building software
on them is not within reason.
OpenPKG also officially discourages the use of binary packages,
and only provides them for bootstrapping (development tools not
available) and emergency (tight time constraints) purposes. In our
experience, installing binary packages built from source packages
on the target machine outperforms other binary methods with respect
to security and robustness.
There are simply too many subtle differences between most build
and install systems that can influence the binary at run time and
cause trouble. Some important run-time parameters, such as the maximum
size of shared memory segments, are compiled into the binary on
the build machine. Among many examples of such a run-time build
dependency is a situation in which an Apache package is built with
mod_ssl and MM. The dependency details of such a combination are
overwhelming when sorting out the run-time parameters. To avoid
such trouble, we believe the best solution is to always start with
source packages.
Bootstrapping OpenPKG for the First Time
The OpenPKG bootstrapping process is wrapped into a shell script
that, when run, will create a new instance of OpenPKG. This process
is as self-contained as possible, and requires a minimum amount
of operating system support and tools to unpack and compile itself.
In the best case, the script will search the $PATH for the
development tools tar, make, and cc and use
them in its processing. If any of these tools are missing, an alternative
approach exists in which a shell script containing binaries provides
the missing tools.
The first step in bootstrapping involves dedicating a unique file
system prefix to the instance along with user and group ids. The
generic bootstrap building script called openpkg-version-release.src.sh
requires these arguments and creates a platform-specific bootstrap
installation script named openpkg-version-release.arch-os-id.sh.
When run, this script installs the OpenPKG instance under the specified
prefix with all files owned by the user and group (Figure 2). This
bootstrapping process links the OpenPKG instance with the underlying
UNIX system with only a few anchor points. Subsequent package installations
do not touch the system at all, and if OpenPKG itself is un-installed,
the anchor points vanish.
After creating a self-contained hierarchy, the bootstrap process
registers itself as openpkg, and can thus be upgraded or
treated like any other package. To make upgrading an already bootstrapped
OpenPKG instance easier, .rpm versions of the bootstrap package
are also available. A step-by-step example of a complete installation
and de-installation of an OpenPKG instance with an RSYNC server package
is given in Listing 1. To understand the RPM commands used, see
the quick reference in Table 2.
OpenPKG File System Layout
Every file system standard sucks. OpenPKG's file system aims
to suck less (Figure 2). Basically, its package area resembles the
traditional layout found under /usr on most popular UNIX
systems. Additionally, it contains its own RPM package management
information in a sub-area for purposes of self-containment and a
local area for adding unpackaged components.
OpenPKG breaks with tradition in one aspect of its file system
layout. It unconventionally uses a separate subdirectory of prefix/etc/,
prefix/share/ and prefix/var/ for each
installed package. These subdirectories are easy to manage, because
each is named after its associated package. This provides for a
better structure than the usual mess of files, and every OpenPKG
package adheres to this layout scheme (even when requiring a lot
of effort to override the different vendor package intentions).
Looking again at the RSYNC example in Listing 1, note that the
RSYNC configuration is in prefix/etc/rsync/, and it
logs to somewhere in prefix/var/rsync/. Such a predictable
layout eases maintenance: backups become simpler, whole instances
can be moved without hassle, and more.
Managing OpenPKG Packages
When building packages, the temporary files are placed into subdirectories
of prefix/RPM/ by default. A package builder can obtain the
necessary subdirectory access by either being a member of the associated
OpenPKG group, logging in under the user id of the OpenPKG instance,
or logging in as root. A carefully written ~/.rpmmacros file
can alternatively redirect the paths to a specified location (see
the default macros %_sourcedir, %_specdir, %_builddir,
%_tmppath, %_rpmdir, %_srcrpmdir in prefix/etc/openpkg/rpmmacros)
and allow even an arbitrary user to build packages.
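As a rough illustration of the redirection just described, the following
shell sketch writes a per-user macro file that points the build
directories at a private area. The macro names are the ones listed
above. The directory names under the build area, and the temporary
directory used as a stand-in for the home directory, are assumptions
made only for this example.

```shell
# Stand-in for the home directory, so the sketch is safe to run anywhere.
home_dir=$(mktemp -d)

# Hypothetical private build area; any location the user can write to works.
build_area="$home_dir/opkg-build"
mkdir -p "$build_area/SRC" "$build_area/TMP" "$build_area/PKG"

# Redirect the build-related macros away from prefix/RPM/.
cat > "$home_dir/.rpmmacros" <<EOF
%_sourcedir  $build_area/SRC
%_specdir    $build_area/SRC
%_builddir   $build_area/TMP
%_tmppath    $build_area/TMP
%_rpmdir     $build_area/PKG
%_srcrpmdir  $build_area/PKG
EOF

# Show the resulting macro file.
cat "$home_dir/.rpmmacros"
```

In real use, the file would live in the package builder's actual home
directory, and the paths should be checked against the defaults in
prefix/etc/openpkg/rpmmacros.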
To build a binary package pkg-bin from a source package
pkg-src, use rpm --rebuild pkg-src. OpenPKG's
RPM will read the .spec information of the pkg-src,
build the package based on the information, and place the resulting
binary package in prefix/RPM/PKG/pkg-bin. To
finally install the binary package so that it becomes part of the
OpenPKG instance, use rpm -Uvh pkg-bin. Strictly speaking,
this upgrades the package. To RPM, installation is nothing more
than the special case of upgrading from nothing.
As a side note, some packages provide alternative build variants
through boolean variables named with_name. To determine
which variables are available (if any), run "rpm -qpi
pkg-src | grep with_". To build a binary package
using such variables, add --define "with_name value"
to the rpm --rebuild command to override the default value.
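The build and install steps can be summarized in a small dry-run
helper. This is only a sketch: the function name opkg_plan is
hypothetical, it prints the commands instead of running them, and
pkg-src, pkg-bin, and with_name remain the placeholders used in the
text.

```shell
# Print, rather than run, the commands for building and installing a package.
# $1 = OpenPKG prefix, $2 = source package, $3 = binary package,
# remaining arguments = optional "with_name value" pairs.
opkg_plan() {
    prefix=$1
    src=$2
    bin=$3
    shift 3
    defines=""
    for pair in "$@"; do
        defines="$defines --define \"$pair\""
    done
    # Step 1: rebuild the source package into a binary package.
    echo "$prefix/bin/rpm --rebuild$defines $src"
    # Step 2: install (or upgrade to) the freshly built binary package.
    echo "$prefix/bin/rpm -Uvh $prefix/RPM/PKG/$bin"
}

opkg_plan /usr/opkg pkg-src pkg-bin "with_name yes"
# prints:
#   /usr/opkg/bin/rpm --rebuild --define "with_name yes" pkg-src
#   /usr/opkg/bin/rpm -Uvh /usr/opkg/RPM/PKG/pkg-bin
```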
RPM is very clever when it comes to keeping configuration files
during an upgrade, as shown in Table 3. An old configuration file
is kept if the systems administrator stuck to default configuration,
or if the configuration was changed but coincidentally matches the
default configuration of the new package. In practice, an administrator-changed
configuration must be reapplied in only a few cases of package upgrade.
In any case, if a configuration file is not kept, RPM will save
the old configuration file with the extension .rpmsave before
saving a new default in its place. This ensures that changes to
a default configuration can be recovered and reapplied so that an
upgraded package will run correctly. If a new default configuration
file replaces an old one that retains its original (but old) RPM
default, RPM will rename it with the extension .rpmorig.
To make this delightful mechanism work properly, the configuration
files of each package must be explicitly tagged. OpenPKG packages
all follow this principle, further contributing to OpenPKG's
robust nature. OpenPKG's RPM does the intuitive right thing
by making sure that a changed configuration file is kept in place
if possible and, if not, preserves it for manual consideration and
application.
Finally, after the installation of a package, you can query a
lot of its information. The command rpm -qi pkg-name
summarizes a single installed package, while rpm -qa lists
the names of all installed packages. rpm -qlv pkg-name
lists all the files associated with a package, and rpm -qf prefix/path/to/file
states to which package the given file belongs. You can even check
a package's integrity using rpm -V pkg-name to
verify which files have been tampered with or somehow munged. For
more details on this, see Table 2.
The OpenPKG Run-Command Facility
You might have noticed that in the previous example installation
of RSYNC, the server was started using the command /usr/opkg/etc/rc
rsync start. The workhorse behind this simple statement is the
powerful OpenPKG run-command facility, executed with prefix/etc/rc.
Run-commands for every package are conveniently named prefix/etc/rc.d/rc.pkg-name.
Each offers the functionality of several shell script segments encapsulated
in a single file. The sections of a run-command file are identified
by left-aligned labels prefixed with '%'. Listing
2 shows rc.rsync as an example.
The rc command takes pkg-name as the first argument
and one or more section labels as additional arguments. The run-command
segments corresponding to the desired section labels are
then extracted from the rc.pkg-name file and executed
in the order given on the command line. The reserved package name
all serves as a wildcard and refers to all installed OpenPKG
packages, causing the processing of all run-command files in a specified
order. In this case, the run-command facility will order the run-command
processing according to the priority field (-p number)
of the given section label in each run-command file. Another popular
field in a section label is -u user, which directs
the script code to execute with the privileges of user.
Most sections in a run-command file have arbitrary labels intended
for use as command-line arguments to the run-command facility. However,
some sections have special meaning. The section labels of these
are reserved names used internally by the run-command facility.
For example, the %common section functions as a library and
contains script code useful to some or all of the other sections.
Its script code is run before any other script code.
Just like its cousin, the %common section, the %config
section can appear only one time in each run-command file. It contains
variables used to configure the behavior of the other sections residing
in the same run-command file. This means, for example, that logging
and enabling variables in a %config section affect only the associated
package. Such variables can, however, be overridden in prefix/etc/rc.conf
with per-hierarchy scope. Technically, the run-command
facility assembles a large script file from the %config section,
the prefix/etc/rc.conf file, the %common section,
and finally the user-defined section given as an argument (in that
order). The fat script is then executed.
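The assembly order just described can be sketched in a few lines of
shell. This is not the actual OpenPKG implementation: the function
names rc_section and rc_run, the package name demo, and the contents of
the sample files are made up for illustration, and the -p and -u fields
of the section labels are simply ignored.

```shell
# Print the body of one section of a run-command file.
# $1 = file, $2 = section name without the leading '%'.
rc_section() {
    awk -v want="%$2" '
        /^%/   { active = ($1 == want); next }  # a label line starts a new section
        active { print }                        # body line of the wanted section
    ' "$1"
}

# Assemble %config, rc.conf, %common, and the requested section (in that
# order) into one script and execute it.
# $1 = run-command file, $2 = rc.conf file, $3 = section name.
rc_run() {
    {
        rc_section "$1" config
        cat "$2"
        rc_section "$1" common
        rc_section "$1" "$3"
    } | sh
}

# Made-up run-command file for a package called "demo".
work=$(mktemp -d)
cat > "$work/rc.demo" <<'EOF'
%config
    demo_enable="yes"
%common
    demo_say () { echo "demo: $1 (enabled=$demo_enable)"; }
%start -p 100 -u root
    demo_say starting
%stop -p 100
    demo_say stopping
EOF

# Per-hierarchy override, read after %config and therefore winning.
echo 'demo_enable="no"' > "$work/rc.conf"

rc_run "$work/rc.demo" "$work/rc.conf" start
# prints: demo: starting (enabled=no)
```

Because rc.conf is read after the %config section, its assignment
replaces the package default before the %common and user-defined
sections run.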
The sections %monthly, %weekly, %daily, %hourly,
and %quarterly also have special meaning, as the OpenPKG
bootstrap process sets up cron jobs to execute them accordingly.
Another label often seen is %env, which is intended to be
used with the --eval option explained later.
Regarding configuration through variables, note that the rc.pkg-name
file is intentionally not tagged as a configuration file and will
be overwritten on updates with no questions asked. The prefix/etc/rc.conf
file is tagged as a configuration file and is intended for overriding
variables.
With OpenPKG, all daemon packages are released with scripts that
recognize the value of a variable pkg-name_enable
(default value "yes"). Setting this variable to
"no" disables all run commands of the daemon in
question. As seen with the RSYNC server example, this can be quite
useful when installing a package just to get a client piece. If
the server piece is not of interest, then a simple variable shuts
it off completely. Similarly, to disable the automatic startup of
all daemons in a hierarchy, just add openpkg_runall="no"
to prefix/etc/rc.conf. In this case, daemons can still
be started manually. This feature may be useful to admins wanting
control over daemons with finer granularity.
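A minimal sketch of how such settings might look and be tested follows.
It writes the two assignments mentioned above into a scratch file
rather than a real prefix/etc/rc.conf. The helper is_enabled is
hypothetical and only mirrors the documented default of "yes"; it is
not part of OpenPKG, and it assumes package names that are valid shell
variable names.

```shell
# Write a sample rc.conf into a scratch directory instead of a real prefix.
conf_dir=$(mktemp -d)
cat > "$conf_dir/rc.conf" <<'EOF'
rsync_enable="no"
openpkg_runall="no"
EOF

# The settings are plain variable assignments, so a shell can read them.
. "$conf_dir/rc.conf"

# Succeed unless <pkg-name>_enable is set to "no" (unset means "yes").
# $1 = package name.
is_enabled() {
    eval "value=\${${1}_enable:-yes}"
    [ "$value" != "no" ]
}

if is_enabled rsync; then echo "rsync: enabled"; else echo "rsync: disabled"; fi
# prints: rsync: disabled
```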
The OpenPKG run-command facility has many other interesting features.
Use rc --query variable to see the effective value
of any configured variable, or use rc --config to see a complete
list of all available variables with their default and effective
values. The run-command facility also offers a handy feature to
allow packages to extend the user shell environment. For example,
the bootstrap package openpkg uses this to add the OpenPKG
instance into your PATH, MANPATH, INFOPATH,
etc. Just execute eval `prefix/etc/rc --eval openpkg
env` to perform this environment extension for your current
shell session.
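The environment extension works because the rc command only prints
shell commands, and eval then runs them in the current shell. The
sketch below demonstrates the idiom with a stand-in function, since the
exact output of the real command is not shown here; the function name
fake_rc_env, the prefix /usr/opkg, and the directories it adds are
assumptions.

```shell
# Stand-in for "prefix/etc/rc --eval openpkg env": it only prints shell
# commands. The prefix and the directories are illustrative.
fake_rc_env() {
    prefix=/usr/opkg
    echo "PATH=\"$prefix/bin:$prefix/sbin:\$PATH\"; export PATH"
    echo "MANPATH=\"$prefix/man:\$MANPATH\"; export MANPATH"
}

# Command substitution captures the printed commands; eval runs them in
# the current shell, so the changes last for the rest of the session.
eval "$(fake_rc_env)"
echo "$PATH"
```

Running the command without eval would merely display the assignments
and leave the current shell unchanged.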
OpenPKG RPM vs. Red Hat RPM
As mentioned, OpenPKG is based on a uniquely adjusted and extended
RPM-based packaging facility that allows for very concise and clean
package specifications and building of every package in an
unprivileged environment. To understand the added value of the OpenPKG
implementation, let's take as an example the OpenPKG packaging
of the RSYNC program. The OpenPKG packaging consists of three files:
the RPM specification (rsync.spec, Listing 3), the run-commands
(rc.rsync, Listing 2), and the default daemon configuration
(rsync.conf, Listing 4). Compared with the RPM-based RSYNC
package of other vendors, the OpenPKG RPM-based package is full
featured yet very concise and clean. This is because of OpenPKG's
RPM extensions and strict style guidelines.
To offer more portable and concise shell scripting, OpenPKG's
RPM implementation uses GNU shtool. All manual installation and
patching tasks are done with the shtool command. A companion
tool, rpmtool, complements shtool with RPM and OS-specific
features. The rpmtool allows all OpenPKG packages to generate
their file list (%files) on the fly and makes the packaging
information smaller. It reduces the required maintenance when vendor
version updates occur as well.
OpenPKG's RPM additionally provides a set of local macros
(%{l_xxx}) to abstract system specifics and help remove
redundancy from packaging specifications. For example, the %{l_prefix}
is the file system prefix of the associated OpenPKG instance.
Using OpenPKG's local macros offers a clear advantage, because
packages do not need hard-coded path prefixes and can be built for
arbitrary OpenPKG instances.
Macros exist for the most often used build variables. The %{l_cc}
macro expands to either prefix/bin/cc (in case the
OpenPKG gcc package is installed) or defaults to cc.
The same goes for %{l_cflags -O}: it expands to the optimized
C compiler flags. If gcc is installed, it expands to "-O2
-pipe". Otherwise, it expands to just -O by default.
The variables %{l_make} and %{l_mflags} work together
in a similar way. If %{l_make} points to a known make
that supports parallel building, and the underlying system has more
than one CPU, then %{l_mflags -O} expands to the necessary
flags to leverage the system's multiple processing power. For
example, on a 2-CPU FreeBSD 4.x machine with BSD make, %{l_mflags
-O} expands to -j4 -B, while on a 4-CPU Linux machine
with GNU make, %{l_mflags -O} expands to --no-print-directory
-j8.
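The two expansions quoted above can be reproduced with a short shell
function. The rule of two jobs per CPU is only inferred from those two
examples, the empty result on a single-CPU machine is an assumption,
and the function name mflags_sketch is hypothetical; the real macro may
apply different logic.

```shell
# Mimic the expansion of %{l_mflags -O} for the two cases quoted above.
# $1 = make flavor ("bsd" or "gnu"), $2 = number of CPUs.
mflags_sketch() {
    flavor=$1
    ncpu=$2
    # Only multi-CPU systems get parallel flags (assumption: print nothing otherwise).
    [ "$ncpu" -gt 1 ] || return 0
    njobs=$((ncpu * 2))   # assumption inferred from the two examples
    case $flavor in
        bsd) echo "-j$njobs -B" ;;
        gnu) echo "--no-print-directory -j$njobs" ;;
    esac
}

mflags_sketch bsd 2   # prints: -j4 -B
mflags_sketch gnu 4   # prints: --no-print-directory -j8
```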
Additionally, all OpenPKG packages follow exactly the same
style as the RSYNC example (see Listings 2, 3, and 4). The header
order, indentation, etc. are standardized, and allow developers
to easily query and even semi-automatically edit package information
directly from the source. Incidentally, the indices on the OpenPKG
FTP server and the OpenPKG release engineering procedures are auto-generated
by exploiting this standard scheme.
Every OpenPKG package is able to build in an unprivileged (non-root
user) environment and with read-only access to an OpenPKG instance.
This allows safe (no development system intrusion) and precise (no
trashed or missing files) packaging. Such security and precision
is achieved by consistently using the BuildRoot feature of
RPM for all packages. In short, this means that when rolling
a binary package, the software is redirected to install into a shadow
area (prefix/RPM/TMP/pkg-name-root/prefix).
The package is then made from the shadow area just as if it were
located in the real file system location (prefix). This improvement
to the standard RPM behavior may sound trivial and easy to achieve,
but is actually one of the trickiest steps in packaging software
for OpenPKG. Sometimes (as with RSYNC), it is just a matter of overriding
variables (prefix in the example) on the make install
step. Other times, the solution is more involved. For some OpenPKG
packages, it takes a lot of effort to find a reasonable way to redirect
the vendor installation to the BuildRoot location, but the
extra effort is always worthwhile and results in safer and more
precise packaging.
Finally, OpenPKG's RPM implementation provides proxy packages,
an appealing mechanism for reusing the packages of a master OpenPKG
instance. Proxy packages can reside in multiple slave OpenPKG instances,
and allow the systems administrator to avoid redundant building
and maintaining of the same software package in multiple OpenPKG
instances. For example, gcc is typically required by many
packages at build time. A gcc OpenPKG package is usually
needed in every OpenPKG instance. A savvy systems administrator
will install a single gcc package in a master OpenPKG instance
and then install only proxy packages (pointing to the real gcc)
in the other OpenPKG instances by running slave-prefix/bin/rpm
--makeproxy on the gcc binary RPM of the master instance.
OpenPKG's RPM will then produce a binary RPM package for the
slave instance containing a shadow tree resembling the contents
of the master instance. The shadow tree is technically nothing more
than symbolic links to the master (non-proxy) package's files
and directories. This mechanism can save a lot of time and storage.
However, it should be applied only to packages with global configuration
dependencies or with no configuration dependencies at all.
Integrating Unpackaged Software
No matter how many packages OpenPKG provides, the world will always
have other appealing yet unpackaged software. Ambitious systems
administrators can package the software themselves for local purposes
and even contribute new packages to the OpenPKG community. Alternatively,
the local subdirectory of an OpenPKG instance exists for
the purpose of containing unpackaged software, and can be instrumental
in integrating a base of OpenPKG packages with other unpackaged
software in an easy to maintain way. OpenPKG also provides a corresponding
lsync tool to aid such integration.
To integrate unpackaged software into an OpenPKG instance, each
unpackaged software component can be installed into the bin,
sbin, man, info, include, and lib
subdirectories of prefix/local/PKG/pkg-name/
and then virtually linked into the corresponding top-level directories
under prefix/local/ by running prefix/sbin/lsync.
This strategy leads to a very clean and maintainable OpenPKG instance,
even with its new coexisting unpackaged software in prefix/local/.
This especially makes it easy to un-install a package. Just remove
prefix/local/PKG/pkg-name/ with all
its contents and run lsync again.
This strategy even allows for installation of different versions
of the same software. Just install into prefix/local/PKG/pkg-name-version/
and add a symbolic link pointing from prefix/local/PKG/pkg-name
to this directory. This works because lsync skips subdirectories
of prefix/local/PKG/ with version numbers attached.
To upgrade an older foo-0.7.41 to foo-0.7.42, just
repeat the installation in the same way, altering the symlink prefix/local/PKG/foo
to point to foo-0.7.42 instead and running lsync again.
lsync will automatically update symlinks, creating new links
if required and removing outdated dangling ones (see Listing 5).
As might be guessed, it is just as easy to go back to the old version
if the new one keeps dumping core or something. For an example of
such multiple unpackaged software installation, see Figure 3.
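The linking strategy can be tried out safely in a scratch directory
with the following simplified stand-in. It is not the real lsync tool:
the function name lsync_sketch is hypothetical, it links only the
top-level entries of each subdirectory, and the package foo with its
two versions is the example from the text.

```shell
# Simplified stand-in for lsync. $1 = the local area (prefix/local).
lsync_sketch() {
    local_area=$1
    for dir in bin sbin man info include lib; do
        mkdir -p "$local_area/$dir"
        # Remove links left dangling by removed or switched packages.
        for link in "$local_area/$dir"/*; do
            if [ -L "$link" ] && [ ! -e "$link" ]; then rm -f "$link"; fi
        done
        # Link the entries of every package that has no version attached.
        for pkg in "$local_area"/PKG/*; do
            case ${pkg##*/} in *-[0-9]*) continue ;; esac
            for file in "$pkg/$dir"/*; do
                [ -e "$file" ] || continue
                target="$local_area/$dir/${file##*/}"
                rm -f "$target"
                ln -s "$file" "$target"
            done
        done
    done
}

# Two versions of foo, with the unversioned symlink selecting the old one.
area=$(mktemp -d)
mkdir -p "$area/PKG/foo-0.7.41/bin" "$area/PKG/foo-0.7.42/bin"
echo old > "$area/PKG/foo-0.7.41/bin/foo"
echo extra > "$area/PKG/foo-0.7.41/bin/foo-old-tool"
echo new > "$area/PKG/foo-0.7.42/bin/foo"
ln -s foo-0.7.41 "$area/PKG/foo"
lsync_sketch "$area"
cat "$area/bin/foo"    # prints: old

# Upgrade: repoint the symlink and sync again.
rm "$area/PKG/foo"
ln -s foo-0.7.42 "$area/PKG/foo"
lsync_sketch "$area"
cat "$area/bin/foo"    # prints: new
```

After the second run, the link for foo-old-tool is gone, because the
file exists only in the older version and its link had become dangling.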
OpenPKG Release Engineering
A carefully crafted release process is part of the OpenPKG project,
and the fruits of the whole project are available to the public
according to open source standards. All sources (package specifications,
source patches, Web site sources, the handbook, this article, etc.)
are located in a publicly readable central CVS repository, which
can be browsed anonymously by conventional cvs commands or
through the Web site for added convenience. Additionally, all developer
commits to this repository are tracked and summarized with postings
to public mailing lists and public newsgroups. Participants can
easily follow all developments by subscribing to the list or reading
the newsgroup.
For stability and to reduce conflicts between development milestones,
OpenPKG has three release branches (which technically directly map
to CVS branches). These are OpenPKG-SOLID, OpenPKG-STABLE, and OpenPKG-CURRENT.
OpenPKG-SOLID is the security update branch of the last public OpenPKG
release. OpenPKG-STABLE is the stable branch from whose contents
the next public release is made. OpenPKG-CURRENT is the current
state of the development branch and contains packages of beta-grade
stability. In any case, the branch from which a package was built
can easily be determined from the OpenPKG RPM file name, because the
names follow a consistent naming scheme: pkg-name-version-YYYYMMDD
(for CURRENT), pkg-name-version-N.YYYYMMDD (for N-STABLE),
pkg-name-version-N.M.X (for N.M.X-SOLID). Once such
a source RPM file is built, the new binary RPM file name contains
additional information, such as operating system, hardware, and
the OpenPKG instance prefix.
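The naming scheme lends itself to simple pattern matching. The sketch
below guesses the branch from a source package name of the form
pkg-name-version-release, without any file suffix. The function name
opkg_branch is hypothetical, and the version numbers and dates in the
sample calls are invented.

```shell
# Guess the release branch from a name of the form pkg-name-version-release.
# $1 = package name without any file suffix.
opkg_branch() {
    release=${1##*-}    # everything after the last dash
    case $release in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
            echo CURRENT ;;     # YYYYMMDD
        *.[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
            echo STABLE ;;      # N.YYYYMMDD
        *.*.*)
            echo SOLID ;;       # N.M.X
        *)
            echo unknown ;;
    esac
}

opkg_branch rsync-2.5.5-20020811     # prints: CURRENT
opkg_branch rsync-2.5.5-1.20020811   # prints: STABLE
opkg_branch rsync-2.5.5-1.0.0        # prints: SOLID
```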
The OpenPKG developer team is very fast in keeping OpenPKG-CURRENT
packages up to date and in sync with the latest vendor versions.
This is possible because the versions of all externally available
vendor sources are automatically tracked on a daily basis. An OpenPKG
package for a new vendor software version is often available before
the software is even announced on Freshmeat.net.
Finally, OpenPKG takes security very seriously. Experience has
shown that "security through obscurity" does not work,
and that public disclosure leads to quicker and better solutions
to security problems. Thus, OpenPKG tries to release fixed packages
as quickly as possible when a vulnerability is discovered. The OpenPKG
security release and advisory process publishes official security
advisories in the security section of the Web site and on the mailing
lists.
Conclusion
OpenPKG is an open source software project founded by Cable &
Wireless Germany in November 2000. The implementation relies on
RPM 4 for its basic packaging mechanism, but offers more than RPM
alone. To meet its goal of becoming a modular and flexible UNIX
subsystem for cross-platform software packaging and installation,
OpenPKG includes tricky bootstrapping logic that installs a customized
implementation of RPM 4 on any of the supported target platforms.
OpenPKG has been in production use at Cable & Wireless Germany
since April 2001. Since its public release in January 2002, OpenPKG
users have profited from an increase from 220 to more than 400 software
packages. The project is continuously improved by a team of developers
who also daily update and add packages. The base of OpenPKG software
packages is expected to increase even more, partly because of the
ease of writing specifications and building packages. Most OpenPKG
users find it deceptively simple to build a basic package. New users
interested in such packaging can use the RSYNC example in this article
as a blueprint. Accordingly, package contributions are always appreciated
by the members of the OpenPKG project.
To make OpenPKG even more attractive, work is under way on a front
end, which will simplify and control the installation process according
to build and install dependencies. OpenPKG is also fulfilling plans
to satisfy the desktop user by offering X11-dependent packages for
Gtk, Qt, Gimp, Mozilla, and many others. For faster execution and
even more flexibility, a further enhanced run-command processor
is also under development. Shared library support is under investigation,
too. Lastly, we are looking forward to upgrading OpenPKG to use
the forthcoming RPM 4.1 version.
References
OpenPKG: http://www.openpkg.org/ ftp://ftp.openpkg.org/
OpenPKG Community Forums:
mailto:openpkg-users@openpkg.org
mailto:openpkg-dev@openpkg.org
nntp://news.openpkg.org/openpkg.users
nntp://news.openpkg.org/openpkg.dev
RPM: http://www.rpm.org/ ftp://ftp.rpm.org/pub/rpm/
Ralf S. Engelschall is a computer scientist and Open Source
software hacker, leading the software development department at
Cable & Wireless Germany. He is the author of well-known software
like Apache mod_ssl, GNU Pth, and GNU Shtool and the founder of
Open Source software projects like OpenSSL, OpenPKG, and OSSP. He
can be contacted at: rse@engelschall.com.
Thomas Lotterer is a network professional and consultant working
as a UNIX software developer at Cable & Wireless Germany. He
gained experience in cross-platform system integration and software
distribution by working previously as a systems administrator and
technical trainer. Today, Thomas works actively on the OpenPKG and
OSSP projects. He can be contacted at: thomas@lotterer.net.
Michael Schloh von Bennewitz is a software engineer at Cable
& Wireless Germany, where he works on the network and user interface
logic of ISP tools and technologies. He is an active contributor
to both the OpenPKG and OSSP projects. With fingers blazing, Michael
listened to classical music while writing parts of this article
in order to go even faster. He can be contacted at: michael@schloh.com.
Christoph Schug is a senior UNIX systems administrator at Cable
& Wireless Germany. He leads the hosting department and is responsible
for all managed servers at the Munich data center. His revolutionary
ideas and visions often result in additional lines in Ralf's
TODO list. When not in the office, Christoph might be found in the
Alps steering the screaming and smoking tires of his Miata MX-5
roadster. He can be contacted at: chris@schug.net.