Tripwire in the Enterprise: Integrating Tripwire into Big Brother
Elena Khan
I work for Adero, Inc., a start-up that specializes in global
caching of Web content. We first opened shop in Massachusetts two
years ago, moved a couple of times to bigger facilities, and finally
found a home in the Boston suburb of Waltham. As our company grew,
however, so did our need for intrusion detection. Our security team
recommended Tripwire, and the operations team (my group) was tasked
with implementing it on 200 machines (comprising four discrete functional
groups) that were already deployed worldwide.
This article describes the system I created for making Tripwire
administration across the enterprise as easy as possible. It was
designed for Adero's specific needs, which were three-fold:
1. Install Tripwire on production machines in the field.
2. Confirm that the builds were consistent between machines within
each functional group.
3. Integrate the running of Tripwire into an existing monitoring
system.
Before beginning this project, I tried to find a third-party solution
for using Tripwire in an enterprise, but an extensive Web search
produced nothing. The only product that came close to addressing
the problem was from Tripwire itself -- the "HQ Console".
When I evaluated the Console (Q4 of 2000), it was not robust enough
for our needs. As I continued to work on this project, I realized
that the dearth of ready-made solutions was a result of Tripwire's
being inherently "enterprise unfriendly". I will clarify
this perception as I explain what I did and the reasoning behind
it. I assume throughout that the reader is familiar with Tripwire
(www.tripwire.com), Big Brother (www.bb4.com), ssh,
and shell scripting.
One of the reasons that Tripwire is "enterprise unfriendly"
is its dependence on a static machine configuration. After Tripwire
is installed, it must be initialized, meaning that it creates a
"snapshot" of specified files and stores that snapshot
in a database file. Any subsequent change to a monitored file is
reported as a violation the next time Tripwire runs. To let Tripwire
know that a reported violation is okay, the database must be updated.
Specifically, I have to log on to the machine, run the update command:
tripwire -m u -r the_report_file_that_shows_the_change ,
approve the change shown on the screen, and then enter a password
to authorize the update. In my company's case, however, machine
configurations definitely do not remain static over time. To the contrary,
many files are changed as we add customers and features to our service.
Thus, the strict checking and the manual updates that make Tripwire
so secure are also obstacles to deploying it across many machines,
because any change mandates that I "touch" all affected
machines.
Policy Files
I circumvented this manual process by using, paradoxically, multiple
databases. In the preceding paragraph, I said that the database
holds a "snapshot" of files. In fact, that is the sole
job of the database. The snapshot itself is simply information about
a set of files, and its contents are governed by a policy file.
Figure 1 shows an excerpt from a text file that will become a policy
file for a Solaris machine. After the text file is created, it must
be encrypted. It is the encrypted file that must be present for
Tripwire to run. The text file should not be on any machine running
Tripwire because it shows exactly what is and is not monitored.
The excerpt shows that the root, /.ssh, and /etc (except
for four files) directories will be monitored. The letters following
the "+" sign represent the file attributes to monitor.
I divided the files on our machines into four policy groups based
on the likelihood that they would change:
1. System files (e.g., /etc/system), which rarely change;
2. Application files (most of these files are in /usr/local/
but also include any corresponding start-up scripts in the init.d/
directories) that often change;
3. Files that change frequently (e.g., /etc/shadow), which
will not be checked by Tripwire;
4. Files unique to a machine (e.g., /etc/hosts), which
"never" change.
Machines of the same type have the same policy and database files
for policies 1 and 2. Policy 3 files do not need either file. For
policy 4, machines have the same policy file, but each machine has
a different database. The point is that the same policy and database
files can be used on many machines. This fact becomes important
when system updates occur. I give an example in the next section.
Configuration File
Besides the policy file, there is one other file that determines
Tripwire's behavior. It is the configuration file, an example
of which is shown in Figure 2. In my system, each policy has its
own configuration file, but the only differences between them are
the values for POLFILE, DBFILE, and REPORTFILE.
In Figure 2, the "p1" in the values shows that this file
is for policy 1. Just as with the policy file, the configuration
file also has to be encrypted. Tripwire uses one password (the "site
key") to encrypt the configuration and policy files, and it
uses another password (the "local key") to create the
encrypted database file. I chose to use the same site and local
keys across all machines, so the SITEKEYFILE and LOCALKEYFILE
values are the same in all the configuration files.
During the life of a machine, the configuration file never has
to change, but the policy and database files will change. Updating
these files, however, is straightforward. For example, if I want
to add a key to the /.ssh/authorized_keys file on all our
cache machines, I would use the following procedure:
1. Roll out the change.
2. Update the Tripwire database on only one machine.
3. Copy the updated database to the rest of the machines.
So, instead of the native Tripwire process of updating the database
on our 100 cache machines, I only have to manually update one machine.
Continuing the example, if I decide that I do not want to check
the /.ssh/ directory in policy 1, then I must change the
policy file. Any change to a policy file mandates a change to the
associated database, so I end up with two files to update. The process
for updating the two files, however, is the same; I just distribute
two files instead of one.
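The distribution step can be scripted. The following is a minimal sketch, not one of the article's listings: the function name distribute_db, the DRY_RUN switch, and the one-hostname-per-line list format are assumptions made for illustration. The /opt/TSS/db/ destination matches the layout used in the Installation section.

```sh
# Sketch: push an updated Tripwire database to every machine of one type.
# distribute_db and DRY_RUN are illustrative names, not part of Tripwire.
distribute_db() {
    db_file=$1        # e.g. p1.twd, already updated on one machine
    machine_list=$2   # file with one hostname per line (assumed format)

    while read -r machine; do
        # Skip blank lines in the list.
        [ -n "$machine" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only show what would be copied.
            echo "scp $db_file $machine:/opt/TSS/db/"
        else
            # Redirect stdin so scp cannot swallow the rest of the list.
            scp "$db_file" "$machine:/opt/TSS/db/" < /dev/null ||
                echo "copy to $machine failed" >&2
        fi
    done < "$machine_list"
}
```

Running it first with DRY_RUN=1 prints the scp commands for review. When a policy file changes as well, the same loop can be pointed at the policy directory for the second file.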
Changing policy 2 is exactly the same as changing policy 1, but
changing policy 4 is different because each machine has a unique
database. For instance, say that I am monitoring /etc/hosts
in policy 4, but I am not monitoring /etc/resolv.conf. If
I change /etc/hosts on a machine, I have to manually update
the database on that machine (just as for policy 1 or 2), but that
database is not copied to the other machines. If I add /etc/resolv.conf
to policy 4 (i.e., if I change the contents of policy 4), then I
have reached the worst-case scenario of my multi-policy scheme,
which means that the update procedure falls through to the native
Tripwire process. As with policy 1 and 2, the new policy file is
created on one machine and distributed to the others, but then I
must update each machine individually. Unfortunately, having policy
4 also means that each machine must be touched during the installation
procedure. The compensation for such an odious task, however, is
that, if the files in the policy are chosen wisely, the task will
seldom be repeated.
Installation
Although the manual aspect makes the initial install more tedious,
the installation procedure itself is quite easy -- there are
only a handful of steps. Assuming that ssh is already set
up on the remote machines, step 1 is setting up an identity key.
Once the key is set up, the ssh program no longer asks for
a password when logging into the machine. The importance of the
identity key will be clear later in the article. Our machines have
the same authorized_keys file, so the command I use is:
[root@interrogator src]# scp authorized_keys <remote_machine>:/.ssh/
Steps 2 through 4 set up the Tripwire infrastructure on the remote
machine:
[root@interrogator src]# scp TSS.tar <remote_machine>:/opt
[root@remote_machine opt]# tar xvf TSS.tar
[root@remote_machine opt]# rm TSS.tar
The tarball only contains directories and the four Tripwire binaries:
[root@interrogator src]# tar tvf TSS.tar
drwx------ root/other 0 2000-11-21 04:59:31 TSS/
drwx------ root/other 0 2000-11-21 04:58:59 TSS/bin/
-r-x------ root/other 2149312 1999-12-05 22:24:15 TSS/bin/siggen
-r-x------ root/other 2329944 1999-12-05 22:20:10 TSS/bin/twprint
-r-x------ root/other 2519780 1999-12-05 22:23:07 TSS/bin/twadmin
-r-x------ root/other 2801324 1999-12-05 22:18:35 TSS/bin/tripwire
drwx------ root/other 0 2000-11-21 05:00:07 TSS/policy/
drwx------ root/other 0 2000-11-20 19:09:22 TSS/report/
drwx------ root/other 0 2000-11-21 04:59:09 TSS/db/
drwx------ root/other 0 2000-11-21 04:59:16 TSS/key/
Step 5 copies over the local and site keys:
[root@interrogator src]# scp local.key site.key \
<remote_machine>:/opt/TSS/key/
Steps 6 through 8 copy over the configuration, policy, and database
files:
[root@interrogator rb]# scp p1.cfg p2.cfg p4.cfg \
<remote_machine>:/opt/TSS/bin
[root@interrogator rb]# scp p1.pol p2.pol p4.pol \
<remote_machine>:/opt/TSS/policy
[root@interrogator rb]# scp p1.twd p2.twd \
<remote_machine>:/opt/TSS/db
Steps 9 and 10 check the machine against the first two databases:
[root@machine bin]# ./tripwire -m c -c p1.cfg
[root@machine bin]# ./tripwire -m c -c p2.cfg
Any inconsistencies should be addressed, and these commands should
be repeated until there are no discrepancies. Because files in policy 4
are different on each machine, step 11 is a manual check of the files
in this policy before the database is generated. Step 12 then creates
the database for policy 4:
[root@machine bin]# ./tripwire -m i -c p4.cfg
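With many machines to install, the copy steps (1 through 8) are natural candidates for a script. The sketch below only prints the commands for one host so they can be reviewed before being run. The function name print_install_cmds is hypothetical, the sketch assumes that all source files sit in the current directory, and it folds the remote tar and rm steps into a single ssh call, which is a simplification of the interactive session shown above.

```sh
# Sketch: print the copy commands for steps 1-8 for a single remote machine.
# print_install_cmds is an illustrative name; nothing is executed here.
print_install_cmds() {
    host=$1
    cat <<EOF
scp authorized_keys $host:/.ssh/
scp TSS.tar $host:/opt
ssh $host 'cd /opt && tar xvf TSS.tar && rm TSS.tar'
scp local.key site.key $host:/opt/TSS/key/
scp p1.cfg p2.cfg p4.cfg $host:/opt/TSS/bin
scp p1.pol p2.pol p4.pol $host:/opt/TSS/policy
scp p1.twd p2.twd $host:/opt/TSS/db
EOF
}
```

The checks for policies 1 and 2 and the initialization of policy 4 are left out on purpose, because they require a person to look at the results.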
Figure 3 shows an example of a Tripwire run immediately after
installation. The colors will be explained later. The discrepancies
between the "machine1" filesystem and the one outlined
in its database are listed in the "Added" and "Modified"
sections. After bringing "machine1" into line with the
database, the output of Tripwire looks like Figure 4. When all the
machines report "No violations", I have confirmed that
the machines' filesystems are identical.
Since all our machines were already operating in the field, though,
none of them could be the "model" on which the policy
and database files were created. After all, it was possible that
their security had already been breached, and I wanted to be sure
that my file system snapshots were created on a pristine machine.
Instead of a machine in the field, I used freshly built, in-house
machines as the models for each machine type; that means I have
four machines sitting in our lab that are used solely for policy
and database creation and updates. Because the models are on a private
network and each remote machine should be identical to every other machine
in its group, any discrepancy between machines (both at installation and
on an ongoing basis) indicates a possible violation.
Monitoring
Having described my solution for getting around the "snapshot"
model of Tripwire, I will now focus on the other obstacle to running
Tripwire in an enterprise -- monitoring. Tripwire's optional
notification procedure is email, and it has two settings: either
send email only when a violation is found or send email after every
run. Having the remote machines send email is problematic, however.
First, the email recipient is set up in the policy file. If I want
to change the recipient of the emails, I have to change the policy
file (and therefore the database file) on all the affected machines.
Second, if I choose the "every run" option, then I will
be inundated with email, and my team will have to look at each one
to determine whether there is a violation. Third, if I do not choose
the "every run" option, there is no way to know whether
Tripwire is still working because a lack of email could mean that
no violations have been found. I opted out of this catch-22 by disabling
the email option.
Tripwire's response to the monitoring dilemma is the company's
HQ Console product -- software that can be used to run Tripwire
remotely and to produce alerts. Unfortunately, it did not handle
multiple policy files, which rendered it useless to me. Furthermore,
it did not copy over and store reports (it only reported and stored
the exact errors); and it is a standalone product (we wanted something
that integrated into our existing system). Finally, it was only
available for Windows NT, and I wanted a UNIX solution because our
department is much stronger at managing UNIX machines securely.
To their credit, Tripwire did send representatives to
discuss our needs and to ask for input on improving the Console.
Although the representatives were more than willing to help, we
needed a solution quickly and could not afford to wait for a software
revision. It was then clear that we would have to create our own
monitoring solution to encompass all our requirements.
Running Tripwire Remotely
The HQ Console, however, did introduce a very useful idea --
running Tripwire remotely. I adopted this idea and set up an intermediate
machine, which sits between the remote machines and the monitoring
system. This machine, named "interrogator", takes the
responsibility of running Tripwire and sending updates away from
the remote machines. By centralizing these two jobs, I only have
to keep interrogator and the monitoring machine healthy and safe;
in return, they will notify me of problems on the remote machines.
The "interrogator" performs its jobs via three scripts:
do_tw.sh (Listing 1) checks policy 1, 2, and 4 files on the
remote machines; do_files.sh (Listing 2) checks policy 3
files; and check_status.sh (Listing 3) sends the status of
the machines to the monitoring system. The outline of the system
is shown in Figure 5. (Listings for this article are available from
the Sys Admin Web site: http://www.sysadminmag.com.)
The "BB Display" is our Big Brother machine. Our team
leader suggested Big Brother as the monitoring system because we
were more familiar with it than our other monitoring system. Indeed,
I had set up BB for our department a year ago precisely because
our other system was difficult to use, configure, and understand.
In contrast to its monitoring partner, BB has proven itself to be
simple and elegant. Its native display is a Web page, and by setting
up a certificate and password authentication, we have secure access
to its display outside the office. Except for a few binaries, BB
is all shell scripts, so it's just a matter of reading the
scripts to find out exactly what the program is doing. (The mailing
list archives are also an excellent source of additional information.)
In the spirit of Big Brother, I chose to make my three programs
shell scripts as well.
BB displays its information in columns; several examples of its
interface can be accessed from: http://www.bb4.com/demo.html.
The Tripwire information from the remote machines must have its
own column, of course, and adding an additional column is simple.
From interrogator, I just use the following command within the check_status.sh
script:
$BB $BBDISPLAY $LINE ,
where $BB is the "bb" binary, $BBDISPLAY is
the name or IP of the display machine, and $LINE is the actual
information that the display will use. Interrogator already had Big
Brother installed, so my value of $BB is /opt/bb/bin/bb.
The $LINE variable takes the form:
status $NAME.$TEST $COLOR `date` $MESSAGE ,
where $NAME is the name of the machine (with commas instead
of periods), $TEST is the column name (I use tw), $COLOR
is "red" or "green" (indicating the machine's
status), and $MESSAGE is text that will be displayed on the
display's "drill-down" Web page. The presence of the
MESSAGE variable is one example of how BB's thoughtful
design was an excellent fit with our needs. I wanted the ability to
determine the severity of a violation directly from the monitoring
screen, so I use the MESSAGE variable to send excerpts of the
Tripwire output to the display. Figure 6, an example of the drill-down
page for a violation, shows how it works. In this example, I can clearly
see that the known_hosts file has been modified. In my case,
this error is usually not serious, and I know that I should first
investigate who in our department has forgotten about Tripwire before
I start sounding the security alarm. Figure 7 shows an example of
a machine whose latest Tripwire run found no problems. In these cases,
I use the MESSAGE variable to send "Report created on:"
information to the display. As a good monitoring system should, BB
also provides the alerts.
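As an illustration of the $LINE format described above, the status line could be assembled as follows. This is a sketch only: build_bb_line is a hypothetical helper name and is not taken from the article's listings.

```sh
# Sketch: assemble the status line that is handed to the bb binary.
# build_bb_line is a hypothetical helper name.
build_bb_line() {
    host=$1        # e.g. machine1.company.com
    column=$2      # BB column name; the article uses "tw"
    color=$3       # "red" or "green"
    message=$4     # text for the drill-down page

    # Big Brother wants commas where the host name has periods.
    name=$(printf '%s' "$host" | tr '.' ',')
    printf 'status %s.%s %s %s %s\n' "$name" "$column" "$color" "$(date)" "$message"
}

# Example call (left as a comment because it contacts the display machine):
#   LINE=$(build_bb_line machine1.company.com tw green "No violations.")
#   $BB $BBDISPLAY "$LINE"
```

In the example call, the quotes around $LINE keep the whole status line together as one argument to bb.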
There are two benefits to centralizing notification responsibilities
in BB. First, while Tripwire only has email alerts, BB provides
several methods of notification. Second, when changing alert recipients,
I only have to change configuration files on the one BB machine
instead of on each remote machine.
Interrogator
BB, however, depends upon interrogator for its information, so
I will now return to describing interrogator -- specifically,
how it is set up and how it performs its jobs. Its directory structure
is shown in Figure 8. There is no setup script, so the directories
must be created manually. The uppercase, boldface names (e.g., "ROOTDIR")
are the variables used in the scripts to represent the directories.
The file listings shown are examples of what lies in each directory.
The system is based on machine types. Figure 8 shows that four
types are currently set up to be checked: cache machines (type 1),
type 2 machines, type 3 machines, and type 4 machines. Each type
has a set of configuration, policy, and database files, which are
stored in the src/ directory tree. The src/cache/
directory shows the full set of configuration files for the "cache"
machines. Each type also has a list of member machines; these lists
are stored in the TYPEDIR directory. Under TYPEDIR
are the directories where the files for policy 3 are stored. Finally,
the TEMPLATEDIR stores information about how the output of
a Tripwire run should look for each machine type.
The first script, do_tw.sh, is responsible for running
Tripwire on the remote machines. It takes three parameters and is
run as:
do_tw.sh machine1.company.com cache p1 ,
where machine1.company.com is the name of the machine, cache
is the type of machine, and p1 is the policy to check. After
checking the parameters and pinging the machine, the script logs on
to the remote machine as root using the ssh identity key that
was set up during installation. The key must have root access because
Tripwire must be run as root. After logging in, it runs Tripwire via
the command:
tripwire -m c -c X.cfg ,
where X is the third parameter (the policy to check) submitted to
do_tw.sh. The output is redirected to the REPORTDIR
directory and saved with the .asc suffix. Figure 8 shows example
reports under the machine1/ and machine2/ directories.
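The remote run described above might look like the following sketch. It is not Listing 1. The function name run_remote_check, the overridable SSH variable, and the report file name (<policy>.asc inside a per-machine directory) are assumptions. The /opt/TSS/bin path and the tripwire options come from the Installation section.

```sh
# Sketch: run one Tripwire policy check on a remote machine and save the output.
# SSH can be overridden (for example SSH=echo) to see the command without connecting.
SSH=${SSH:-ssh}

run_remote_check() {
    machine=$1       # e.g. machine1.company.com
    policy=$2        # p1, p2, or p4
    report_dir=$3    # local directory that holds one subdirectory per machine

    mkdir -p "$report_dir/$machine" || return 1

    # Log in as root with the identity key and run the check from the bin directory.
    # The report file name used here is an assumption.
    $SSH "root@$machine" "cd /opt/TSS/bin && ./tripwire -m c -c $policy.cfg" \
        > "$report_dir/$machine/$policy.asc" < /dev/null
}
```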
The script then extracts the report's core information from
the surrounding extraneous information and puts it into a temporary
file in the TMPDIR directory. (Temporary files are saved
with the job number as the suffix.) In Figures 3 and 4, the orange
text represents the information that is ignored, and the black text
represents the information that is saved to the temporary file.
The contents of the temporary file are then compared (via the diff
command) to the contents of a template.
Which template to use is determined by the second and third parameters
to the do_tw.sh script. As shown in Figure 8, the templates
are named template.report.<machine_type>.<policy_number>.
The templates themselves are simply the core information of a Tripwire
report from a clean run and must be created manually. That is, the
core information from the model machine's output must be saved
as a template and transferred to the intermediate machine's
TEMPLATEDIR. The black text in Figure 4, for example, could
be the contents of a template file, but the black text in Figure
3 could not.
Assume for the moment that the black text shown in Figure 4 is
a template. Assume, also, that Tripwire runs on a remote machine
and the output looks like Figure 3. The diff between the
report core and the template will produce output, which means that
there is a violation. Each time do_tw.sh runs, it produces
two files in STATUSDIR that together describe the state of
the remote machine; one file has a .msg suffix and the other
has .status. The output from the diff is sent to the
.msg file. Also, the word "red" is written to the
.status file. If there is no output from diff, then
the word "green" is written to the .status file, and
the phrase "No violations." and the time of the run are written to
the .msg file. Figures 6 and 7 are examples of a violation
and a clean run, respectively.
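The comparison and the two status files can be sketched as shown here. This is an illustration under stated assumptions rather than the code in Listing 1: record_status is a hypothetical name, and the caller supplies the path prefix for the .msg and .status files because the real naming scheme appears only in Figure 8.

```sh
# Sketch: compare the core of a report with its template and record the result.
# record_status is an illustrative name.
record_status() {
    core_file=$1     # core information extracted from the Tripwire output
    template=$2      # template for this machine type and policy
    status_base=$3   # path prefix for the .msg and .status files (assumed naming)

    if diff "$template" "$core_file" > "$status_base.msg"; then
        # No differences: replace the empty diff output with the clean-run message.
        echo "green" > "$status_base.status"
        echo "No violations. $(date)" > "$status_base.msg"
    else
        # Differences (or a diff failure such as a missing file) count as a violation.
        echo "red" > "$status_base.status"
    fi
}
```

Treating a failed diff as "red" is a deliberate choice in this sketch: a missing template or report then shows up on the display instead of passing silently.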
When there is a violation, the script logs into the remote machine
a second time and copies over the official Tripwire report file,
which has the suffix .twr. This report has five levels of
detail to choose from. Figure 2 shows that I use level 3, which
contains minimal but exact information about how the files are in
violation. The .twr report file is also saved in the REPORTDIR.
In Figure 8, there is a .twr file in the machine4/
directory, showing that there is a violation in policy 1.
Policy 4 is just a tiny bit different. Notice that there is a
line in Figures 3 and 4 that says "Total objects scanned:".
For policies 1 and 2, this number should be the same on all the
machines because policies 1 and 2 check generic file systems. For
policy 4, this number may be different. Thus, if the script is checking
policy 4, it treats this line as extraneous information and takes
it out.
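A filter of the following kind would implement that special case. It is a sketch: filter_core is a hypothetical name, and it reads the report core on standard input.

```sh
# Sketch: drop the "Total objects scanned:" line for policy 4 only.
# filter_core is an illustrative name; it filters standard input.
filter_core() {
    policy=$1
    if [ "$policy" = "p4" ]; then
        # The object count differs per machine in policy 4, so ignore it.
        grep -v 'Total objects scanned:'
    else
        # Policies 1 and 2 keep the line, so a changed count is reported.
        cat
    fi
}
```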
The sister script to do_tw.sh is do_files.sh, which
checks policy 3 files. It takes two parameters and is run as:
do_files.sh machine1.company.com cache ,
where cache is the machine type. Unlike do_tw.sh, this
script can check all machines of a given type, so the command:
do_files.sh all cache
will check all the machines in the cache.list file.
As previously mentioned, do_files.sh does not use Tripwire
to check the files. Rather, I have traded some of the extra security
of Tripwire checking for the ease of immediately changing files.
The script compares an ls -l listing (minus the date and
time fields) and the md5sum output of the remote files to
local master copies, whose names represent absolute paths on the
remote machine. For example, the /etc/shadow file is named:
etc-^-shadow
The script takes a directory listing from the appropriate subdirectory
under TYPEDIR, translates the separators into slashes, and
adds a leading slash. After this step, it has a list of files to check
on the remote machine. Then the script sends ls -l (using the
real pathname) and md5sum information for the master files
to REPORTDIR. An example of this file is in the p3/
directory in Figure 8. Because of the full listing, the user and group
names must be the same for the master file as for the remote machine,
but the UID and GID do not have to be the same. The next step is logging
into the remote machine and sending the ls and md5sum
information to a temporary file. The comparison method is the same
as for do_tw.sh; diff compares the report and temporary
files. Again, output from diff goes to the .msg file
in STATUSDIR and a "red" or "green" status
is sent to the .status file.
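The name translation and the per-file record can be sketched as follows. This is not Listing 2. The function names master_to_path and fingerprint are hypothetical, and the record layout (the first five ls -l fields, the real path, and the checksum) is an assumption about what "minus the date and time fields" looks like in practice. It assumes regular files and the md5sum command that the article already relies on.

```sh
# Sketch: helpers for checking policy 3 files.
# master_to_path and fingerprint are illustrative names.

# Turn a master file name such as etc-^-shadow into /etc/shadow.
master_to_path() {
    printf '/%s\n' "$1" | sed 's|-\^-|/|g'
}

# Print one comparison record for a file: mode, link count, owner, group,
# and size from ls -l (date and time fields left out), then the path to
# report and the MD5 checksum.
fingerprint() {
    file=$1        # file to read (master copy or real file)
    shown_as=$2    # path to print, e.g. /etc/shadow
    attrs=$(ls -l "$file" | awk '{ print $1, $2, $3, $4, $5 }')
    sum=$(md5sum < "$file" | awk '{ print $1 }')
    echo "$attrs $shown_as $sum"
}
```

Printing the real path for both the master copy and the remote file lets diff line up the two records directly.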
I also check Tripwire's configuration and policy files with
this script to ensure they have not been altered. Together with
the template checking and the inclusion of Tripwire's
binaries in policy 2, this check ensures that Tripwire itself has not
been corrupted on the remote machines.
Since the first two scripts take parameters, I created wrappers
to save some typing. For do_tw.sh, there are three wrappers
-- one each for p1, p2, and p4. They take the form:
for MACHINE in `cat $MACHINE_LIST` ; do
/opt/tw/scripts/do_tw.sh $MACHINE $MACHINE_TYPE $POLICY
done
For do_files.sh, the wrapper reads:
/opt/tw/scripts/do_files.sh all $MACHINE_TYPE
The third script takes no parameters. It simply runs as:
check_status.sh
As explained above, this script sends status information to BB using
the $LINE variable. To find out which machines to report on,
it looks in STATUSDIR for all files ending in .bb and
parses the filenames. Figure 8 shows the relevant files for "machine1".
After the script creates the machine list, it recreates the .bb
file with information from the .msg and .status files.
If any of the .status files contain the word "red",
the $COLOR variable in $LINE is set to red. The
contents of the .msg files are concatenated and sent over as
the $MESSAGE contents. Our BB machine rewrites its Web pages
every five minutes, so within five minutes of running check_status.sh,
the new information is on the display, and the alerts are sent out.
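The aggregation rule (any "red" makes the machine red, and all messages are sent together) can be sketched as shown here. This is not Listing 3: the function names are hypothetical, and the file naming <machine>.<policy>.status and <machine>.<policy>.msg is an assumption, since the real names appear only in Figure 8.

```sh
# Sketch: combine the per-policy results for one machine.
# overall_color and combined_message are illustrative names.

# Print "red" if any status file for the machine contains "red", else "green".
overall_color() {
    status_dir=$1
    machine=$2
    color=green
    for f in "$status_dir/$machine".*.status; do
        # Skip the unexpanded pattern when no status file exists.
        [ -f "$f" ] || continue
        if grep -q red "$f"; then
            color=red
        fi
    done
    echo "$color"
}

# Print all message files for the machine, one after another.
combined_message() {
    status_dir=$1
    machine=$2
    cat "$status_dir/$machine".*.msg 2>/dev/null
}
```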
BB's information is considered stale (and its display turns
purple) if it is not updated within 30 minutes. To keep the status
information fresh, I run check_status.sh every 15 minutes
from cron. Instead of having a cron job on each of
the remote machines, I have a single cron file on interrogator,
thus centralizing the frequency of the scripts' running. Root's
cron jobs on interrogator are:
0 0,8,16 * * * /opt/tw/scripts/do_tw.wrapper-p1.sh > /dev/null
0 2,10,18 * * * /opt/tw/scripts/do_tw.wrapper-p2.sh > /dev/null
0 4,12,20 * * * /opt/tw/scripts/do_files.wrapper.sh > /dev/null
0 6,14,22 * * * /opt/tw/scripts/do_tw.wrapper-p4.sh > /dev/null
0,15,30,45 * * * * /opt/tw/scripts/check_status.sh
Although minimal, do_tw.sh and do_files.sh do produce
output, hence the redirection to /dev/null. Figure 9 shows
examples of good and bad output from the two scripts.
Conclusion
We have been using this system for several months now. Granted,
it is far from perfect. There are three areas in particular that
need improvement. First, there is no inode checking inherent in
this system. If the scripts are run very often, they will produce
a lot of reports, which in turn will deplete the filesystem's
inodes. As I found out, this system does produce reactive alerts
regarding this "device out of space" system error, but
periodic, proactive log removal would be more helpful. Second, there
should be a better way to check policy 4 files. Although I am not
certain of how it should be done, I am certain that I want to avoid
the possibility of logging onto every box if I change the policy.
Third, the do_tw.sh script should accept an "all"
parameter. There are also three outstanding items on the "nice-to-have"
list: variable number of policies per machine type (currently, all
types must have the same number of policies); skipping a machine
if it is not in the known_hosts file (to bypass hanging at
the "do you want to connect" question); and choosing which
attributes to check for policy 3 (currently, checking ls
and md5sum are the only options).
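For the first item, periodic report removal could be as simple as the sketch below. It is not part of the system described in this article: prune_reports is a hypothetical name, and the retention period is an arbitrary choice that would have to be weighed against how long reports are needed for investigations.

```sh
# Sketch: delete report files older than a given number of days.
# prune_reports is an illustrative name.
prune_reports() {
    report_dir=$1   # top of the report directory tree
    days=$2         # remove files last modified more than this many days ago

    # Refuse to run against something that is not a directory.
    [ -d "$report_dir" ] || return 1
    find "$report_dir" -type f -mtime +"$days" -exec rm -f {} +
}
```

Run from cron on interrogator, a call of this kind would free inodes before the filesystem runs out of them.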
Even with its lack of refinement, though, this system has been
a very useful tool for bringing centralized Tripwire monitoring
into our enterprise. Our machines are checked by Tripwire at least
three times a day, and all my department has to do is look at the
BB display if we are curious or wait for an alert. I have taken
my department out of the checking loop. The only time we have to
log on to a machine is to investigate a violation.
I hope this article will provide systems administrators with a
springboard (or template) for including Tripwire monitoring in their
own enterprises.
Elena Khan graduated from college in 1992 without knowing how
to use a computer. She started using computers at her first job,
where she became something of an expert in WordStar. In 1996 she
took a vocational class to learn how to fix computers and landed
a job as a junior sys admin soon after, and she has been a sys
admin ever since. She can be reached at: ekhan@adero.com
or elena.khan@usa.net.