Tripwire in the Enterprise: Integrating Tripwire into Big Brother
I work for Adero, Inc., a start-up that specializes in global
caching of Web content. We first opened shop in Massachusetts two
years ago, moved a couple of times to bigger facilities, and finally
found a home in the Boston suburb of Waltham. As our company grew,
however, so did our need for intrusion detection. Our security team
recommended Tripwire, and the operations team (my group) was tasked
with implementing it on 200 machines (comprising four discrete functional
groups) that were already deployed worldwide.
This article describes the system I created for making Tripwire
administration across the enterprise as easy as possible. It was
designed for Adero's specific needs, which were three-fold:
1. Install Tripwire on production machines in the field.
2. Confirm that the builds were consistent between machines within
each functional group.
3. Integrate the running of Tripwire into an existing monitoring system.
Before beginning this project, I tried to find a third-party solution
for using Tripwire in an enterprise, but an extensive Web search
produced nothing. The only product that came close to addressing
the problem was from Tripwire itself -- the "HQ Console".
When I evaluated the Console (Q4 of 2000), it was not robust enough
for our needs. As I continued to work on this project, I realized
that the dearth of ready-made solutions was a result of Tripwire's
being inherently "enterprise unfriendly". I will clarify
this perception as I explain what I did and the reasoning behind
it. I assume throughout that the reader is familiar with Tripwire
(www.tripwire.com), Big Brother (www.bb4.com), ssh,
and shell scripting.
One of the reasons that Tripwire is "enterprise unfriendly"
is its dependence on a static machine configuration. After Tripwire
is installed, it must be initialized, meaning that it creates a
"snapshot" of specified files and stores that snapshot
in a database file. Any subsequent change to a monitored file is
reported as a violation the next time Tripwire runs. To let Tripwire
know that a reported violation is okay, the database must be updated.
Specifically, I have to log on to the machine, run the update command:
tripwire -m u -r the_report_file_that_shows_the_change ,
approve the change shown on the screen, and then enter a password
to authorize the update. In my company's case, however, machine
configurations definitely do not remain static over time. To the contrary,
many files are changed as we add customers and features to our service.
Thus, the strict checking and the manual updates that make Tripwire
so secure are also obstacles to deploying it across many machines,
because any change mandates that I "touch" all affected machines.
I circumvented this manual process by using, paradoxically, multiple
databases. In the preceding paragraph, I said that the database
holds a "snapshot" of files. In fact, that is the sole
job of the database. The snapshot itself is simply information about
a set of files, and its contents are governed by a policy file.
Figure 1 shows an excerpt from a text file that will become a policy
file for a Solaris machine. After the text file is created, it must
be encrypted. It is the encrypted file that must be present for
Tripwire to run. The text file should not be on any machine running
Tripwire because it shows exactly what is and is not monitored.
The excerpt shows that the root, /.ssh, and /etc (except
for four files) directories will be monitored. The letters following
the "->" sign represent the file attributes to monitor.
I divided the files on our machines into four policy groups based
on the likelihood that they would change:
1. System files (e.g., /etc/system), which rarely change;
2. Application files (most of these files are in /usr/local/
but also include any corresponding start-up scripts in the init.d/
directories) that often change;
3. Files that change frequently (e.g., /etc/shadow), which
will not be checked by Tripwire;
4. Files unique to a machine (e.g., /etc/hosts), which are monitored with a per-machine database.
Machines of the same type have the same policy and database files
for policies 1 and 2. Policy 3 files do not need either file. For
policy 4, machines have the same policy file, but each machine has
a different database. The point is that the same policy and database
files can be used on many machines. This fact becomes important
when system updates occur. I give an example in the next section.
Besides the policy file, there is one other file that determines
Tripwire's behavior. It is the configuration file, an example
of which is shown in Figure 2. In my system, each policy has its
own configuration file, but the only differences between them are
the values for POLFILE, DBFILE, and REPORTFILE.
In Figure 2, the "p1" in the values shows that this file
is for policy 1. Just as with the policy file, the configuration
file also has to be encrypted. Tripwire uses one password (the "site
key") to encrypt the configuration and policy files, and it
uses another password (the "local key") to create the
encrypted database file. I chose to use the same site and local
keys across all machines, so the SITEKEYFILE and LOCALKEYFILE
values are the same in all the configuration files.
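As a rough sketch, a policy-1 configuration text file (before encryption) might contain entries along these lines; the exact paths are my assumptions, modeled on the TSS directory layout shown in the installation section, and the report level matches the level 3 I use:

```
ROOT          =/opt/TSS
POLFILE       =/opt/TSS/policy/p1.pol
DBFILE        =/opt/TSS/db/p1.twd
REPORTFILE    =/opt/TSS/report/p1-$(DATE).twr
SITEKEYFILE   =/opt/TSS/key/site.key
LOCALKEYFILE  =/opt/TSS/key/local.key
REPORTLEVEL   =3
```

For policy 2 or 4, only the POLFILE, DBFILE, and REPORTFILE values would change (p2 or p4 in place of p1).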
During the life of a machine, the configuration file never has
to change, but the policy and database files will change. Updating
these files, however, is straightforward. For example, if I want
to add a key to the /.ssh/authorized_keys file on all our
cache machines, I would use the following procedure:
1. Roll out the change.
2. Update the Tripwire database on only one machine.
3. Copy the updated database to the rest of the machines.
So, instead of the native Tripwire process of updating the database
on our 100 cache machines, I only have to manually update one machine.
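The copy in step 3 is easy to script. The following sketch prints the scp commands it would run (a dry run); dropping the leading echo performs the copies. The list-file path and remote destination directory are my assumptions modeled on the layout described later, not necessarily Adero's exact values:

```shell
#!/bin/sh
# push_db: distribute one updated Tripwire database to every machine of a
# type. It prints the scp commands it would run; remove the leading "echo"
# to actually perform the copies. Paths are illustrative.
push_db() {
    # $1 = file listing one machine per line; $2 = database file to push
    for MACHINE in `cat "$1"` ; do
        echo scp "$2" "$MACHINE:/opt/TSS/db/`basename $2`"
    done
}

# Example: push_db /opt/tw/types/cache.list /opt/tw/src/cache/p1.twd
```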
Continuing the example, if I decide that I do not want to check
the /.ssh/ directory in policy 1, then I must change the
policy file. Any change to a policy file mandates a change to the
associated database, so I end up with two files to update. The process
for updating the two files, however, is the same; I just distribute
two files instead of one.
Changing policy 2 is exactly the same as changing policy 1, but
changing policy 4 is different because each machine has a unique
database. For instance, say that I am monitoring /etc/hosts
in policy 4, but I am not monitoring /etc/resolv.conf. If
I change /etc/hosts on a machine, I have to manually update
the database on that machine (just as for policy 1 or 2), but that
database is not copied to the other machines. If I add /etc/resolv.conf
to policy 4 (i.e., if I change the contents of policy 4), then I
have reached the worst-case scenario of my multi-policy scheme,
which means that the update procedure falls through to the native
Tripwire process. As with policy 1 and 2, the new policy file is
created on one machine and distributed to the others, but then I
must update each machine individually. Unfortunately, having policy
4 also means that each machine must be touched during the installation
procedure. The compensation for such an odious task, however, is
that (if the files in the policy are chosen wisely), the task will
seldom be repeated.
Although the manual aspect makes the initial install more tedious,
the installation procedure itself is quite easy -- there are
only a handful of steps. Assuming that ssh is already set
up on the remote machines, step 1 is setting up an identity key.
Once the key is set up, the ssh program no longer asks for
a password when logging into the machine. The importance of the
identity key will be clear later in the article. Our machines have
the same authorized_keys file, so the command I use is:
[root@interrogator src]# scp authorized_keys <remote_machine>:/.ssh/
Steps 2 through 4 set up the Tripwire infrastructure on the remote machine:
[root@interrogator src]# scp TSS.tar <remote_machine>:/opt
[root@remote_machine opt]# tar xvf TSS.tar
[root@remote_machine opt]# rm TSS.tar
The tarball only contains directories and the four Tripwire binaries:
[root@interrogator src]# tar tvf TSS.tar
drwx------ root/other 0 2000-11-21 04:59:31 TSS/
drwx------ root/other 0 2000-11-21 04:58:59 TSS/bin/
-r-x------ root/other 2149312 1999-12-05 22:24:15 TSS/bin/siggen
-r-x------ root/other 2329944 1999-12-05 22:20:10 TSS/bin/twprint
-r-x------ root/other 2519780 1999-12-05 22:23:07 TSS/bin/twadmin
-r-x------ root/other 2801324 1999-12-05 22:18:35 TSS/bin/tripwire
drwx------ root/other 0 2000-11-21 05:00:07 TSS/policy/
drwx------ root/other 0 2000-11-20 19:09:22 TSS/report/
drwx------ root/other 0 2000-11-21 04:59:09 TSS/db/
drwx------ root/other 0 2000-11-21 04:59:16 TSS/key/
Step 5 copies over the local and site keys:
[root@interrogator src]# scp local.key site.key \
Steps 6 through 8 copy over the configuration, policy, and database files:
[root@interrogator rb]# scp p1.cfg p2.cfg p4.cfg \
[root@interrogator rb]# scp p1.pol p2.pol p4.pol \
[root@interrogator rb]# scp p1.twd p2.twd \
Steps 9 and 10 check the machine against the first two databases:
[root@machine bin]# ./tripwire -m c -c p1.cfg
[root@machine bin]# ./tripwire -m c -c p2.cfg
Any inconsistencies should be addressed, and these commands should
be repeated until there are no discrepancies. Step 12 creates the
database for policy 4:
[root@machine bin]# ./tripwire -m i -c p4.cfg
Because files in policy 4 are different on each machine, step 11 should
be a manual check of the files in this policy before the database is created.
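Gathered together, steps 1 through 8 can be driven by a small helper. This sketch prints each command rather than running it (drop the echo to execute); the remote destination directories for steps 5 through 8 are my assumptions drawn from the TSS tarball layout (key/, policy/, db/, bin/):

```shell
#!/bin/sh
# install_tw: dry-run sketch of the installation steps for one machine.
# It prints the commands instead of running them; remove the "echo"s to
# execute. Remote destination directories are assumptions.
install_tw() {
    M="$1"
    echo "scp authorized_keys $M:/.ssh/"                      # step 1: identity key
    echo "scp TSS.tar $M:/opt"                                # step 2
    echo "ssh $M 'cd /opt && tar xvf TSS.tar && rm TSS.tar'"  # steps 3-4
    echo "scp local.key site.key $M:/opt/TSS/key"             # step 5
    echo "scp p1.cfg p2.cfg p4.cfg $M:/opt/TSS/bin"           # step 6
    echo "scp p1.pol p2.pol p4.pol $M:/opt/TSS/policy"        # step 7
    echo "scp p1.twd p2.twd $M:/opt/TSS/db"                   # step 8
}

# Example: install_tw machine1.company.com
```

Steps 9 through 12 (the checks and the policy 4 database creation) remain interactive, so they are not part of the sketch.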
Figure 3 shows an example of a Tripwire run immediately after
installation. The colors will be explained later. The discrepancies
between the "machine1" filesystem and the one outlined
in its database are listed in the "Added" and "Modified"
sections. After bringing "machine1" into line with the
database, the output of Tripwire looks like Figure 4. When all the
machines report "No violations", I have confirmed that
the machines' filesystems are identical.
Since all our machines were already operating in the field, though,
none of them could be the "model" on which the policy
and database files were created. After all, it was possible that
their security had already been breached, and I wanted to be sure
that my file system snapshots were created on a pristine machine.
Instead of a machine in the field, I used freshly built, in-house
machines as the models for each machine type; that means I have
four machines sitting in our lab that are used solely for policy
and database creation and updates. Because the models are on a private
network and each remote machine should be identical to the others in
its group, any discrepancy between any of the machines (both at
installation and on an ongoing basis) indicates a possible violation.
Having described my solution for getting around the "snapshot"
model of Tripwire, I will now focus on the other obstacle to running
Tripwire in an enterprise -- monitoring. Tripwire's optional
notification procedure is email, and it has two settings: either
send email only when a violation is found or send email after every
run. Having the remote machines send email is problematic, however.
First, the email recipient is set up in the policy file. If I want
to change the recipient of the emails, I have to change the policy
file (and therefore the database file) on all the affected machines.
Second, if I choose the "every run" option, then I will
be inundated with email, and my team will have to look at each one
to determine whether there is a violation. Third, if I do not choose
the "every run" option, there is no way to know whether
Tripwire is still working because a lack of email could mean that
no violations have been found. I opted out of this catch-22 by disabling
the email option.
Tripwire's response to the monitoring dilemma is the company's
HQ Console product -- software that can be used to run Tripwire
remotely and to produce alerts. Unfortunately, it did not handle
multiple policy files, which rendered it useless to me. Furthermore,
it did not copy over and store reports (it only reported and stored
the exact errors); and it is a standalone product (we wanted something
that integrated into our existing system). Finally, it was only
available for Windows NT, and I wanted a UNIX solution because our
department is infinitely stronger in managing a UNIX box for security
purposes. To their credit, Tripwire did send representatives to
discuss our needs and to ask for input on improving the Console.
Although the representatives were more than willing to help, we
needed a solution quickly and could not afford to wait for a software
revision. It was then clear that we would have to create our own
monitoring solution to encompass all our requirements.
Running Tripwire Remotely
The HQ Console, however, did introduce a very useful idea --
running Tripwire remotely. I adopted this idea and set up an intermediate
machine, which sits between the remote machines and the monitoring
system. This machine, named "interrogator", takes the
responsibility of running Tripwire and sending updates away from
the remote machines. By centralizing these two jobs, I only have
to keep interrogator and the monitoring machine healthy and safe;
in return, they will notify me of problems on the remote machines.
The "interrogator" performs its jobs via three scripts:
do_tw.sh (Listing 1) checks policy 1, 2, and 4 files on the
remote machines; do_files.sh (Listing 2) checks policy 3
files; and check_status.sh (Listing 3) sends the status of
the machines to the monitoring system. The outline of the system
is shown in Figure 5. (Listings for this article are available from
the Sys Admin Web site: http://www.sysadminmag.com.)
The "BB Display" is our Big Brother machine. Our team
leader suggested Big Brother as the monitoring system because we
were more familiar with it than our other monitoring system. Indeed,
I had set up BB for our department a year ago precisely because
our other system was difficult to use, configure, and understand.
In contrast to its monitoring partner, BB has proven itself to be
simple and elegant. Its native display is a Web page, and by setting
up a certificate and password authentication, we have secure access
to its display outside the office. Except for a few binaries, BB
is all shell scripts, so it's just a matter of reading the
scripts to find out exactly what the program is doing. (The mailing
list archives are also an excellent source of additional information.)
In the spirit of Big Brother, I chose to make my three programs
shell scripts as well.
BB displays its information in columns; several examples of its
interface can be accessed from: http://www.bb4.com/demo.html.
The Tripwire information from the remote machines must have its
own column, of course, and adding an additional column is simple.
From interrogator, I just use the following command within the check_status.sh script:
$BB $BBDISPLAY $LINE ,
where $BB is the "bb" binary, $BBDISPLAY is
the name or IP of the display machine, and $LINE is the actual
information that the display will use. Interrogator already had Big
Brother installed, so my value of $BB is /opt/bb/bin/bb.
The $LINE variable takes the form:
status $NAME.$TEST $COLOR `date` $MESSAGE ,
where $NAME is the name of the machine (with commas instead
of periods), $TEST is the column name (I use tw), $COLOR
is "red" or "green" (indicating the machine's
status), and $MESSAGE is text that will be displayed on the
display's "drill-down" Web page. The presence of the
MESSAGE variable is one example of how BB's thoughtful
design was an excellent fit with our needs. I wanted the ability to
determine the severity of a violation directly from the monitoring
screen, so I use the MESSAGE variable to send excerpts of the
Tripwire output to the display. Figure 6, an example of the drill-down
page for a violation, shows how it works. In this example, I can clearly
see that the known_hosts file has been modified. In my case,
this error is usually not serious, and I know that I should first
investigate who in our department has forgotten about Tripwire before
I start sounding the security alarm. Figure 7 shows an example of
a machine whose latest Tripwire run found no problems. In these cases,
I use the MESSAGE variable to send "Report created on:"
information to the display. As a good monitoring system should, BB
also provides the alerts.
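In shell, assembling $LINE might look like the following sketch. The BB binary path and display host name are assumptions, and the final send is commented out so the sketch stands on its own:

```shell
#!/bin/sh
# Sketch of how check_status.sh builds a Big Brother status line. The BB
# binary path and display host are assumptions, not Adero's actual values.
BB=/opt/bb/bin/bb
BBDISPLAY=bbdisplay.company.com

build_line() {
    # $1 = machine name with commas instead of periods
    # $2 = "red" or "green"; $3 = text for the drill-down page
    echo "status $1.tw $2 `date` $3"
}

LINE=`build_line machine1,company,com green "No violations."`
# $BB $BBDISPLAY $LINE    # uncomment on a machine with Big Brother installed
```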
There are two benefits to centralizing notification responsibilities
to BB. First, while Tripwire only has email alerts, BB provides
several methods of notification. Second, when changing alert recipients,
I only have to change configuration files on the one BB machine
instead of on each remote machine.
BB, however, depends upon interrogator for its information, so
I will now return to describing interrogator -- specifically,
how it is set up and how it performs its jobs. Its directory structure
is shown in Figure 8. There is no setup script, so the directories
must be created manually. The uppercase, boldface names (i.e., "ROOTDIR")
are the variables used in the scripts to represent the directories.
The file listings shown are examples of what lies in each directory.
The system is based on machine types. Figure 8 shows that four
types are currently set up to be checked: cache machines (type 1),
type 2 machines, type 3 machines, and type 4 machines. Each type
has a set of configuration, policy, and database files, which are
stored in the src/ directory tree. The src/cache/
directory shows the full set of configuration files for the "cache"
machines. Each type also has a list of member machines; these lists
are stored in the TYPEDIR directory. Under TYPEDIR
are the directories where the files for policy 3 are stored. Finally,
the TEMPLATEDIR stores information about how the output of
a Tripwire run should look for each machine type.
The first script, do_tw.sh, is responsible for running
Tripwire on the remote machines. It takes three parameters and is run as:
do_tw.sh machine1.company.com cache p1 ,
where machine1.company.com is the name of the machine, cache
is the type of machine, and p1 is the policy to check. After
checking the parameters and pinging the machine, the script logs on
to the remote machine as root using the ssh identity key that
was set up during installation. The key must have root access because
Tripwire must be run as root. After logging in, it runs Tripwire via
tripwire -m c -c X.cfg ,
where X is the third parameter (the policy to check) submitted to
do_tw.sh. The output is redirected to the REPORTDIR
directory and saved with the .asc suffix. Figure 8 shows example
reports under the machine1/ and machine2/ directories.
The script then extracts the report's core information from
the surrounding extraneous information and puts it into a temporary
file in the TMPDIR directory. (Temporary files are saved
with the job number as the suffix.) In Figures 3 and 4, the orange
text represents the information that is ignored, and the black text
represents the information that is saved to the temporary file.
The contents of the temporary file are then compared (via the diff
command) to the contents of a template.
Which template to use is determined by the second and third parameters
to the do_tw.sh script. As shown in Figure 8, the templates
are named template.report.<machine_type>.<policy_number>.
The templates themselves are simply the core information of a Tripwire
report from a clean run and must be created manually. That is, the
core information from the model machine's output must be saved
as a template and transferred to the intermediate machine's
TEMPLATEDIR. The black text in Figure 4, for example, could
be the contents of a template file, but the black text in Figure
3 could not.
Assume for the moment that the black text shown in Figure 4 is
a template. Assume, also, that Tripwire runs on a remote machine
and the output looks like Figure 3. The diff between the
report core and the template will produce output, which means that
there is a violation. Each time do_tw.sh runs, it produces
two files in STATUSDIR that together describe the state of
the remote machine; one file has a .msg suffix and the other
has .status. The output from the diff is sent to the
.msg file. Also, the word "red" is written to the
.status file. If there is no output from diff, then
the word "green" is written to the .status file.
To the .msg file goes the phrase "No violations."
and the time of the run. Figures 6 and 7 are examples of violations
and a clean run, respectively.
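The comparison-and-status step can be sketched as a small function. The .msg and .status behavior follows the description above, while the argument layout is my own assumption:

```shell
#!/bin/sh
# Sketch of do_tw.sh's last step: diff the extracted report core against the
# template and record the machine's state in a .msg and a .status file.
set_status() {
    # $1 = report core file, $2 = template file, $3 = STATUSDIR path prefix
    if diff "$2" "$1" > "$3.msg" ; then
        # no differences: a clean run
        echo green > "$3.status"
        echo "No violations. Report created on: `date`" > "$3.msg"
    else
        # the diff output (the violations) is already in the .msg file
        echo red > "$3.status"
    fi
}
```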
When there is a violation, the script logs into the remote machine
a second time and copies over the official Tripwire report file,
which has the suffix .twr. This report has five levels of
detail to choose from. Figure 2 shows that I use level 3, which
contains minimal but exact information about how the files are in
violation. The .twr report file is also saved in the REPORTDIR.
In Figure 8, there is a .twr file in the machine4/
directory, showing that there is a violation in policy 1.
Policy 4 is just a tiny bit different. Notice that there is a
line in Figures 3 and 4 that says "Total objects scanned:".
For policies 1 and 2, this number should be the same on all the
machines because policies 1 and 2 check generic file systems. For
policy 4, this number may be different. Thus, if the script is checking
policy 4, it treats this line as extraneous information and takes it
out of the comparison.
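That exclusion is a one-liner. Assuming the report core has already been saved to a file, a grep -v drops the per-machine count before the diff:

```shell
#!/bin/sh
# Sketch of the policy-4 tweak: remove the "Total objects scanned:" line,
# which may legitimately differ per machine, before comparing to the template.
trim_scanned() {
    grep -v 'Total objects scanned:' "$1"
}
```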
The sister script to do_tw.sh is do_files.sh, which
checks policy 3 files. It takes two parameters and is run as:
do_files.sh machine1.company.com cache ,
where cache is the machine type. Unlike do_tw.sh, this
script can check all machines of a given type, so the command:
do_files.sh all cache
will check all the machines in the cache.list file.
As previously mentioned, do_files.sh does not use Tripwire
to check the files. Rather, I have traded some of the extra security
of Tripwire checking for the ease of immediately changing files.
The script compares an ls -l listing (minus the date and
time fields) and the md5sum output of the remote files to
local master copies, whose names represent absolute paths on the
remote machine. For example, the /etc/shadow file is named:
The script takes a directory listing from the appropriate subdirectory
under TYPEDIR, translates the separators into slashes, and
adds a leading slash. After this step, it has a list of files to check
on the remote machine. Then the script sends ls -l (using the
real pathname) and md5sum information for the master files
to REPORTDIR. An example of this file is in the p3/
directory in Figure 8. Because of the full listing, the user and group
names must be the same for the master file as for the remote machine,
but the UID and GID do not have to be the same. The next step is logging
into the remote machine and sending the ls and md5sum
information to a temporary file. The comparison method is the same
as for do_tw.sh; diff compares the report and temporary
files. Again, output from diff goes to the .msg file
in STATUSDIR and a "red" or "green" status
is sent to the .status file.
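The fingerprint that do_files.sh compares can be sketched like this. The awk field positions assume a standard "ls -l" layout (permissions, links, owner, group, size, then the date and time fields, then the name), and md5sum stands in for whatever checksum tool is installed on the machine:

```shell
#!/bin/sh
# Sketch of how do_files.sh might fingerprint one file: the ls -l listing
# minus the date and time fields, plus an md5sum. Assumes GNU/Linux-style
# "ls -l" output and an md5sum binary.
fingerprint() {
    ls -l "$1" | awk '{ print $1, $2, $3, $4, $5, $NF }'
    md5sum "$1"
}
```

Diffing the fingerprints of the master copy and the remote file then reveals any change to permissions, ownership, size, or content.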
I also check Tripwire's configuration and policy files with
this script to ensure they have not been altered. Together with the
template checking and the inclusion of Tripwire's binaries in policy 2,
this ensures that Tripwire itself is not corrupted on the remote
machines.
Since the first two scripts take parameters, I created wrappers
to save some typing. For do_tw.sh, there are three wrappers
-- one each for p1, p2, and p4. They take the form:
for MACHINE in `cat $MACHINE_LIST` ; do
    /opt/tw/scripts/do_tw.sh $MACHINE $MACHINE_TYPE $POLICY
done
For do_files.sh, the wrapper reads:
/opt/tw/scripts/do_files.sh all $MACHINE_TYPE
The third script, check_status.sh, takes no parameters.
As explained above, this script sends status information to BB using
the $LINE variable. To find out which machines to report on,
it looks in STATUSDIR for all files ending in .bb and
parses the filenames. Figure 8 shows the relevant files for "machine1".
After the script creates the machine list, it recreates the .bb
file with information from the .msg and .status files.
If any of the .status files contain the word "red",
the $COLOR variable in $LINE is set to red. The
contents of the .msg files are concatenated and sent over as
the $MESSAGE contents. Our BB machine rewrites its Web pages
every five minutes, so within five minutes of running check_status.sh,
the new information is on the display, and the alerts are sent out.
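The red-wins rule is simple to express. In this sketch the per-policy status files are assumed to be named <machine>.<policy>.status, which is my guess at the layout rather than something fixed by the system:

```shell
#!/bin/sh
# Sketch of check_status.sh's color folding: a machine shows red on the BB
# display if any of its per-policy .status files says red. The file-naming
# scheme (<machine>.<policy>.status) is an assumption.
machine_color() {
    # $1 = STATUSDIR, $2 = machine name
    if grep -q red "$1"/"$2".*.status 2>/dev/null ; then
        echo red
    else
        echo green
    fi
}
```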
BB's information is considered stale (and its display turns
purple) if it is not updated within 30 minutes. To keep the status
information fresh, I run check_status.sh every 20 minutes
from cron. Instead of having a cron job on each of
the remote machines, I have a single cron file on interrogator,
thus centralizing the frequency of the scripts' running. Root's
cron jobs on interrogator are:
0 0,8,16 * * * /opt/tw/scripts/do_tw.wrapper-p1.sh > /dev/null
0 2,10,18 * * * /opt/tw/scripts/do_tw.wrapper-p2.sh > /dev/null
0 4,12,20 * * * /opt/tw/scripts/do_files.wrapper.sh > /dev/null
0 6,14,22 * * * /opt/tw/scripts/do_tw.wrapper-p4.sh > /dev/null
0,15,30,45 * * * * /opt/tw/scripts/check_status.sh
Although minimal, do_tw.sh and do_files.sh do produce
output, hence the redirection to /dev/null. Figure 9 shows
examples of good and bad output from the two scripts.
We have been using this system for several months now. Granted,
it is far from perfect. There are three areas in particular that
need improvement. First, there is no inode checking inherent in
this system. If the scripts are run very often, they will produce
a lot of reports, which in turn will deplete the filesystem's
inodes. As I found out, this system does produce reactive alerts
regarding this "device out of space" system error, but
periodic, proactive log removal would be more helpful. Second, there
should be a better way to check policy 4 files. Although I am not
certain of how it should be done, I am certain that I want to avoid
the possibility of logging onto every box if I change the policy.
Third, the do_tw.sh script should accept an "all"
parameter. There are also three outstanding items on the "nice-to-have"
list: variable number of policies per machine type (currently, all
types must have the same number of policies); skipping a machine
if it is not in the known_hosts file (to bypass hanging at
the "do you want to connect" question); and choosing which
attributes to check for policy 3 (currently, checking ls
and md5sum are the only options).
Even with its lack of refinement, though, this system has been
a very useful tool for bringing centralized Tripwire monitoring
into our enterprise. Our machines are checked by Tripwire at least
three times a day, and all my department has to do is look at the
BB display if we are curious or wait for an alert. I have taken
my department out of the checking loop. The only time we have to
log on to a machine is to investigate a violation.
I hope this article will provide systems administrators with a
springboard (or template) for including Tripwire monitoring in their own enterprises.
Elena Khan graduated from college in 1992 without knowing how
to use a computer. She started using computers at her first job,
where she became something of an expert in WordStar. In 1996 she
took a vocational class to learn how to fix computers and landed
a job as a junior sys admin soon after. She's been a sys admin ever
since. She can be reached at: firstname.lastname@example.org