Integrit for File Verification
Ed L. Cashin
Integrit is a free software tool that helps sys admins stay in
touch with and trust the files on their systems. When I was first
learning systems administration, our shop had an old Sun machine
("Butch") that functioned as our email server. At the
first staff meeting I attended, it was announced that a rogue process
had been spotted running on Butch.
It turned out that Butch had been compromised. Our sys admin found
the rootkit and the backdoor that the attacker had left, but as
I gained more responsibility for the care of Butch, I started to
wonder how many times it had happened before. What parts of the
system could I trust? What parts of Butch had been replaced with
Trojan horses?
Trojan horses are a very real part of many system break-ins. When
an attacker gains unauthorized root access to a system, that access
can be solidified and enhanced by replacing parts of the compromised
system with custom-made files. The replacements may cover the tracks
of the attacker or provide a backdoor that allows the attacker to
gain re-entry to the system.
A trojaned syslog daemon might fail to log the attacker's
actions. A trojaned find command might silently ignore a
rootkit, and a bogus crond might listen on port 666 for an
attacker's telnet session while carrying out cron's
normal duties.
This article describes my experiences with several tools designed
to help sys admins tell which files are to be trusted and which
files have unexpectedly changed. I then describe a new tool, integrit,
that I wrote to overcome some of the problems I encountered.
The Problem -- How Do You Know Your System Can Be Trusted?
Not knowing what parts of a system you can trust severely limits
your ability to determine the stability or security of the system.
I hated not knowing whether our mail server was a playground for
malicious account crackers, so on the systems I controlled, I tightened
security with tools like tcp_wrappers.
I also relied on file verification software to keep track of the
system's programs like find, crond, and ps,
that I wanted to trust. The file verification software would tell
me when the contents or attributes of any file had changed. I noticed
that many of the systems administrators I knew were not using file
integrity verification software, because they thought they didn't
have the time to learn how to use it, or because they thought that
the software was too hard to learn to use properly.
The first product I tried was Tripwire, a tool that came out of
Purdue University. The documentation associated with Tripwire really
opened my eyes to the issues raised by the challenge of determining
whether a file was to be trusted.
The Solution -- Know What Your System Files Look Like
Tripwire was then a commercial product but was available in an
"academic source release." I got to know Tripwire well
and used it to monitor the system files of several important new
servers. Immediately after installing the operating system, I would
install Tripwire.
By installing Tripwire right away, I could make a "snapshot"
of the system when I knew that all the files on the system were
in a trusted state. The snapshot is really a collection of cryptographic
checksums1 and file stat information for the machine's local
files. All the information is saved in a database. In Tripwire,
that database was just a plain text file.
I could regularly check the current state of system files against
the state that was recorded in the database. If everything was still
okay, I could update the database to reflect any benign changes
that had occurred since the original database had been created (e.g.,
if I had upgraded some software).
Of course, the whole point in this exercise is determining whether
the local system has been compromised. That means that in running
the check, it is impossible to trust the programs that reside on
a writable filesystem of the local machine. Thus, the Tripwire binaries
and databases were exported by read-only NFS from another, more
tightly-secured server.
Caveats
This kind of strategy, using a program that checks for changes
in important system files, should not be confused with a comprehensive
security plan. A file verification system is just a tool to be incorporated
into a well-rounded security strategy.
Note that because these programs run in user space, they must
trust the kernel of the host where they run. Some operating systems
offer kernel security levels that will help round out your security
model and enable you to trust the results that your file integrity
checker produces.
More Problems -- Complexity and Bloat
My colleagues were glad that I had taken on the challenge of Tripwire,
but they did not want any part of it. Tripwire was a complex system
written in C++, and they didn't know C++. It had many features
that I never used and lacked some features that I wanted.
I tried another file integrity checker called "aide",
the Advanced Intrusion Detection Environment, that had some interesting
features like supporting regular expressions in the configuration
file. Not all the features worked correctly, but I liked it. After
installing it on an important server, though, I noticed that it
was using most of the machine's memory and was still growing
in core. Unfortunately, it used more and more memory while it ran.
Neither product was going in the direction I wanted; I wanted
an integrity checker that had a small, or at least constant, memory
footprint, one that would be easy for my co-workers to learn, and
one that was convenient to use.
Another Solution: integrit
One day when I needed to install a file integrity checker on a
new system, I was trying to decide which one to install, and I was
getting frustrated. I thought, "All the software has to do
is create a database of checksums and file attributes! This shouldn't
be so hard!"
I then thought of ways to save memory. By using binary databases
instead of plain text, I could avoid keeping the database in memory
when checking the current files against the checksums and attributes
stored in a database.
Another consideration was the output format. Both Tripwire and
aide had two sections in the report they generated: one said "File
X has changed", and if you scrolled down to the bottom of the
report, you could get more information about exactly how file X
had changed.
By printing all the information about a changed file right away
instead of saving the details until the end of the report, I could
save more memory, preventing the runtime memory footprint from growing.
I never liked having to jump between the top and bottom of the report,
anyway.
I started designing a tool that would do the job without losing
sight of the fact that the job is not that hard. The name, "integrit",
is a bit like the name for the UN*X file creation routine, "creat".
It's based on the word, "integrity". It reflects
the old-school name for creat as it reflects the old UN*X idea that
a tool should do one thing, do it well, and do it in a way that
interacts well with a system of tools.
A New Tool
When I finally started seriously working on a design, I kept two
goals in mind: simplicity and conservative memory use. Happily,
these goals help each other -- avoiding feature bloat helped
cut memory usage.
Small Memory Footprint
The idea of using a binary database worried me a bit because binary
databases usually use a lot more disk space than the data itself
would take up. After some research, I found a database format that
matched integrit's needs exactly: cdb, by D.J. Bernstein,
the author of qmail. And cdb databases were acceptably small.
The Web page for cdb says:
cdb is a fast, reliable, simple package for creating and reading
constant databases.2
I found that cdb databases were about half as big as corresponding
Berkeley databases, and I believed Bernstein's assertion that
cdb is more reliable after looking at his code; he is one
of the most disciplined programmers I have come across. During file
checking, integrit must be able to look up a file's attributes
and checksum very quickly. By using cdb databases instead
of holding all file attributes and checksums in memory, we get fast
lookups and cut down on memory use.
A help in making sure that integrit was airtight in its memory
use was the leak detection offered by the Boehm garbage collection
library3. It could tell me when I was making those mistakes that
give C a bad name, such as overwriting the end of an array, failing
to free memory, dangling pointers, etc. It would tell me the exact
line of code where the problematic memory was allocated. I didn't
use the garbage-collecting features of the Boehm library, but using
the leak-detection features meant that I didn't need to!
Using C as the programming language helped keep integrit simple,
speedy, and lean.
Simultaneous Check and Update
Not all file verification systems can create a new database at
the same time they check files. Integrit has a simple conceptual
model where you have a "known" and a "current"
database. The known database contains information about what the
files were like last time you knew the system was in a trusted state.
The current database is the new database that reflects what the
files are like right now. You can update a current database at the
same time that you go through the filesystems checking files against
the known database.
That saves time, because in its report integrit produces an MD5
checksum of the new current database. If everything in the report
is okay, and the MD5 checksum still matches the new database, then
you know that it's safe for the current database to become
the known database for integrit's next run.
Cascading Rulesets
Integrit's runtime behavior is specified by a user-supplied
configuration file. It is often necessary to tell file-integrity
checking software what parts of the filesystem to check and what
parts to ignore, or to give special treatment to select files. A
set of specifications for how integrit should treat a particular
part of the filesystem is called a "ruleset".
To keep integrit simple while maintaining convenience, I added
a couple of new features, one of which was cascading rulesets. The
idea here was that a rule applied to a directory would percolate
down through all of its subdirectories and the files therein, by
default. This inheritance could be overridden by specifying rules
explicitly.
For example, to tell integrit not to verify checksums on any of
the files beneath /var/log, you could just say, "No
checksums on /var/log", like this, in the configuration
file:
/var/log S
(Capital letters turn things off, and lowercase letters turn them
on.) Integrit would then infer that /var/log/messages, /var/log/old/messages.21.gz,
etc., should not have their checksums verified. However, if you wanted
/var/log/old/hacklog to have its checksum verified, you could
say so explicitly, overriding the inherited ruleset. (Note that the
's' is lowercase, to turn on checksums):
/var/log/old/hacklog s
This cascading ruleset feature helped make integrit convenient to
use without introducing the complexity or runtime overhead of supporting
regular expressions.
SHA-1
One potentially intimidating part of Tripwire is that it forces
the user to consider and make decisions about what cryptographic
checksums to use. When I was using it, it offered nine checksum
algorithms and a free slot for the user to provide any algorithm
they wanted. So, I did some research to find out whether all this
complexity had any benefit for the user.
The answer is a definitive, "No." Read the books and
do the math, and you'll find that if you use one really good
checksum algorithm on a file, the odds are astronomically against
an attacker fooling you. Integrit uses a very secure checksum algorithm,
the Secure Hash Algorithm (SHA-1), which produces a 160-bit number.
Consider that even for the MD5 algorithm, which produces a 128-bit
checksum and has some weaknesses4, we know of no collisions, only
"pseudo-collisions". Collisions are instances where two
different sets of data produce the same checksum. Then, consider
that a cryptographically-secure 160-bit checksum algorithm is 4,294,967,296
times harder to guess than a cryptographically secure 128-bit algorithm,
and you start to get an idea of why SHA-1 alone is sufficient for
our needs.
Finding a collision would allow an attacker to replace one of
your files with a different one. As a matter of practicality, though,
no attacker would spend the enormous amount of resources necessary
to find a collision for SHA-1 in order to fool you. It would be
much cheaper to kidnap and brainwash you!
XML Output
Integrit features an option for XML output. I'm not sure
whether anyone really uses this feature, but the idea was that by
selecting XML output, the user could easily manipulate the report
into any form desired.
Easy to Configure
Integrit reads its configuration file whenever it runs. That means
that you must make sure the configuration file is safe, like the
statically linked integrit binary and the databases. Keeping it
safe means keeping it on some medium that is effectively read-only
from the host that's being checked.
In exchange for the minor inconvenience of keeping the configuration
file on a secure medium, the user gets a more easily tunable file
verification system. Changes to the configuration file take effect
immediately on integrit's next run. Therefore, the sys admin
is more likely to tune the configuration file so that the reports
get shorter and shorter, until they finally show just what the sys
admin needs to know.
Option to Reset Atime
Sometimes you need to know the last access time for a file. If
you finger a user to find out when they last read mail, the
access time for the inbox is consulted. To tell integrit to reset
the access times on all the files under your /restricted/info
directory, you can use this syntax:
/restricted/info r
and the access times will be preserved even if checksums are done
on the files.
Auxiliary Tools
You might skip this section if you haven't used any file
integrity checkers before. These auxiliary tools are likely to be
interesting only if you have ever found yourself wishing that you
could look at the information in old databases or quickly see the
SHA-1 checksum for an important file like /bin/ls.
To keep integrit simple, these conveniences were not rolled into
integrit's capabilities. Instead, a couple of supplemental
tools were created. i-viewdb is a tool for viewing integrit
databases, and i-ls is a tool for seeing all the file attributes
and the SHA-1 checksum for any given files on disk. The output is
not pretty, but it's consistent with the semantics of integrit's
output:
[ecashin@hosty ecashin]$ i-ls -s /bin/ls
/bin/ls i(97081) p(755) l(1) u(0) g(0) z(49844) a(20010429-004003) \
m(19990924-031230) c(19991105-100842) \
s(92cbce2dc869a21fc9954b48c96273899d0e1ad1)
That output shows the following attributes of /bin/ls: inode,
permissions, number of links, userid, groupid, size, access time,
modification time, change time, and SHA-1 checksum. (If you read that
last sentence closely, you have already mastered all of the one-letter
codes used by integrit, i-viewdb, and i-ls!)
[root@hosty ecashin]# i-viewdb -s /mnt/secsrv/integrit-hosty.cdb \
| grep '\/bin\/ls'
/bin/ls i(97081) p(755) l(1) u(0) g(0) z(49844) a(20010411-091153) \
m(19990924-031230) c(19991105-100842) \
s(92cbce2dc869a21fc9954b48c96273899d0e1ad1)
An Example
To make integrit more than an abstraction, let's look at
a specific example. Let's say I want to put integrit on my
desktop machine. There's a machine on the network that is only
accessible via ssh, and I want to use it as a trusted host where
the integrit binaries, configuration files, and databases will reside.
First, I go to integrit.sourceforge.net and get the latest sources.
Then I unpack the sources and build them:
ecashin@nilda ecashin$ cd build
ecashin@nilda build$ tar xfz ~/packages/integrit-1.06.06.tar.gz
ecashin@nilda build$ cd integrit-1.06/
ecashin@nilda integrit-1.06$ ./configure && make && make aux
creating cache ./config.cache
checking for gcc... gcc
checking whether the C compiler (gcc -g -Wall -O2 ) works... yes
...
gcc -L .././cdb-0.75 -static -o i-ls .././elcerror.o \
.././utilities.o .././xstradd.o .././show.o ls.o -lcrypto
make[1]: Leaving directory '/home/ecashin/build/integrit-1.06/aux'
Next I look at some example configuration files in the "examples"
directory and make one for my own system based on one of them. Make
sure that integrit never checks /proc or any NFS-mounted filesystems.
ecashin@nilda integrit-1.06$ less examples/root.conf
ecashin@nilda integrit-1.06$ cp examples/usr.conf ~/integrit-nilda.conf
ecashin@nilda integrit-1.06$ vi ~/integrit-nilda.conf
I install it on the secure database server, "meili":
ecashin@nilda integrit-1.06$ scp ~/integrit-nilda.conf root@meili:/usr/local/secdb
root@meili's password:
integrit-nilda.conf 100% |*****************************| 197 00:00
I make the secure database server viewable to the local root user
via read-only NFS:
nilda:2:ecashin integrit-1.06$ su
Password:
[root@nilda integrit-1.06]# (umask 077; mkdir /mnt/secdb)
[root@nilda integrit-1.06]# mount -t nfs meili:/usr/local/secdb /mnt/secdb
I install integrit, but I'll be using the binary that resides
on the secure database server:
[root@nilda integrit-1.06]# make install
installing documentation
cd doc && make install
make[1]: Entering directory '/home/ecashin/build/integrit-1.06/doc'
installing manpage i-ls.1 in /usr/local/man/man1
...
/usr/bin/install -c -s -m 750 integrit /usr/local/sbin/integrit
It is recommended that the binary be copied to a secure location and
recopied to /usr/local/sbin at runtime or run directly from
the secure medium.
[root@nilda integrit-1.06]# chmod 755 /usr/local/sbin/integrit
[root@nilda integrit-1.06]# scp /usr/local/sbin/integrit meili:/usr/local/secdb/bin/integrit-'uname'
root@meili's password:
integrit 100% |*****************************| 350 KB 00:00
Now I can run integrit to generate a database with the "update"
option, -u:
[root@nilda integrit-1.06]# /mnt/secdb/bin/integrit-Linux -C /mnt/secdb/integrit-nilda.conf -u
integrit: ---- integrit, version 1.06 -----------------
integrit: output : human-readable
integrit: conf file : /mnt/secdb/integrit-nilda.conf
integrit: known db : /mnt/secdb/integrit-nilda.cdb
integrit: current db : /root/db/integrit-nilda.cdb.new
integrit: root : /usr
integrit: do check : no
integrit: do update : yes
integrit: current-state db md5sum --------------
integrit: 9dd4b5f00e3036eba9e52bc815654c87 /root/databases/usr_current.cdb
And install the new database on meili, where it becomes the "known"
state database. By making a special "secdb" group or the
"no root squash" mount option we could avoid making the
database world readable. That would be a good idea, but for simplicity
we just do:
[root@nilda integrit-1.06]# chmod 644 /root/db/integrit-nilda.cdb.new
[root@nilda integrit-1.06]# scp /root/db/integrit-nilda.cdb.new meili:/usr/local/secdb/integrit-nilda.cdb
root@meili's password:
usr_current.cdb 100% |*****************************| 11460 KB 00:10
[root@nilda integrit-1.06]# openssl md5 /root/db/integrit-nilda.cdb.new
MD5(/root/db/integrit-nilda.cdb.new)= 9dd4b5f00e3036eba9e52bc815654c87
The checksum still matches the one integrit gave me, showing that
no one has modified it before I could install it.
Now I can set up a cronjob that runs integrit during off hours.
This way of using integrit is somewhat secure in that it doesn't
trust the localhost for the integrit binary, configuration file,
or database, but it has weaknesses in that it does trust some local
things like the cron system, kernel, and Sendmail. It's important
to know where the weaknesses are in the plan you choose and to settle
on something that works for you.
First we'll make some changes just so that integrit has something
interesting to report:
[root@nilda integrit-1.06]# cp /usr/local/etc/lpd.conf \
/usr/local/etc/lpd.conf.20010531
[root@nilda integrit-1.06]# echo >> /usr/local/etc/lpd.conf
And then we create a crontab for root that will run the integrit binary
from meili at 10:57PM, checking (with the -c option) nilda's
files against the known-state database we put on meili. At the same
time, the command generates a new database (with the -u for
"update" option) that reflects nilda's current state.
[root@nilda integrit-1.06]# cat > ~/.tmp
57 22 * * * (printf "Subject: integrit report\n\n"; d=/mnt/secdb; \
$d/bin/integrit-'uname' -C $d/integrit-nilda.conf -u -c) | \
/usr/lib/sendmail ecashin@users.sf.net
[root@nilda integrit-1.06]# crontab ~/.tmp
The mail I get after the cron job runs looks something like this:
From: root <root@nilda.foottoe.net>
Subject: integrit report
Date: Thu, 31 May 2001 22:57:00 -0400
integrit: ---- integrit, version 1.06 -----------------
integrit: output : human-readable
integrit: conf file : /mnt/secdb/integrit-nilda.conf
integrit: known db : /mnt/secdb/integrit-nilda.cdb
integrit: current db : /root/db/integrit-nilda.cdb.new
integrit: root : /usr
integrit: do check : yes
integrit: do update : yes
changed: /usr/local/etc m(20010526-120027:20010531-225331) \
c(20010526-120027:20010531-225331)
changed: /usr/local/etc/lpd.conf s(052f5c2071ecf64e604fa32e10fcfbdb42bb82a6:ecd4b9ddd6ec0893ef63f2f334c3087d7540307d)
changed: /usr/local/etc/lpd.conf m(20010407-155801:20010531-225333) \
c(20010407-155801:20010531-225333)
new: /usr/local/etc/lpd.conf.20010531 p(644) u(0) g(0) z(17789) \
m(2001051-225331)
integrit: checking for missing files --------------
integrit: current-state db md5sum --------------
integrit: 4b06bc1355853e12e55e222b2a2041fb /root/db/integrit-nilda.cdb.new
It looks best on a wide screen. The report shows the old and new modification
times ("m") for the /usr/local/etc directory, followed
by a changed checksum ("s"), modification time, and a new
change time ("c") for the lpd.conf file. It also
shows a new file, lpd.conf.20010531. The syntax looks ugly
at first, but after seeing it a few times, you'll find the format
is quite easy to read quickly.
The last thing integrit shows is the MD5 checksum for the new
database. The report looks OK, but before installing the new database
on the secure server, meili, I check to make sure the database hasn't
been tampered with. Again, it would be more secure if I weren't
trusting the local OpenSSL binaries.
[root@nilda integrit-1.06]# openssl md5 \
/root/db/integrit-nilda.cdb.new
MD5(/root/db/integrit-nilda.cdb.new)= 4b06bc1355853e12e55e222b2a2041fb
The checksum matches, so I make the database into the "known"
database for integrit's next run:
[root@nilda integrit-1.06]# scp /root/db/integrit-nilda.cdb.new \
meili:/usr/local/secdb/integrit-nilda.cdb
root@meili's password:
integrit-nilda.cdb.n 100% |*****************************| 11460 KB 00:11
That last process would have been more secure if I had avoided a race
condition between the time I verified that the checksum was OK and
the time I copied the database over to meili.
On nilda, though, I've settled on a level of security that
I'm comfortable with -- I am using integrit in a way that
would alert me to 99.9% of real break-ins, but I'm not using
anything like FreeBSD kernel security levels or working hard to
eliminate all instances of trusting the local system.
Needs Improvement
Integrit has met its two main goals of simplicity and conservative
memory use without sacrificing security. But, while the creeping
feature daemon must always be kept at bay, there are some areas
where integrit could benefit from development. These needs have
come to light as integrit has gained popularity and sys admins with
big farms of nearly identical machines are turning to integrit for
file integrity verification.
One important feature is needed because of a subtle problem with
the cascading rulesets. For example, if I notice in the report that
every day the modification time changes on the /var/lib directory,
I might want to keep that from showing up in the report by saying:
/var/lib M
which tells integrit not to report modification time changes on the
/var/lib directory. However, this ruleset is inherited by all
files and subdirectories in /var/lib, so that a change in the
modification time for /var/lib/important.so would not be reported.
That's not what I want.
The solution is to have a way to say that this ruleset doesn't
cascade, it only applies to this directory. Admins who have experience
with the vi editor or with sed will naturally associate
the dollar sign with "the end", so one possible syntax
for this new non-cascading ruleset feature would be this:
$ /var/lib M
Another important feature is the ability to specify database locations
on the command line instead of just in the configuration file. This
will be a major convenience to sys admins who can use the same configuration
file for many hosts but must use a different pair of databases for
each host.
Integrit is very easy to build from source, but sometimes there
are problems getting integrit statically linked against the OpenSSL
library. I have also met with a lack of response from the OpenSSL
core development team regarding the license issues associated with
the binary distribution of integrit. For these reasons, it would
be beneficial to implement my own versions of the checksums integrit
uses: MD5 and SHA-1.
Even without all these improvements, though, I enjoy using integrit.
Its output is easy to read; it is considerate to other processes
during runtime; it is something that my co-workers can easily learn
to use, and I feel more confident that my view of my systems is
accurate -- rather than just what some Trojans want me to see.
Resources
The integrit Web page -- http://integrit.sourceforge.net/
Integrit's project page on Sourceforge (includes mailing
list info) -- http://sourceforge.net/projects/integrit
1. A "checksum" is a fixed-length "signature"
generated by a mathematical operation on a variable-length sequence
of bytes. A "cryptographic checksum" is a checksum algorithm
that produces a signature for one set of data that is practically
impossible to reproduce with different data. For a good introduction
to cryptographic checksums, see Applied Cryptography, by
Bruce Schneier (John Wiley & Sons).
2. http://cr.yp.to/cdb.html
3. Hans Boehm is the author of one of the most popular and stable
garbage collectors. It can be used as a reliable memory leak detector
with a respectable number of features: http://www.hpl.hp.com/personal/Hans_Boehm/gc/
4. "Pseudo-collisions" have been found for MD5 but not
real collisions. The pseudo-collisions occur in the compression
part of the MD5 algorithm, but no collisions have been found for
full MD5 checksums, at least not by anyone who announces their results
publicly. For more information, see the "Cryptography FAQ"
at the RSA Security Web site.
Ed Cashin is a UNIX systems administrator at the University
of Georgia, working mostly with Linux and Solaris. Formerly a professional
programmer, Ed still enjoys programming in C, Perl, and even Objective-C.
His favorite things include TeX (Don Knuth's typesetting program)
and FreeBSD's amazing "ports" system for source-based
software maintenance.
|