Tracking
Machines with GetHost
John Shearer
Information is the key to maintaining efficient and secure networks.
Unfortunately for most systems and network administrators, vital
information is often kept in many different places and formats,
and the problem is compounded by the existence of modern, heterogeneous
networks. At a school with more than 1700 students, faculty, and
staff (using both school-issued and privately owned computers),
there was a recognized need to quickly and easily track the location
of a specific machine. Previously, to track a computer or person
on the network or to see whether a stolen computer had recently
been used on the network, systems administrators had to check several
individual sources of information by hand. I developed an application
called GetHost to quickly search all these sources of information
from a single, intuitive form. The GetHost database is not only
accessible from an interactive search form, but is also available
for an automated search from a program or script.
GetHost is a small set of scripts running on Linux, which works
across several servers to create a database containing a wealth
of information about people and computers using the network. The
final database includes DHCP lease information, WINS information
from our Windows NT Domain Controllers, email account usage, and
user information from our inventory tracking system. This database
can be easily searched from a Web interface. With only a small amount
of information about a misbehaving (or missing) computer, an administrator
can quickly tell whether that computer was on the network recently,
exactly when that was, who was using it, and where they were using
it.
Because all networks are different, this system may not work for
everyone. However, it serves as a good example of how creativity
and a little parsing can bring together foreign systems containing
similar information. GetHost gathers information from the two Linux-based
Internet Software Consortium (ISC) DHCP servers on our network,
the WINS database from our Windows NT Domain Controller, the email
usage logs from our FirstClass email server, and physical inventory
information from our DKSystems Help Desk application (running on
MS-SQL version 7.0).
In the next several sections, I will explain how these systems
get their data to a common destination and how that information
is compiled into a single, searchable database. The final database
and all the raw data are stored on and parsed by a Red Hat Linux
system. Linux and Perl are ideal for this task because of their
flexibility and their ability to work quickly with raw text files.
The dhcpd.leases Files
Our network uses two ISC DHCP servers running on Red Hat Linux,
so there are two dhcpd.leases files that must be parsed (the parsing
scheme is described later). Each DHCP server uses the scp command
from the SSH utilities to copy its dhcpd.leases file to the GetHost
server several times a day. scp is a secure alternative to the standard
rcp command. For security reasons, I recommend also using an unprivileged
user to do the copying because the scp utility requires passwords
to be synchronized between servers in order to run unattended. Also,
during the copy process, the lease file name is changed to indicate
which server it came from. The following is the cron job that runs
on the Linux DHCP server to execute the bash script that does the
copying:
0 * * * * /bin/su - jsmith /usr/local/bin/cp-leases
The cp-leases script runs once an hour as the user "jsmith"
on each DHCP server, so every hour the two files "dhcpd.leases.one"
and "dhcpd.leases.two" are updated in the /home/jsmith
folder on the database server. Because the same user exists on all
systems with the same password, no password needs to be given with
the scp command. More information on scp can be found in the ssh man
pages. Here is an example cp-leases script:
#!/bin/bash
scp /var/state/dhcp/dhcpd.leases <server>:/home/jsmith/dhcpd.leases.one
In one line, this script just securely copies the current DHCP lease
file from its original location to a Linux server that will do the
final parsing and manipulation.
The Inventory Database
We use the DKSystems DKHelpDesk system to maintain our physical
inventory system. The database itself is on an MS-SQL server running
Windows NT. Because Perl on Linux does not work well with MS-SQL,
the following Perl script runs on the MS-SQL server itself to extract
the needed information:
1 #!c:\perl\bin\perl.exe
2
3 use Win32::ODBC;
4
5 $db = new Win32::ODBC("DSN=HD_MSSQL;UID=<user>;PWD=<password>")
6 or die Win32::ODBC::Error();
7
8 if ($db->Sql("SELECT TAG_NUMBER, MAC, CLIENTNAME FROM hardware
9 WHERE MAC > ''")) {
10 print "SQL Error: " . $db->Error() . "\n";
11 $db ->CLOSE();
12 exit;
13 }
14
15 open FILE, '>\\\\NT-SERVER\\users\\tech\\HW.TXT'
16 or die "Could not open HW.TXT: $!\n";
17
18 while ($db->FetchRow) {
19 my ($tag, $mac, $client) = $db->Data();
20 print FILE "$tag\t$mac\t$client\n";
21 }
22
23 close FILE;
24 $db->Close();
Lines 1 through 3 set up the environment and bring in the Win32::ODBC
module that will actually communicate with the database. Lines 5 and
6 open the connection to the database, and lines 8 through 13 set
up and execute the SQL query. The "hardware" table in the
inventory system contains dozens of fields, but we are only interested
in the unique inventory tag number, the MAC address of the equipment,
and the client to which that the equipment is assigned. The SQL query
also only retrieves entries where the MAC address is actually defined
('WHERE MAC > '''). Lines 15 and 16 open the text file on
the Windows NT Domain Controller that will contain the inventory data
(the same server that will supply the WINS information later). Lines
18 through 21 step through the query response and populate the opened
file with the tab-delimited inventory records. Finally, lines 23 and
24 clean up the open objects.
Microsoft Client Data in WINS
The WINS database (Windows Internet Naming Service) is Microsoft's
way of allowing Windows machines to talk to each other on a TCP/IP
network. This database stores information about a specific computer
or user that cannot be found through typical TCP/IP networking avenues.
Microsoft states that there are three types of information registered
in the WINS database: Computer Names, Domain Names, and Other/Special
Names. Specifically, we are looking for a user name (or "Other")
to be attached to a specific IP address. Every time a user logs
into a Windows NT domain, the WINS database is updated with information
about the computer name (or the NetBIOS name), the login name, and
the IP address of the computer they are logging in from.
GetHost uses a text dump of the WINS database to obtain information
on active Windows users and computers. Microsoft was nice enough
to supply a utility in the Windows NT Server Resource Kit that allows
us to export the information from the WINS database to a text file.
That utility is "winsdmp.exe". Following is an example
of the "winsdump.bat" file that runs every eight hours
on the Domain Controller (through the use of the scheduler service):
1 d:\NTRESKIT\winsdmp.exe 172.16.1.1 > c:\wins.txt
2
3 ftp -s:c:\winsdump.txt
4
5 rem *** Copy of 'winsdump.txt' ***
6 rem open <server>
7 rem jsmith
8 rem <password>
9 rem put c:\wins.txt
10 rem put c:\users\tech\HW.TXT
11 rem quit
Line 1 uses the "winsdmp.exe" utility with the single argument
being the IP address of the server containing the WINS database. This
produces text output, which is redirected to the file wins.txt.
Once the wins.txt file is created, it is ftp'd to /home/jsmith
directory on the GetHost server using the same "jsmith"
account that the DHCP servers use. Lines 6 through 11 show an example
ftp script used by the winsdump.bat ftp command (line 3).
(Note that ftp is not secured, which is another reason why we use
an unprivileged account.) As long as we have the ftp pipe open,
this is a perfect opportunity to transfer the HW.TXT file
as well. HW.TXT is the dumped inventory text file from our
Help Desk system.
Email Logs
The last piece of data for our GetHost database is the email log.
We use Centrinity's FirstClass email conferencing system running
on Windows NT, which keeps a comma-delimited text log of all logins
and logouts from the email system. As part of this "stat"
file, we can also see which IP addresses and account names were
used to access the email server. This logging facility makes it
very easy for Linux and Perl to parse out and know exactly who was
logged in, when they were logged in, and from what machine they
logged in. But, of course, it is never that easy.
One initial problem was that the email server did not like to
let go of its log file while the server was online. But, it was
decided that we could wait until early in the morning to grab the
previous day's log file after the normal log rotation. Also,
like many log files, they don't necessarily stay around very
long. Ours are periodically backed up to tape and eventually deleted
from the system to make room for more. This caused a problem, because
we felt that a single day's worth of email logs was not sufficient
for tracking purposes. If we were tracking a computer that had not
been used on the network for a couple of days, then we would not
get any email records for it.
We solved this problem by starting a secondary log rotation that
the GetHost server had control over. After the most recent log has
been retrieved from the email server, the stat.2 file is
named stat.3; the stat.1 file is named stat.2;
and the most recent file is named stat.1 (no matter what
the original log file name is). Thus, there will always be three
days of log files available for the GetHost system to draw upon.
Next, the three new log files are combined into a single file to
make it easier for the GetHost parser to handle. Listing 1 shows
an example of how I implemented this system. I used the smbmount
utility, which is part of the Samba package, to mount the Windows
NT server that contains the logs from the Linux GetHost server.
The file is then copied to the /home/jsmith directory and
renamed as necessary. At this point, all the necessary data to construct
the GetHost database has been transferred to the /home/jsmith
directory on the Linux server.
The makelist.pl Script
Before I got too far compiling the specific information components
for this system, however, I needed decide on the format the final
database would take. I looked at SQL and hash-based databases, but
finally settled on a fixed-length, flat-file database. This type
of database does not lend itself to the kind of manipulation and
configuration of the other more complex databases, but it is easier
to code, simpler to search on all fields, viewable in your favorite
text editor, and can be queried using basic command-line utilities
like grep. With around 5000 records, our database is searchable
in a reasonable amount of time from the Web interface. Of course
the larger the database gets, the more sense it would make to use
an indexed database system.
Once all the necessary information is in the /home/jsmith
directory (as described in the previous sections), we can parse
through it all and create the GetHost database. The makelist.pl
Perl script (Listing 2) that does all the work is wrapped in the
make-lease shell script, which performs several tasks:
1 #!/bin/bash
2
3 cat /home/jsmith/dhcpd.leases.one > /home/jsmith/dhcpd.leases.comp
4 cat /home/jsmith/dhcpd.leases.two >> /home/jsmith/dhcpd.leases.comp
5 /usr/local/bin/makelist.pl > /home/jsmith/gethost.list.tmp
6 mv /home/jsmith/gethost.list.tmp /home/jsmith/gethost.list
This code combines the two dhcpd.leases files into one (lines 3 and
4). Then, it kicks off the makelist.pl script and redirects its output
to a temporary file (line 5). It then moves that temporary file to
the final location of /home/jsmith/gethost.list (line 6). The
temporary file option is not necessary, but the makelist.pl script
requires a couple minutes to run and the GetHost Web page does not
function properly until it's done. By using a temporary file,
I can reduce the time the gethost.list file is open from a
couple minutes down to the time of a file copy. Next, I will step
through Listing 2, the makelist.pl Perl script, to explain how the
data is parsed and added to the database.
Parsing Lease Files
The top three sections initialize the outside modules, explain
the fixed-length field definitions, and set some variables. The
first real work is done in Section 4 by calling the leaseparsenc()
function from the Text::DHCPparse module (available at http://www.cpan.org).
It gives our combined dhcpd.leases file as the only argument and
assigns the returned hash reference to a local variable $list.
GetHost began as just a DHCP lease file parser. I soon realized
that, although that information was extremely valuable, the parsing
of the lease files was just a small piece. I also realized how little
support there was for parsing DHCP lease files on the Internet.
The Text::DHCPparse (see Resources) module was developed from the
original GetHost system and was accepted by CPAN for inclusion in
the module list.
The returned hash reference is the basis for the final GetHost
database. It includes all IP addresses from the lease files, the
last time the lease was renewed (which tells us the last time the
computer was on the network), the MAC (or hardware) address, and
the hostname if it was recorded in the lease file. Some devices
do not record hostnames in the DHCP lease files, so the parsing
module will pad the result with spaces to accommodate the fixed
width necessary for the next section.
Next, makelist.pl calls a local subroutine to step through and
parse the wins.txt file. To do this efficiently, it uses the Text::ParseWords
module (standard with Perl). This module takes a field delimiter
and a text string as arguments and will parse the string into a
list variable. This is most useful in a comma-separated list when
some (but not all) fields are quoted. This module will strip out
the quotes and only leave the actual field values. The two fields
we are interested in are number 1 (the computer name or logon name)
and number 11 (the IP address) -- remember that list numbering
starts at zero. NetBIOS names must always be 16 characters long,
but the WINS database sometimes puts in a special 17th character
for its own reference. Also, because NetBIOS names must be 16 characters
long, there may be spaces padding the end of this field. This subroutine
uses regular expressions to handle these problems and creates a
new hash with the IP address as the key and the logon name or NetBIOS
name as the value. Entries with duplicate IP address will append
the resulting name to the proper hash key. A regular expression
is used to make sure duplicate names are not appended.
Processing Email Logs
Makelist.pl next calls a subroutine to process the email logs.
Our email server logs contain a lot of information about who logs
in, when they log in, information about the SMTP gateway, etc. While
this file is processed, we skip entries that match the regular expression
'/MDM|HTTP/', because users who log in through the directly
connected modems or through the server's local Web site will
not have accurate IP addresses. That is, modem connections will
not have associated IP addresses, and the Web clients will show
the IP address of the email server itself.
We are interested in successful logins made through an Ethernet
connection using the email client. We shift out the first field
from the log entry and, if it equals 'Login', we take
the next field (which is the login ID) and field 10 (which is the
IP address). Field 10 also has a colon-separated port number that
is split out. We end up creating a hash with the IP address of the
client machine as the key and each successful login name from that
address appended to the value. If any given login name is already
associated with an IP address, it is skipped so the hash values
don't get large and redundant. This will basically leave us
with each email user that accessed the server from any given IP
address, but not necessarily how many times that user accessed the
server.
Getting MAC Addresses
The final set of data comes from the inventory table in our Help
Desk system. The process to parse the inventory file is much like
the others except the field delimiter is a tab. Also, we only record
MAC addresses in the inventory system, not IP addresses. As long
as we are using dynamically assigned IP addresses, there is no way
to record that type of information anyway. Thus, the hash that is
produced will use a MAC address as the key and will therefore not
match the format of our base hash (from the DHCP lease files) as
closely as the others. Additionally, the MAC addresses have been
entered into inventory by hand, so we must account for some of the
"human nature" errors that occur in that process.
To take care of the common errors when hand-entering MAC addresses,
I employ a few regular expressions, as follows:
1. Replace all upper- and lowercase O's with zeros.
2. Cut out all common punctuation that could be used as delimiters
(space, dash, colon).
3. For consistency, translate the string to lowercase and pad
the left side with zeros if it's fewer than 12 characters in
length.
This process seems to fix most of the errors in the hand-entered
MAC addresses. Now that the data has been massaged a little, we
are left with a hash with the MAC address of the assigned computer
as the key and the name of the person to which the computer is assigned
as the value. This must always be a one-to-one relationship.
Putting the Hashes Together
Now that all the data has been configured into a usable format,
we can bring the various hashes together into the final GetHost
database. First, the script goes through the %lease hash (generated
from the DHCP lease files) and finds the recorded hardware address.
(Remember that %lease is the basis for the final GetHost database,
so any new information is appended to the existing %lease hash.)
Second, that MAC address is matched up with the inventory hash and
the user name is appended to the %lease. At this point, there is
a user name associated with an IP address. If there is no associated
hardware address in the inventory hash, a set number of spaces is
appended instead, so the fixed-length fields stay consistent.
The email and WINS hashes are a little more difficult because
there is no way to anticipate how many entries could be tied to
each IP address. As an example, dozens of people could potentially
use a single computer to check their email, and all those user names
would be attached to one IP address making a fixed-length field
nearly impossible to anticipate. One alternative would be to make
the next fixed-width column wide enough to accommodate any potential
string length, but this seemed inefficient. Instead we just enclose
the email address field in curly brackets ({}) and enclose the WINS
data in square brackets ([]) and append them to the end of the %lease
hash to make up the rest of the GetHost record.
The last situation to consider is the possibility that someone
used the email system from a computer with a statically assigned
IP address or from a computer outside our network (from the Internet
or our own dial-in service). In this case, the email hash key will
not have a corresponding %lease key, because the DHCP server would
not have assigned an address to this machine. So, we create a new
%lease key with this new IP address and add enough spaces so the
fixed length fields stay consistent. The same situation can arise
in the WINS database since a WINS entry can stay around longer than
a DHCP lease, or there may be WINS information from a computer with
a statically assigned IP address. This is just another downside
of the fixed-length field database that we live with in order to
gain the benefits.
The makelist.pl Perl script could be much more streamlined and
linear, but it seemed logical to make it as modular as possible.
If I want to add or remove a piece to the system, it is as easy
as writing a new subroutine or removing an old one. Also, small
edits are much easier to perform if the chunk of code you are looking
for is in a logical and separate location from unrelated code.
The last piece of the makelist.pl script is to send the %lease
hash to STDOUT. Calling the make-lease shell script (above) will
redirect this standard output to the final GetHost database file.
Running this script from a cron job several times a day will make
sure your information is current no matter how often the individual
components are updated. For example, the DHCP leases are updated
hourly, the WINS database and inventory are updated three times
a day, and the email logs are updated daily. At this point, we could
search the database with a text editor or with command-line utilities
like grep. In fact, that is how the first incarnation of GetHost
worked. It was logical, however, to make a more portable Web-based
front-end.
Web Front-End (gethost.cgi)
Listing 3 shows a simple Web front-end running on Apache to search
through the GetHost database (Figure 1). Basically, it is a single
field form where you can enter any bit of information you may have
on the computer or person you are looking for. That could be a MAC
address (or partial MAC address), an IP address (or partial IP address
for a subnet), or even a login name. Now, you can quickly determine
where a person has been using his email lately and who owns the
computer that has been showing up in the firewall logs. Also, if
you have a highly subnetted network like we do, a search on the
subnet number can tell you all the computers that have been used
in a specific area recently. Because all the information you should
need is now contained in a single location, the search process is
much easier.
This sort of information would be invaluable to hackers since
it is basically a roadmap of the entire network. We do our best
to use only secure communication when possible and to limit communication
when it is not. Also, with our subnetting and switching schemes,
even unencrypted data is secure from all but the most sophisticated
attacks. We then use Apache's built-in password protection
system to further limit the use of GetHost, which is not within
the scope of this article to explain. Of course, those who wish
to use the GetHost system must do so in a way that complements their
own security requirements and policies.
Conclusion
By design, GetHost is specifically tailored to the multiple systems
running on our network. You'll probably need to adapt GetHost
for your own network. The goal of this article was to show how,
with a little forethought and creativity, even the most foreign
systems could be brought together to share their information in
a readable, searchable format. This system has saved us countless
hours of searching through log files "the old-fashioned way".
Resources
Perl -- http://www.perl.com/
Text::DHCPparse -- by John D. Shearer -- http://www.cpan.org/modules/
ISC DHCP -- http://www.isc.org/
Microsoft WINS and MS-SQL -- http://www.microsoft.com/
SSH -- http://www.ssh.com/
SAMBA -- http://samba.org/
DKSystems (DKHelpDesk) -- http://www.dksystems.com/
FirstClass (email) -- http://www.centrinity.com/
John Shearer helps manage about 2000 laptops and desktops at
a boarding high school in New England. Although he comes from a
primarily Dos/Windows background, he has been heavily involved with
Linux development for several years. He can be reached at: jshearer@nmhschool.org.
|