Article
Figure 1
Listing 1
Listing 2
Listing 3

may2002.tar

Tracking Machines with GetHost

John Shearer

Information is the key to maintaining efficient and secure networks. Unfortunately for most systems and network administrators, vital information is often kept in many different places and formats, and the problem is compounded by the existence of modern, heterogeneous networks. At a school with more than 1700 students, faculty, and staff (using both school-issued and privately owned computers), there was a recognized need to quickly and easily track the location of a specific machine. Previously, to track a computer or person on the network or to see whether a stolen computer had recently been used on the network, systems administrators had to check several individual sources of information by hand. I developed an application called GetHost to quickly search all these sources of information from a single, intuitive form. The GetHost database is not only accessible from an interactive search form, but is also available for an automated search from a program or script.

GetHost is a small set of scripts running on Linux, which works across several servers to create a database containing a wealth of information about people and computers using the network. The final database includes DHCP lease information, WINS information from our Windows NT Domain Controllers, email account usage, and user information from our inventory tracking system. This database can be easily searched from a Web interface. With only a small amount of information about a misbehaving (or missing) computer, an administrator can quickly tell whether that computer was on the network recently, exactly when that was, who was using it, and where they were using it.

Because all networks are different, this system may not work for everyone. However, it serves as a good example of how creativity and a little parsing can bring together foreign systems containing similar information. GetHost gathers information from the two Linux-based Internet Software Consortium (ISC) DHCP servers on our network, the WINS database from our Windows NT Domain Controller, the email usage logs from our FirstClass email server, and physical inventory information from our DKSystems Help Desk application (running on MS-SQL version 7.0).

In the next several sections, I will explain how these systems get their data to a common destination and how that information is compiled into a single, searchable database. The final database and all the raw data are stored on and parsed by a Red Hat Linux system. Linux and Perl are ideal for this task because of their flexibility and their ability to work quickly with raw text files.

The dhcpd.leases Files

Our network uses two ISC DHCP servers running on Red Hat Linux, so there are two dhcpd.leases files that must be parsed (the parsing scheme is described later). Each DHCP server uses the scp command from the SSH utilities to copy its dhcpd.leases file to the GetHost server several times a day. scp is a secure alternative to the standard rcp command. For security reasons, I recommend also using an unprivileged user to do the copying because the scp utility requires passwords to be synchronized between servers in order to run unattended. Also, during the copy process, the lease file name is changed to indicate which server it came from. The following is the cron job that runs on the Linux DHCP server to execute the bash script that does the copying:

0 * * * * /bin/su - jsmith /usr/local/bin/cp-leases

The cp-leases script runs once an hour as the user "jsmith" on each DHCP server, so every hour the two files "dhcpd.leases.one" and "dhcpd.leases.two" are updated in the /home/jsmith folder on the database server. Because the same user exists on all systems with the same password, no password needs to be given with the scp command. More information on scp can be found in the ssh man pages. Here is an example cp-leases script:

#!/bin/bash
scp /var/state/dhcp/dhcpd.leases <server>:/home/jsmith/dhcpd.leases.one

In one line, this script just securely copies the current DHCP lease file from its original location to a Linux server that will do the final parsing and manipulation.

The Inventory Database

We use the DKSystems DKHelpDesk system to maintain our physical inventory system. The database itself is on an MS-SQL server running Windows NT. Because Perl on Linux does not work well with MS-SQL, the following Perl script runs on the MS-SQL server itself to extract the needed information:

1   #!c:\perl\bin\perl.exe
2   
3   use Win32::ODBC;
4   
5   $db = new Win32::ODBC("DSN=HD_MSSQL;UID=<user>;PWD=<password>")
6          or die Win32::ODBC::Error();
7
8   if ($db->Sql("SELECT TAG_NUMBER, MAC, CLIENTNAME FROM hardware
9                 WHERE MAC > ''")) {
10      print "SQL Error: " . $db->Error() . "\n";
11      $db ->CLOSE();
12      exit;
13   }
14   
15   open FILE, '>\\\\NT-SERVER\\users\\tech\\HW.TXT'
16      or die "Could not open HW.TXT: $!\n";
17   
18   while ($db->FetchRow) {
19      my ($tag, $mac, $client) = $db->Data();
20      print FILE "$tag\t$mac\t$client\n";
21   }
22   
23   close FILE;
24   $db->Close();

Lines 1 through 3 set up the environment and bring in the Win32::ODBC module that will actually communicate with the database. Lines 5 and 6 open the connection to the database, and lines 8 through 13 set up and execute the SQL query. The "hardware" table in the inventory system contains dozens of fields, but we are only interested in the unique inventory tag number, the MAC address of the equipment, and the client to which that the equipment is assigned. The SQL query also only retrieves entries where the MAC address is actually defined ('WHERE MAC > '''). Lines 15 and 16 open the text file on the Windows NT Domain Controller that will contain the inventory data (the same server that will supply the WINS information later). Lines 18 through 21 step through the query response and populate the opened file with the tab-delimited inventory records. Finally, lines 23 and 24 clean up the open objects.

Microsoft Client Data in WINS

The WINS database (Windows Internet Naming Service) is Microsoft's way of allowing Windows machines to talk to each other on a TCP/IP network. This database stores information about a specific computer or user that cannot be found through typical TCP/IP networking avenues. Microsoft states that there are three types of information registered in the WINS database: Computer Names, Domain Names, and Other/Special Names. Specifically, we are looking for a user name (or "Other") to be attached to a specific IP address. Every time a user logs into a Windows NT domain, the WINS database is updated with information about the computer name (or the NetBIOS name), the login name, and the IP address of the computer they are logging in from.

GetHost uses a text dump of the WINS database to obtain information on active Windows users and computers. Microsoft was nice enough to supply a utility in the Windows NT Server Resource Kit that allows us to export the information from the WINS database to a text file. That utility is "winsdmp.exe". Following is an example of the "winsdump.bat" file that runs every eight hours on the Domain Controller (through the use of the scheduler service):

1   d:\NTRESKIT\winsdmp.exe 172.16.1.1 > c:\wins.txt
2
3   ftp -s:c:\winsdump.txt
4
5   rem *** Copy of 'winsdump.txt' ***
6   rem open <server>
7   rem jsmith
8   rem <password>
9   rem put c:\wins.txt
10   rem put c:\users\tech\HW.TXT
11   rem quit

Line 1 uses the "winsdmp.exe" utility with the single argument being the IP address of the server containing the WINS database. This produces text output, which is redirected to the file wins.txt.

Once the wins.txt file is created, it is ftp'd to /home/jsmith directory on the GetHost server using the same "jsmith" account that the DHCP servers use. Lines 6 through 11 show an example ftp script used by the winsdump.bat ftp command (line 3). (Note that ftp is not secured, which is another reason why we use an unprivileged account.) As long as we have the ftp pipe open, this is a perfect opportunity to transfer the HW.TXT file as well. HW.TXT is the dumped inventory text file from our Help Desk system.

Email Logs

The last piece of data for our GetHost database is the email log. We use Centrinity's FirstClass email conferencing system running on Windows NT, which keeps a comma-delimited text log of all logins and logouts from the email system. As part of this "stat" file, we can also see which IP addresses and account names were used to access the email server. This logging facility makes it very easy for Linux and Perl to parse out and know exactly who was logged in, when they were logged in, and from what machine they logged in. But, of course, it is never that easy.

One initial problem was that the email server did not like to let go of its log file while the server was online. But, it was decided that we could wait until early in the morning to grab the previous day's log file after the normal log rotation. Also, like many log files, they don't necessarily stay around very long. Ours are periodically backed up to tape and eventually deleted from the system to make room for more. This caused a problem, because we felt that a single day's worth of email logs was not sufficient for tracking purposes. If we were tracking a computer that had not been used on the network for a couple of days, then we would not get any email records for it.

We solved this problem by starting a secondary log rotation that the GetHost server had control over. After the most recent log has been retrieved from the email server, the stat.2 file is named stat.3; the stat.1 file is named stat.2; and the most recent file is named stat.1 (no matter what the original log file name is). Thus, there will always be three days of log files available for the GetHost system to draw upon. Next, the three new log files are combined into a single file to make it easier for the GetHost parser to handle. Listing 1 shows an example of how I implemented this system. I used the smbmount utility, which is part of the Samba package, to mount the Windows NT server that contains the logs from the Linux GetHost server. The file is then copied to the /home/jsmith directory and renamed as necessary. At this point, all the necessary data to construct the GetHost database has been transferred to the /home/jsmith directory on the Linux server.

The makelist.pl Script

Before I got too far compiling the specific information components for this system, however, I needed decide on the format the final database would take. I looked at SQL and hash-based databases, but finally settled on a fixed-length, flat-file database. This type of database does not lend itself to the kind of manipulation and configuration of the other more complex databases, but it is easier to code, simpler to search on all fields, viewable in your favorite text editor, and can be queried using basic command-line utilities like grep. With around 5000 records, our database is searchable in a reasonable amount of time from the Web interface. Of course the larger the database gets, the more sense it would make to use an indexed database system.

Once all the necessary information is in the /home/jsmith directory (as described in the previous sections), we can parse through it all and create the GetHost database. The makelist.pl Perl script (Listing 2) that does all the work is wrapped in the make-lease shell script, which performs several tasks:

1   #!/bin/bash
2
3   cat /home/jsmith/dhcpd.leases.one > /home/jsmith/dhcpd.leases.comp
4   cat /home/jsmith/dhcpd.leases.two >> /home/jsmith/dhcpd.leases.comp
5   /usr/local/bin/makelist.pl > /home/jsmith/gethost.list.tmp
6   mv /home/jsmith/gethost.list.tmp /home/jsmith/gethost.list

This code combines the two dhcpd.leases files into one (lines 3 and 4). Then, it kicks off the makelist.pl script and redirects its output to a temporary file (line 5). It then moves that temporary file to the final location of /home/jsmith/gethost.list (line 6). The temporary file option is not necessary, but the makelist.pl script requires a couple minutes to run and the GetHost Web page does not function properly until it's done. By using a temporary file, I can reduce the time the gethost.list file is open from a couple minutes down to the time of a file copy. Next, I will step through Listing 2, the makelist.pl Perl script, to explain how the data is parsed and added to the database.

Parsing Lease Files

The top three sections initialize the outside modules, explain the fixed-length field definitions, and set some variables. The first real work is done in Section 4 by calling the leaseparsenc() function from the Text::DHCPparse module (available at http://www.cpan.org). It gives our combined dhcpd.leases file as the only argument and assigns the returned hash reference to a local variable $list.

GetHost began as just a DHCP lease file parser. I soon realized that, although that information was extremely valuable, the parsing of the lease files was just a small piece. I also realized how little support there was for parsing DHCP lease files on the Internet. The Text::DHCPparse (see Resources) module was developed from the original GetHost system and was accepted by CPAN for inclusion in the module list.

The returned hash reference is the basis for the final GetHost database. It includes all IP addresses from the lease files, the last time the lease was renewed (which tells us the last time the computer was on the network), the MAC (or hardware) address, and the hostname if it was recorded in the lease file. Some devices do not record hostnames in the DHCP lease files, so the parsing module will pad the result with spaces to accommodate the fixed width necessary for the next section.

Next, makelist.pl calls a local subroutine to step through and parse the wins.txt file. To do this efficiently, it uses the Text::ParseWords module (standard with Perl). This module takes a field delimiter and a text string as arguments and will parse the string into a list variable. This is most useful in a comma-separated list when some (but not all) fields are quoted. This module will strip out the quotes and only leave the actual field values. The two fields we are interested in are number 1 (the computer name or logon name) and number 11 (the IP address) -- remember that list numbering starts at zero. NetBIOS names must always be 16 characters long, but the WINS database sometimes puts in a special 17th character for its own reference. Also, because NetBIOS names must be 16 characters long, there may be spaces padding the end of this field. This subroutine uses regular expressions to handle these problems and creates a new hash with the IP address as the key and the logon name or NetBIOS name as the value. Entries with duplicate IP address will append the resulting name to the proper hash key. A regular expression is used to make sure duplicate names are not appended.

Processing Email Logs

Makelist.pl next calls a subroutine to process the email logs. Our email server logs contain a lot of information about who logs in, when they log in, information about the SMTP gateway, etc. While this file is processed, we skip entries that match the regular expression '/MDM|HTTP/', because users who log in through the directly connected modems or through the server's local Web site will not have accurate IP addresses. That is, modem connections will not have associated IP addresses, and the Web clients will show the IP address of the email server itself.

We are interested in successful logins made through an Ethernet connection using the email client. We shift out the first field from the log entry and, if it equals 'Login', we take the next field (which is the login ID) and field 10 (which is the IP address). Field 10 also has a colon-separated port number that is split out. We end up creating a hash with the IP address of the client machine as the key and each successful login name from that address appended to the value. If any given login name is already associated with an IP address, it is skipped so the hash values don't get large and redundant. This will basically leave us with each email user that accessed the server from any given IP address, but not necessarily how many times that user accessed the server.

Getting MAC Addresses

The final set of data comes from the inventory table in our Help Desk system. The process to parse the inventory file is much like the others except the field delimiter is a tab. Also, we only record MAC addresses in the inventory system, not IP addresses. As long as we are using dynamically assigned IP addresses, there is no way to record that type of information anyway. Thus, the hash that is produced will use a MAC address as the key and will therefore not match the format of our base hash (from the DHCP lease files) as closely as the others. Additionally, the MAC addresses have been entered into inventory by hand, so we must account for some of the "human nature" errors that occur in that process.

To take care of the common errors when hand-entering MAC addresses, I employ a few regular expressions, as follows:

1. Replace all upper- and lowercase O's with zeros.

2. Cut out all common punctuation that could be used as delimiters (space, dash, colon).

3. For consistency, translate the string to lowercase and pad the left side with zeros if it's fewer than 12 characters in length.

This process seems to fix most of the errors in the hand-entered MAC addresses. Now that the data has been massaged a little, we are left with a hash with the MAC address of the assigned computer as the key and the name of the person to which the computer is assigned as the value. This must always be a one-to-one relationship.

Putting the Hashes Together

Now that all the data has been configured into a usable format, we can bring the various hashes together into the final GetHost database. First, the script goes through the %lease hash (generated from the DHCP lease files) and finds the recorded hardware address. (Remember that %lease is the basis for the final GetHost database, so any new information is appended to the existing %lease hash.) Second, that MAC address is matched up with the inventory hash and the user name is appended to the %lease. At this point, there is a user name associated with an IP address. If there is no associated hardware address in the inventory hash, a set number of spaces is appended instead, so the fixed-length fields stay consistent.

The email and WINS hashes are a little more difficult because there is no way to anticipate how many entries could be tied to each IP address. As an example, dozens of people could potentially use a single computer to check their email, and all those user names would be attached to one IP address making a fixed-length field nearly impossible to anticipate. One alternative would be to make the next fixed-width column wide enough to accommodate any potential string length, but this seemed inefficient. Instead we just enclose the email address field in curly brackets ({}) and enclose the WINS data in square brackets ([]) and append them to the end of the %lease hash to make up the rest of the GetHost record.

The last situation to consider is the possibility that someone used the email system from a computer with a statically assigned IP address or from a computer outside our network (from the Internet or our own dial-in service). In this case, the email hash key will not have a corresponding %lease key, because the DHCP server would not have assigned an address to this machine. So, we create a new %lease key with this new IP address and add enough spaces so the fixed length fields stay consistent. The same situation can arise in the WINS database since a WINS entry can stay around longer than a DHCP lease, or there may be WINS information from a computer with a statically assigned IP address. This is just another downside of the fixed-length field database that we live with in order to gain the benefits.

The makelist.pl Perl script could be much more streamlined and linear, but it seemed logical to make it as modular as possible. If I want to add or remove a piece to the system, it is as easy as writing a new subroutine or removing an old one. Also, small edits are much easier to perform if the chunk of code you are looking for is in a logical and separate location from unrelated code.

The last piece of the makelist.pl script is to send the %lease hash to STDOUT. Calling the make-lease shell script (above) will redirect this standard output to the final GetHost database file. Running this script from a cron job several times a day will make sure your information is current no matter how often the individual components are updated. For example, the DHCP leases are updated hourly, the WINS database and inventory are updated three times a day, and the email logs are updated daily. At this point, we could search the database with a text editor or with command-line utilities like grep. In fact, that is how the first incarnation of GetHost worked. It was logical, however, to make a more portable Web-based front-end.

Web Front-End (gethost.cgi)

Listing 3 shows a simple Web front-end running on Apache to search through the GetHost database (Figure 1). Basically, it is a single field form where you can enter any bit of information you may have on the computer or person you are looking for. That could be a MAC address (or partial MAC address), an IP address (or partial IP address for a subnet), or even a login name. Now, you can quickly determine where a person has been using his email lately and who owns the computer that has been showing up in the firewall logs. Also, if you have a highly subnetted network like we do, a search on the subnet number can tell you all the computers that have been used in a specific area recently. Because all the information you should need is now contained in a single location, the search process is much easier.

This sort of information would be invaluable to hackers since it is basically a roadmap of the entire network. We do our best to use only secure communication when possible and to limit communication when it is not. Also, with our subnetting and switching schemes, even unencrypted data is secure from all but the most sophisticated attacks. We then use Apache's built-in password protection system to further limit the use of GetHost, which is not within the scope of this article to explain. Of course, those who wish to use the GetHost system must do so in a way that complements their own security requirements and policies.

Conclusion

By design, GetHost is specifically tailored to the multiple systems running on our network. You'll probably need to adapt GetHost for your own network. The goal of this article was to show how, with a little forethought and creativity, even the most foreign systems could be brought together to share their information in a readable, searchable format. This system has saved us countless hours of searching through log files "the old-fashioned way".

Resources

Perl -- http://www.perl.com/

Text::DHCPparse -- by John D. Shearer -- http://www.cpan.org/modules/

ISC DHCP -- http://www.isc.org/

Microsoft WINS and MS-SQL -- http://www.microsoft.com/

SSH -- http://www.ssh.com/

SAMBA -- http://samba.org/

DKSystems (DKHelpDesk) -- http://www.dksystems.com/

FirstClass (email) -- http://www.centrinity.com/

John Shearer helps manage about 2000 laptops and desktops at a boarding high school in New England. Although he comes from a primarily Dos/Windows background, he has been heavily involved with Linux development for several years. He can be reached at: jshearer@nmhschool.org.