Cover V10, I06
Article

jun2001.tar


A Complete Network Information Center

Benjamin King

This article presents a Web-based network information center that will be implemented as a CGI application to help monitor your networked hosts. This network information center will keep track of all the computers on your networks, their operating systems, and the services they offer such as http or telnet. By tracking this information in a central place, it will be easier for you to monitor your computer networks.

For example, if the Computer Emergency Response Team (http://www.cert.org) issues an alert concerning a certain http server for Windows NT, how would you determine how many of your computers are at risk of being compromised? With the aid of this application, you can easily determine how many hosts, and specifically which hosts, are vulnerable on your network. It's all done through a Web browser, which means you can access this information anywhere on your network. Once you have this information, you will be better prepared to make a decision regarding what action to take.

Background

The first step in solving any problem is to pick the right tools for the job. This job is going to require a network scanner, a database, and something to glue it all together. For the network scanner, I chose Nmap (http://url) for it's portability, ease of installation, and wide range of features. I chose MySQL (http://mysql.com) for the database because it's small, fast, gets the job done, and it also has excellent on-line documentation with numerous examples. To glue all this together, I chose the Perl scripting language for its powerful use of regular expressions and its easy integration with any database (thanks to the DBI/DBD module).

For a basic layout of how everything works together, there will be a total of three Perl scripts. The first script, called netscan.pl, will scan your network as often as you wish and store the results in the database. The second script, called webnic.pl, will reside in your Web server's cgi-bin directory and allow you to search and view the data in the database. I will explain how the databases and CGI scripts will work together. The third script, called scan_misc.pm, is a Perl module that contains some common functions that both scripts will use.

The Design

In this section, I'll discuss some of the issues I faced and how to implement all this in Perl. What is the database going to store? We first need something unique to distinguish the different hosts on the network. I used the IP address because each host on the network must have a unique address. Therefore, the IP address is going to be the primary key in our database, so no two rows in the database can have the same IP address.

Now that we can distinguish between the different hosts on the network, let's store some other useful information about them, such as their operating system and the services they offer (e.g., http on port 80). Some other useful information is the date they have been discovered on the network and the last date they were on-line. This information can be used to monitor network uptime and to observe any IP addresses no longer in use so they can be reassigned. With this kind of information, not only will you know the current state of your network, but you will be better prepared to make decisions regarding its future.

What kind of data-types will these fields be? Do we store the IP address as a string or a number? These are the key design issues that will affect how fast and efficient your database will be when searching and sorting results. An IP address can be represented as 4-byte integer or a 15-byte string. In Perl, using the Socket module and the pack function, it takes only two lines of code to convert an IP address from a string to an integer, and vise-versa.

For the other fields, I chose the operating system to be a variable-length string because some computers (i.e., old printers and routers that are connected on the network) don't have operating systems. For the services offered field (SERVICES), I also chose a variable-length string because not all hosts will be running servers. Finally, for the date discovered (DATED) and last online (LASTUP) fields, I chose the data-type "date", which is just a string in the form of YEAR-MONTH-DAY (i.e., 2000-02-01).

The following SQL code is used to create this table with the appropriate column types:

CREATE TABLE HOSTS (
  IP bigint(20) unsigned,
  HOSTNAME varchar(255),
  OS varchar(255),
  SERVICES varchar(255),
  TIMED date,
  LASTUP date,
  PRIMARY KEY (IP)
);
The IP column is an unsigned bigint for compatibility reasons, in case you're using a database other than MySQL. The number in parentheses is the number of digits allowed for a bigint. varchar is short for "variable character string", and the number in parentheses is the maximum length of that string. Notice that the IP address column is indicated as the primary key. This allows for faster sorts and linking other tables to this one for future use.

The netscan.pl script is going to be responsible for inserting and updating this table with the network data it collects using Nmap. The webnic.pl script is going to be responsible for searching and sorting the results in this table and displaying them on your Web browser using tables.

Netscan.pl

The first script in our arsenal is netscan.pl, which scans the networks looking for all the computers that are currently up or online. (All listings for this article are available from: www.sysadminmag.com.) The networks that are scanned are stored in a text file called inputf. Each line in the file is a Class C network to be scanned. Nmap has a feature that allows it to do a network-wide ping sweep. ping is commonly used to determine if a host is connected on the network. See Listing 1. Netscan.pl loops through each network in the file, first pinging all the machines to determine which hosts are currently online, and stores the results in the array @hosts. Each row in the array contains a string that has the IP address and hostname of a host that is online. Here's where Perl's muscle comes into play. I have a string like this:

Host: 13.20.216.198 (squiggly.sales.company.com)    Status: Up
and I need to extract the IP address and store it in a variable named $ip. That's a one-liner in Perl:

$str =~ m/^Host:\s([0-9.]+)\s/
Now the variable $1 holds the IP address. I just have to assign it to $ip, and it's done! The above piece of code uses Perl's matching operator to find the regular expression in the variable $str. Regular expressions are used to search for patterns. For example, the above regular expression looks for an IP address, which consists of numbers and dots, in the variable string $str. For more information, refer to the documentation link on http://www.perl.com.

Once we have an IP address, we convert it to a long integer and store it as a hexadecimal string. In other words, we call the ip_to_hex function from the scan_misc package. The reason for this is to allow clean storage in the database. With that, we search the database to see if there is already a row with this IP address. If it's already in the database, update the LASTUP field with today's date. Otherwise, insert in a new row with this IP address and LASTUP date as today.

Updating a row that already exists is very easy because we only have to update one field, LASTUP. However, if we discover a new IP address, we must insert a new row with that IP address and then scan that host to get its operating system and what services it offers, along with setting the TIMED and LASTUP fields.

The way netscan.pl handles all this, there is another loop that goes through each row in @hosts and if the IP address is already in the database, it just updates the LASTUP field. Otherwise, it inserts a new row with the IP address and LASTUP fields set. After that loop is done, the script searches the database for any rows with any empty TIMED field because if a row was just inserted, as in the above example, it will have an IP address and LASTUP, and nothing else. It loops through these results calling Nmap to run a full scan against these hosts to get their operating system and to determine the services offered. When that is done, netscan.pl updates the rest of the fields.

netscan.pl should run at least five times a day so that the information in the database stays current. The best way to do that on a UNIX server is to put the script in the root crontab because Nmap needs to run as root to determine the operating system of a host. The following shows the line that will run netscan.pl Monday to Friday, from 9 a.m. to 5 p.m. on the hour, when added to the crontab:

0 9-17 * * 1-5 netscan.pl
The first time you run this script on your network, depending on how many class C networks are in the inputf file, it will take some time to get all the online hosts information into the database. After that, it should only take a few minutes to ping scan your entire network and update the database.

Webnic.pl

The second script in our arsenal is webnic.pl, a CGI script that allows users to search and sort the results of the data in the database through a Web browser. Because this is a CGI script, there must be an HTML front end where users can enter their input. See Figure 1 and Listing 2.

The HTML front end will have the following options:

1. Operating System search (they can choose one of the following):

  • Windows 95/98/NT/2000
  • UNIX (HP-UX, SunOS, Solaris, IRIX, Ultrix, Linux, BSD)
  • Miscellaneous (MacOS, Novell, HP Printers, Routers, Switches)
  • Unidentified Operating Systems (Nmap can't detect all OSes)
  • Other (allow the user to enter a string)
2. IP address -- Can be a single IP address or a network in the form of "192.168.25.*"

3. Services (Open Ports) -- Enter a number for the port. For example, http is port number 80.

4. Number of days offline -- Specify the minimum number of days a host has not been on the network.

5. Output -- Select the output you want (IP address, hostname, operating system, time discovered, last up, number of days offline, open ports). The default is all.

6. Sort output -- Sort by IP address, hostname, or number of days offline. The default is sort by IP address.

The point of this script is to take the user-entered data on the HTML form and generate an SQL query that will return those results. This script must also convert the internal form of IP addresses to readable IP addresses when displaying the results. When calculating the number of days a host has been offline from the LASTUP date, let the database handle it because it will be able to do it faster then Perl. Most databases, including MySQL, have functions that allow you to do date and time arithmetic.

There is nothing fancy about the appearance of the output; everything is put into a table and sent to the Web browser. A nice side project would be to make the IP address link to another CGI script that would do a live Nmap scan and return the results. Larger organizations might want to link the IP address to another internal database that has more detailed information, such as who owns the computer and where it is located. It would not be hard to make these modifications to Webnic.pl.

Webnic.pl uses the CGI module that takes care of all the CGI details, so the programmer can focus on the application at hand. The script takes all the incoming data from the HTML form and assigns them to global variables that will be used throughout the script. I create two hash tables to help improve the readability of the Perl code. The first hash table, called %table_names, links the actual table column names to more readable names that make up the header of the HTML table that has the output. The second hash table, called %ostype, links operating system names to regular expression search strings that are used in searching the OS column of the database. For example, if you want to find all the UNIX machines on your network, the value of the UNIX key will be HP-UX|SunOS|Solaris|IRIX|Ultrix|Linux|BSD, the different variants of UNIX on the market today.

Generating the SQL query that will return the results from the database is where the fun begins. Basically, this script receives its input from the HTML form and then generates the SQL query to retrieve the data from the database and displays the results in an HTML table. The variable $dbquery will be the SQL query that gets sent to the database.

The general procedure I used for generating for the SQL query was, if there is no input from the HTML form (meaning that it was empty and the user just pressed the submit button) the script will dump all the data from the database. Otherwise, the script will go through each option in the form and structure the query accordingly.

The first part of the SQL "select" query is the select expression, which determines what output comes from the database. When the user specifies the desired output (e.g., only show the hostname and number of days offline, option #4 above), the commify() function from the package scan_misc (Listing 3) is used. commify() takes an array of strings and returns a single string with all the elements enumerated and with commas between them. This is used because the user can select multiple outputs and the CGI module stores them in an array. From that array, they must be enumerated in the SQL query with commas separating the different columns.

The following SQL code will determine the number of days a host has been offline:

TO_DAYS(NOW())-TO_DAYS(LASTUP) as OFF
The number stored in OFF is the number of days the host has been offline. This statement is included in the select expression. The second part of the SQL "select" query is the where-definition, where the search criteria is placed.

The operating system and services fields are strings, therefore I use a regular expression search for those fields. The MySQL database has a function called regexp() that allows a regular expression search on a column. Because only one operating system type can be specified, the %ostype hash table holds the regular expression used to search the OS column. Otherwise, if the operating system specified is "Other", then the user-entered regular expression is used to search the OS column. For the services field, the number entered is searched in the SERVICES column.

The IP address option allows the user to enter a single IP address of a certain computer or network address that will search only the computers on that network. If nothing is entered, all the hosts are searched according to the other options on the form. The "number of days offline" option accepts a positive integer that is the number of days the hosts have been offline.

The third part of the SQL "select" query is the "order by" definition, which sorts the results of the search. The sort option goes at the end of the select query and sorts all the rows before returning them to the script. There are three sort options and they are all column names. Thus, the "order by" statement only needs a column name to sort the results. Recall that the default sort is by IP address because it keeps everything neat and makes the output easier to browse.

The last part of this script is returning the results from the database in a tabular format. This is accomplished by using the HTML table tag. The results are returned from the database, row by row, and a while loop is used to create each row in the HTML table.

Conclusion

These two scripts are trivial in what they do and are designed to be very modular, portable, and upgradeable. The CGI module allows for easy debugging on the command line, thus eliminating a lot of headaches trying to debug it with a Web browser. These scripts are not designed to be one-size-fits-all. They were designed to provide a model that you should modify to meet your networking and administrative needs.

Once you have an understanding of how these two scripts work, you can modify them to fit your specific needs. If you're concerned about IP addresses or running out of them, use these scripts to help you monitor inactive IP addresses on your networks. You can write another short CGI script that compares the IP address in the database to the ones in your DNS files to make sure that nothing that could be used is being wasted. If you are more concerned about what services your hosts offer, you can monitor all of them easily from any computer on the network. By "knowing" your network, you are in a better position to keep it secure and make informed decisions regarding it.

Benjamin King is currently a research assistant for the Broadband and Wireless Networking Laboratory at the Georgia Center for Advanced Telecommunications Technology (GCATT). He wrote this article when he was a student assistant for the Computer Support Group for the School of Electrical and Computer Engineering at the Georgia Institute of Technology. He can be reached at: king@ece.gatech.edu.