Cover V11, I13

Article

Troubleshooting Solaris™ Network Performance

Alex Golomshtok

Networks are the bloodstreams of modern computer systems. Today, nearly all computers are connected to some kind of public or private network, and it is difficult to imagine a system without at least some sort of networking capabilities. As computer technology continues to evolve, the distributed computing model gains more ground, thus increasing the importance of networks. In fact, today most organizations rely on their own complex networking structures so much that even a short period of downtime may easily translate into millions of dollars of lost revenues.

Modern-day networks are often monstrously complex and convoluted, and they rely on a wide spectrum of technologies. A typical corporate network, for instance, may bring together thousands of computer systems from different hardware vendors, running various operating systems. Monitoring the health of such a network is quite a challenge and may be impossible without the proper tools. To satisfy growing demands for reliable management of heterogeneous networks, the Simple Network Management Protocol (SNMP) was developed and adopted as a management standard for TCP/IP-based networking systems. SNMP quickly gained popularity and remains the primary mechanism for carrying out a multitude of network management tasks, such as network performance monitoring, fault management, and configuration management.

SNMP

The foundation of SNMP is the database containing the management data on which the network management system operates. This database is commonly referred to as the Management Information Base (MIB). The SNMP MIB is essentially a tree-like collection of objects, each representing a managed resource on a network. A network management system can monitor the state of these objects by reading their properties and alter their state by modifying these properties. The organization of a MIB is governed by a standard called the Structure of Management Information (SMI) [1], which outlines the rules for constructing and defining MIB management objects. Over the years, several different MIBs have been developed to address various aspects of network and system management, such as the Relational Database Monitoring MIB and the Mail Management MIB. MIB-II [2], which defines the second version of the management information base for TCP/IP-based internets, remains perhaps the most important and most commonly used MIB specification. MIB-II defines the following broad groups of management information:

  • System -- General information about the networked system, such as its identification information, location, and uptime.
  • Interfaces -- Information describing each of the system's network interfaces.
  • AT -- Information pertinent to the operation of the address translation (AT) protocol; essentially, the contents of the address translation table.
  • IP -- Information pertinent to the operation of the IP protocol on a given system.
  • ICMP -- Information pertinent to the operation of the ICMP protocol.
  • TCP -- Information pertinent to the operation of the TCP protocol.
  • UDP -- Information pertinent to the operation of the UDP protocol.
  • EGP -- Information pertinent to the operation of the EGP protocol.
  • DOT3 -- Information pertinent to the transmission schemes and access protocols at each system interface.
  • SNMP -- Information pertinent to the operation of the SNMP protocol on a given system.

Clearly, MIB-II covers many aspects of TCP/IP-based network management and allows for building comprehensive management systems. However, there is one question that MIB specifications do not quite answer: where does the management data come from? SNMP, as powerful and flexible as it is, is just a mechanism for disseminating and sometimes altering management information; it is never responsible for actually collecting and maintaining the data.

Streams

The answer lies with the TCP/IP stack. Since System V Release 3 (SVR3), UNIX has been equipped with the streams [3] mechanism, an elegant and flexible framework for UNIX System communication services. In the true spirit of UNIX, the streams model encourages the development of compact modules representing functional components; these modules can then be dynamically loaded and interconnected to form a fully functional data communication path, or stream. Streams closely resemble the layered structure of typical networking protocols and are therefore perfect for implementing protocol stacks.

A stream is a communication link between user space (an application program) and the kernel. Typically, an application creates a stream by opening a streams device, such as /dev/ip. When a stream is first opened, it consists of a stream head (the interface between the stream and the user process) and a stream driver. The application process may then "push" various modules onto the stream, thus enabling certain services. Each stream module is typically responsible for carrying out a set of closely related functional tasks, such as adding network routing information to user packets. Once a stream is assembled, the application may initiate a bi-directional data exchange, or stream I/O.

The data is passed through the stream in the form of messages. When a user process passes a message to the stream head, the message is sent from module to module until it reaches the bottom of the stack, the stream driver; in this case, the message is said to be traveling downstream. Whenever the kernel replies, the data travels upstream: each stream module passes the message to the module above it until it reaches the stream head. In the case of the TCP/IP stack, not only data but also control messages can be sent downstream, either to alter the behavior of the stream or to retrieve management information maintained by the stream modules.

One control message that can be sent downstream is the option management request. Typically, the option management request is delivered to a specific module on the stream, but when retrieving MIB data, all stream modules receive the request at once, and the entire universe of operational data, pertinent to all stream modules, is returned to the application program.
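The message flow described above can be made concrete with a toy model in plain Perl. It is not the STREAMS programming interface: the modules are hypothetical hashes, and round_trip and mib_request are made-up names. The sketch shows a message visiting the modules from top to bottom and the reply visiting them from bottom to top, and a MIB request collecting every module's statistics into a single reply.

```perl
use strict;
use warnings;

# Hypothetical stream: the first element sits just below the stream head,
# the last one is the driver at the bottom.
my @stream = (
    { name => 'tcp',    stats => { tcpCurrEstab => 3 } },
    { name => 'ip',     stats => { ipInReceives => 1042 } },
    { name => 'driver', stats => {} },
);

# A message travels downstream (head to driver), then the reply travels
# upstream (driver to head). Returns the order in which modules saw it.
sub round_trip {
    my ($stream) = @_;
    my @down = map { "down:$_->{name}" } @$stream;
    my @up   = map { "up:$_->{name}" } reverse @$stream;
    return (@down, @up);
}

# A MIB request is not addressed to one module: every module contributes
# its counters, and everything comes back in one reply.
sub mib_request {
    my ($stream) = @_;
    my %reply;
    for my $module (@$stream) {
        $reply{ $module->{name} } = { %{ $module->{stats} } };
    }
    return \%reply;
}

print join(' ', round_trip(\@stream)), "\n";
my $mib = mib_request(\@stream);
print $mib->{tcp}{tcpCurrEstab}, "\n";
```

The all-in-one reply in mib_request mirrors the all-or-nothing behavior that shapes the Solaris::MIB2 interface.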

Under Solaris, the majority of the management information described by MIB-II can be retrieved directly from the stream using a sequence of ioctl(2) calls. Many network monitoring utilities, such as netstat(1M) and most SNMP agents, employ streams option management requests to gather network statistics. These programs typically construct a brand new stream for the purpose of obtaining management data, configure it by pushing the appropriate modules onto the stream, send an option management request downstream, and then extract the statistical data from the returned message.

Frankly, SNMP and a slew of Solaris network monitoring utilities solve most of the network monitoring problems while successfully hiding the complexities of streams programming from typical systems administrators. There are, however, some situations where more control or more flexibility is desired. Netstat(1M), for instance, although convenient and easy to use, remains just another program that produces textual output. This makes it difficult to use netstat as a basis for custom monitoring solutions.

Although it is certainly possible to use some shell magic to extract the values of network counters from netstat output, this approach is unreliable, inefficient, and just plain ugly. Another problem is that every invocation of netstat creates a new OS process, consuming precious system resources. Most monitoring tools take periodic snapshots of the statistical data and calculate deltas over a predefined time interval, so a shell script that launches netstat every time it needs a sample of network statistics makes for a very inefficient and expensive monitoring tool.

SNMP solves most of these problems: there are numerous APIs and tools, such as the excellent Net::SNMP Perl module [4], that can be used to read and even modify network management information in a fairly painless fashion. But even SNMP is not perfect. First, every SNMP-based tool relies on an SNMP agent, which has to be running on every managed computer system. Second, although "simple" is part of its name, SNMP is not quite that simple; programming SNMP client applications can get fairly involved. Third, there are certain security implications when running SNMP agents on computing nodes connected to public networks. If not configured correctly, SNMP can provide a wealth of information about your network to a potential intruder.

Solaris::MIB2

In an attempt to solve some of these problems, we created a simple yet powerful Perl extension called Solaris::MIB2 [5]. This module allows easy access to most of the statistical and operational data maintained by Solaris stream modules, while imposing only a minimal load on the monitored system. The following few lines of code demonstrate how easy it is to obtain the value of an arbitrary network counter -- for example, tcpCurrEstab, the number of TCP connections currently in the ESTABLISHED or CLOSE-WAIT state:

use Solaris::MIB2;

$mib = new Solaris::MIB2("/dev/tcp");
print $mib->{tcp}->{tcpCurrEstab}, "\n";
All we have to do is create an instance of the Solaris::MIB2 object, passing "/dev/tcp" as a parameter (so that the module builds the stream over the /dev/tcp device), and use the returned hash reference to read the value of interest.

As this article will show, many compact and powerful network monitors can be developed with Solaris::MIB2, although the module has a few limitations of which users should be aware. The first, and perhaps most severe, shortcoming of Solaris::MIB2 is that, unlike SNMP, it cannot read management data from a remote computer over the network. Although it is possible to create a custom network-based data distribution mechanism, this was not our intention. Those looking for this kind of functionality should turn to SNMP.

Another limitation, which is the result of a conscious design decision, is that Solaris::MIB2 provides only read-only access to the management data it exposes. Unlike SNMP, which is a general-purpose network management facility, Solaris::MIB2 is intended solely for network monitoring and, as such, does not allow any modification of the data on which it operates. While Solaris::MIB2 provides access to most of the management information described in the MIB-II RFC [2], it is not fully compliant with the specification and does not implement some of the groups, such as System, SNMP, or DOT3. In fact, the module exposes most of the structures defined in /usr/include/inet/mib2.h, with the exception of some IPv6 tables.

Finally, the interface to the MIB2 data is not implemented as a tied hash; in other words, reading a value from the MIB2 object does not cause an option management request to be sent downstream. Instead, when the object is first created, the stream module statistics are read and loaded into a regular hierarchical hash. Every subsequent refresh must be initiated explicitly using the update function, which is part of the Solaris::MIB2 interface:

use Solaris::MIB2;

$mib = new Solaris::MIB2("/dev/tcp");
while( 1 ) {
   sleep(5);
   $mib->update();
   print $mib->{tcp}->{tcpCurrEstab}, "\n";
}
As mentioned earlier, reading MIB2 statistics is an all-or-nothing proposition: it is impossible to retrieve the values of individual variables, and whenever an option management request is sent downstream, the operational data for all stream modules is returned. This property of the MIB2 interface makes a tied-hash implementation prohibitively expensive, because every single read would fetch the statistics of every stream module.

To demonstrate the power and flexibility of Solaris::MIB2, I've provided a few simple examples designed to illustrate how the functionality afforded by this module can be applied to real-world network monitoring problems. The first sample program, called pif, mimics some of the functionality of the popular UNIX utility arp(1M). The arp(1M) program displays and modifies the contents of the Internet-to-Ethernet address resolution tables used by the Address Resolution Protocol [6]. To save space, pif is limited to printing the contents of the address translation, or Net-to-Media, table, which is equivalent to running arp(1M) with the -a command-line switch.

The following is a complete source code listing of pif:

1  #!/usr/local/bin/perl
2
3  use Socket;
4  use Solaris::MIB2;
5
6  $mib = new Solaris::MIB2( "/dev/ip" );
7
8  print "Device IP Address      Mask            Flags  Phys Address\n";
9  print "------ --------------- --------------- ------ ------------------\n";
10
11  foreach my $entry ( @{$mib->{ipNetToMediaEntry}} ) {
12     my $device = $entry->{ipNetToMediaIfIndex};
13     my $host   = gethostbyaddr( inet_aton($entry->{ipNetToMediaNetAddress}),AF_INET) ||
14                     $entry->{ipNetToMediaNetAddress};
15     my $flags   =  ($entry->{ntm_flags} & Solaris::MIB2::ACE_F_PERMANENT) ? "S" : "";
16        $flags  .=  ($entry->{ntm_flags} & Solaris::MIB2::ACE_F_PUBLISH)   ? "P" : "";
17        $flags  .= !($entry->{ntm_flags} & Solaris::MIB2::ACE_F_RESOLVED)  ? "U" : "";
18        $flags  .=  ($entry->{ntm_flags} & Solaris::MIB2::ACE_F_MAPPING)   ? "M" : "";
19
20     my $mask   = sprintf("%u.%u.%u.%u",
21        map( hex("0x$_"), unpack("A2A2A2A2", $entry->{ntm_mask})));
22     my $phys   = $entry->{ipNetToMediaPhysAddress};
23
24     printf("%-6s %-15s %-15s %-6s %-20s\n", $device, $host, $mask, $flags, $phys);
25  };
The script uses two extension modules: Socket, loaded at line 3, and Solaris::MIB2, loaded at line 4. The Socket module exposes the inet_aton function, which converts the character representation of a host IP address into the packed struct in_addr form consumed by the gethostbyaddr(3NSL) function.

Line 6 constructs a brand new MIB2 object over /dev/ip by passing "/dev/ip" to the Solaris::MIB2 constructor. Note that under Solaris, read/write access to /dev/ip is limited to root and members of the sys group; therefore, if it is not run by a privileged user, our script will fail. Always running the script as root or another special user ID is not very convenient, so making the script set-group-id sys seems like the best solution. Many UNIX programs, such as passwd(1) and netstat(1M), are set-user-id or set-group-id, which allows regular users to perform operations that are typically permitted only to root or other privileged users. Set-user-id and set-group-id programs are controversial, as many people consider them inherently unsafe. However, if configured correctly, these programs provide convenient solutions to many otherwise unsolvable problems.

Our program, however, is a script interpreted at run time by Perl, as opposed to a binary executable such as netstat(1M), which makes it more of a security concern. First, the script's source code is more readily accessible and thus can easily be examined for security vulnerabilities by a potential intruder. But most importantly, some UNIX kernels, especially older ones, have a security problem with set-user-id and set-group-id scripts. When a user executes a file whose first line starts with #!path_to_interp, the kernel translates this into an exec(2) call that invokes the interpreter identified by path_to_interp, passing the original script file name and other arguments as parameters. For example, if the script "/usr/local/bin/foo" starts with #!/bin/ksh and is invoked as follows:

/usr/local/bin/foo  arg1  arg2  arg3
the kernel will actually execute the following command:

/bin/ksh  /usr/local/bin/foo  arg1  arg2  arg3
Now let's consider the following scenario: a user creates a symbolic link /tmp/foo_link pointing to /usr/local/bin/foo and invokes the script through that link. In this case, the kernel executes the following command:

/bin/ksh  /tmp/foo_link  arg1  arg2  arg3
There is a window between the time the kernel opens the script file to determine what must be executed and the time when the interpreter (/bin/ksh) reopens the file to actually execute it. As small as this window might be, a malicious user could use it to change the symbolic link so that it points to a different file. Thus, if the script is run set-user-id root, untrusted code could execute with superuser privileges.

Recent releases of Solaris close this security hole by passing the interpreter "/dev/fd/3", a special file that is already open on the original script file, instead of the actual path to the script, thus eliminating the race condition and reducing the security risk. The Perl configuration script checks whether your system supports secure set-user-id scripts using the following clever trick:

echo "#!/bin/ls" >reflect
chmod +x,u+s reflect
./reflect >flect 2>&1
if /bin/grep "/dev/fd" flect >/dev/null; then
   echo "Congratulations, your kernel has secure setuid scripts!"
else
   echo "setuid scripts are not secure!"
fi
If the Perl installation script detects that your system does not support secure set-user-id and set-group-id scripts, it attempts to build a special set-user-id version of the interpreter, called suidperl. This special executable allows Perl to emulate the set-user-id mechanism: it is invoked every time Perl detects the set-user-id or set-group-id bit set on a script file. With this in mind, we can assume that set-user-id and set-group-id Perl scripts are reasonably secure. Thus, for our pif script to run correctly, it should be made set-group-id sys as follows:

chgrp sys pif
chmod g+s pif
Once the MIB2 object is successfully constructed over /dev/ip, the script prints the column headings at lines 8 and 9 and then iterates over the contents of the Net-to-Media table using the foreach loop at line 11. Line 12 simply reads the device name stored under the ipNetToMediaIfIndex hash key. Lines 13 and 14 obtain the IP address associated with a particular address translation table entry and attempt to look up its host name using the gethostbyaddr(3NSL) function, falling back to the numeric address if the lookup fails. The next four lines of code, 15 through 18, check the address translation flags using a set of predefined constants exposed by the Solaris::MIB2 module.

As with the arp(1M) command, our script prints the following four flags:

  • 'S', or static, as opposed to a dynamic address translation entry learned through the ARP protocol.
  • 'P', or published. This means that ARP should respond to requests for the indicated host coming from other machines. Published entries include those explicitly added with the arp(1M) '-s' command-line switch as well as the entry for the local machine.
  • 'U', or unresolved. Unresolved entries are those for which an ARP response has not yet been received.
  • 'M', or mapping. This is a special type used for the multicast entry 224.0.0.0.
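The flag logic of lines 15 through 18 can be factored into a small subroutine and tested on its own. The sketch below does that in plain Perl; arp_flags is a hypothetical name, and the numeric bit values in %ACE are placeholders chosen for illustration (pif itself uses the real constants exported by Solaris::MIB2), so only the logic, including the inverted test for 'U', carries over.

```perl
use strict;
use warnings;

# Hypothetical bit values, for illustration only. pif uses the real
# constants from Solaris::MIB2 (ACE_F_PERMANENT and friends) instead.
my %ACE = (
    PERMANENT => 0x01,
    PUBLISH   => 0x02,
    RESOLVED  => 0x08,
    MAPPING   => 0x20,
);

# Build an arp(1M)-style flag string from an entry's flag bits.
sub arp_flags {
    my ($bits, $ace) = @_;
    my $flags = '';
    $flags .= 'S' if $bits & $ace->{PERMANENT};
    $flags .= 'P' if $bits & $ace->{PUBLISH};
    $flags .= 'U' unless $bits & $ace->{RESOLVED};   # set when NOT resolved
    $flags .= 'M' if $bits & $ace->{MAPPING};
    return $flags;
}

# A static, published, resolved entry and a dynamic, unresolved one.
print arp_flags($ACE{PERMANENT} | $ACE{PUBLISH} | $ACE{RESOLVED}, \%ACE), "\n";
print arp_flags(0, \%ACE), "\n";
```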

Lines 20 and 21 read the netmask for a particular address translation table entry and convert it. Solaris::MIB2 returns the netmask as a hex string, such as ffffff00. Our script translates this hexadecimal number into conventional dotted notation by first breaking the string apart with the unpack function, prepending "0x" to each of the four resulting two-character elements, converting each one to an integer with the hex function, and then putting everything back together with the sprintf function.
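The mask conversion works on its own in plain Perl, which makes it easy to check without a Solaris system. The sketch below wraps the technique from lines 20 and 21 in a subroutine and adds an alternative based on pack and unpack that produces the same result; both subroutine names are hypothetical.

```perl
use strict;
use warnings;

# pif's technique: split "ffffff00" into two-character chunks,
# interpret each chunk as hex, and join the four octets with dots.
sub hex_mask_to_dotted {
    my ($hex) = @_;
    return sprintf("%u.%u.%u.%u",
                   map { hex($_) } unpack("A2A2A2A2", $hex));
}

# Alternative: pack the eight hex digits into four bytes, then
# unpack them as four unsigned chars.
sub hex_mask_to_dotted_packed {
    my ($hex) = @_;
    return join('.', unpack('C4', pack('H8', $hex)));
}

print hex_mask_to_dotted("ffffff00"), "\n";
print hex_mask_to_dotted_packed("f0000000"), "\n";
```

Note that hex accepts its argument with or without a leading "0x", so the prefix that pif prepends is optional.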

Line 22 simply reads the physical, or MAC, address, and, finally, line 24 prints the formatted address translation entry to the screen.

When run on one of our Solaris systems, the pif script produces the following output:

Device IP Address      Mask            Flags  Phys Address
------ --------------- --------------- ------ ------------------
hme0   sun2            255.255.255.255        08:00:20:90:c5:b6
hme0   sun5            255.255.255.255        08:00:20:81:69:c4
...
hme0   198.162.31.170  255.255.255.255        00:02:55:f4:1c:79
...
hme0   sun3            255.255.255.255 SP     08:00:20:90:cf:1c
...
hme0   224.0.0.0       240.0.0.0       SM     01:00:5e:00:00:00
which is pretty much identical to the output produced by arp -a. The next example is a bit more useful: instead of mimicking the functionality of an existing program, it demonstrates how Solaris::MIB2 can be used to build lightweight custom network monitoring solutions. The following is the complete source code for a program called tcpmon:

1  #!/usr/local/bin/perl
2
3  use Solaris::MIB2 ":all";
4  use Time::HR;
5  use Getopt::Std;
6
7  # sample thresholds
8  use constant active            => 2.0;
9  use constant retrans_problem   => 25.0;
10  use constant listen_problem   => 0.5;
11  use constant halfopen_problem => 2.0;
12  use constant outrsts_problem  => 2.0;
13  use constant attempt_fails    => 2.0;
14  use constant indup_problem    => 25.0;
15
16  getopts( "i:h" );
17  die "usage: tcpmon -i<interval> -h\n"
18     if $opt_h;
19
20  $mib = new Solaris::MIB2 q(/dev/tcp);
21  die "failed to create instance of MIB2 object\n"
22     unless $mib;
23
24  $now        = undef;
25  $then       = gethrtime();
26  %stats_now  = ();
27  %stats_then = %{$mib->{tcp}}; # copy the counter values, not the reference
28
29  while(1) {
30     sleep($opt_i||5);
31     $mib->update();
32     $now       = gethrtime();
33     %stats_now = %{$mib->{tcp}};
34
35     $interval = ($now - $then) * 0.000000001;
36     next unless $interval;
37
38     $tcpInDataBytes  = $stats_now{tcpInDataInorderBytes} - $stats_then{tcpInDataInorderBytes};
39     $tcpInDataBytes += $stats_now{tcpInDataUnorderBytes} - $stats_then{tcpInDataUnorderBytes};
40     $tcpInDataBytes /= $interval;
41
42     $tcpOutDataBytes = ($stats_now{tcpOutDataBytes} - $stats_then{tcpOutDataBytes})/$interval;
43     $tcpRetransBytes = ($stats_now{tcpRetransBytes} - $stats_then{tcpRetransBytes})/$interval;
44     $tcpRetransPercent = $tcpOutDataBytes ?
45        100.0 * $tcpRetransBytes / $tcpOutDataBytes : 0.0;
46
47     $tcpOutRsts      = ($stats_now{tcpOutRsts} - $stats_then{tcpOutRsts})/$interval;
48     $tcpAttemptFails = ($stats_now{tcpAttemptFails} - $stats_then{tcpAttemptFails})/$interval;
49
50     $tcpInDataSegs  = $stats_now{tcpInDataInorderSegs} - $stats_then{tcpInDataInorderSegs};
51     $tcpInDataSegs += $stats_now{tcpInDataUnorderSegs} - $stats_then{tcpInDataUnorderSegs};
52     $tcpInDataSegs /= $interval;
53     $tcpOutDataSegs = ($stats_now{tcpOutDataSegs} - $stats_then{tcpOutDataSegs})/$interval;
54
55     $tcpActiveOpens  = ($stats_now{tcpActiveOpens} - $stats_then{tcpActiveOpens})/$interval;
56     $tcpPassiveOpens = ($stats_now{tcpPassiveOpens} - $stats_then{tcpPassiveOpens})/$interval;
57
58     $tcpListenDrop   = ($stats_now{tcpListenDrop} - $stats_then{tcpListenDrop})/$interval;
59     $tcpListenDropQ0 = ($stats_now{tcpListenDropQ0} - $stats_then{tcpListenDropQ0})/$interval;
60     $tcpHalfOpenDrop = ($stats_now{tcpHalfOpenDrop} - $stats_then{tcpHalfOpenDrop})/$interval;
61
62     $tcpInDupBytes  = $stats_now{tcpInDataDupBytes} - $stats_then{tcpInDataDupBytes};
63     $tcpInDupBytes += $stats_now{tcpInDataPartDupBytes} - $stats_then{tcpInDataPartDupBytes};
64     $tcpInDupBytes /= $interval;
65     $tcpInDupPercent = $tcpInDataBytes ?
66        100.0 * $tcpInDupBytes / $tcpInDataBytes : 0.0;
67
68     %stats_then = %stats_now; $then = $now;
69
70     print "high retransmissions, fix network.\n"
71        if $tcpRetransPercent >= retrans_problem;
72     if ($tcpListenDrop + $tcpListenDropQ0 >= listen_problem) {
73        print "Listen queue dropouts, speedup accept processing.\n";
74        print "Listen HalfOpenDrops, possible SYN denial attack.\n"
75           if $tcpHalfOpenDrop >= halfopen_problem;
76     }
77     print "Incoming connections refused: port scanner attack.\n"
78        if $tcpOutRsts >= outrsts_problem;
79     print "Attempt failures: can't connect to remote application.\n"
80        if $tcpAttemptFails >= attempt_fails;
81     print "High duplicate input, fix net and remote server retrans.\n"
82        if $tcpInDupPercent >= indup_problem;
83  };
The first few lines of the program (lines 3 through 5) load the necessary Perl extensions: Solaris::MIB2, Time::HR, and Getopt::Std. Time::HR [7] is a very simple module for measuring elapsed time intervals with nanosecond resolution. Its public interface consists of a single function, gethrtime, which under Solaris simply calls the gethrtime(3C) function. Getopt::Std is the standard Perl extension used to process command-line arguments. The tcpmon program takes two command-line options: -h, which simply prints the usage, and -i, which allows the user to override the default sampling interval of 5 seconds.
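On a system where Time::HR is not installed, the core Time::HiRes module can play a similar role. The sketch below is a swapped-in alternative, not the article's code: hrtime_ns is a hypothetical helper, and it assumes the platform supports CLOCK_MONOTONIC (gethrtime(3C) is likewise unaffected by changes to the time of day). It returns nanoseconds, so the conversion used at line 35 of tcpmon carries over unchanged.

```perl
use strict;
use warnings;
use Time::HiRes qw(clock_gettime sleep CLOCK_MONOTONIC);

# Hypothetical stand-in for Time::HR::gethrtime: a monotonic clock
# reading converted from floating-point seconds to nanoseconds.
sub hrtime_ns {
    return int(clock_gettime(CLOCK_MONOTONIC) * 1e9);
}

my $then = hrtime_ns();
sleep(0.25);                 # Time::HiRes::sleep accepts fractional seconds
my $now  = hrtime_ns();

# Same conversion tcpmon performs at line 35: nanoseconds to seconds.
my $interval = ($now - $then) * 0.000000001;
printf("elapsed: %.3f s\n", $interval);
```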

Lines 8 through 14 declare some thresholds, which are subsequently used to diagnose various network problems. Lines 16 through 18 parse the command-line arguments and, if the help flag -h is supplied, print the usage information and abort the program. Line 20 constructs a MIB2 object over /dev/tcp. Because our script is intended for TCP monitoring, we do not need to construct the stream over /dev/ip, so there is no need to run this program set-group-id sys. Once the MIB2 object is constructed, the program records the value of the high-resolution timer at line 25 and saves the initial MIB2 statistics into a hash at line 27.

Once initialization is complete, the program enters an endless loop at line 29 and suspends itself for the duration of the sampling interval: either the value of the -i command-line argument or the default 5 seconds. When the interval expires, the MIB2 hierarchical hash is refreshed using the update function at line 31. Then the current value of the high-resolution timer and the current MIB2 statistics are recorded at lines 32 and 33. Line 35 calculates the elapsed time interval in seconds, and line 36 skips to the next iteration of the while loop if the elapsed time is zero.

Lines 38 through 66 perform most of the work: this is where we calculate the deltas for the TCP counters and divide them by the elapsed time interval to obtain per-second rates. The algorithm for calculating these deltas is borrowed from the tcp_class.se module, distributed as part of the SE Performance Monitoring Toolkit [8].

The following measures are calculated; except for the two percentages, each is a per-second rate averaged over the sampling interval:

  • tcpRetransPercent -- Percentage of retransmitted bytes relative to the total number of data bytes transmitted over the time interval.
  • tcpListenDrop and tcpListenDropQ0 -- Connections refused because the completed connection queue or the incomplete connection queue, respectively, was full.
  • tcpHalfOpenDrop -- Half-open connections (those for which only the initial SYN has been received) dropped from the incomplete connection queue.
  • tcpOutRsts -- TCP segments sent that contained the RST flag.
  • tcpAttemptFails -- Connections that made a direct transition to the CLOSED state from either the SYN-SENT or the SYN-RCVD state, plus connections that made a direct transition from the SYN-RCVD state to the LISTEN state.
  • tcpInDupPercent -- Percentage of duplicate data bytes received (completely and partially duplicated data combined) relative to the total number of data bytes received over the time interval.
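The delta arithmetic of lines 38 through 66 can be condensed into a subroutine that takes two snapshots of the tcp hash and the elapsed seconds. This is a sketch, not the article's code: the helper names rate and tcp_measures are hypothetical, the counter values are made up, and only the measures that tcpmon actually checks are computed (the segment and open counts are left out).

```perl
use strict;
use warnings;

# Per-second rate of one counter between two snapshots.
sub rate {
    my ($now, $then, $key, $secs) = @_;
    return ($now->{$key} - $then->{$key}) / $secs;
}

# Derive tcpmon's measures from two snapshots taken $secs seconds apart.
sub tcp_measures {
    my ($now, $then, $secs) = @_;
    my %m;
    my $in_bytes  = rate($now, $then, 'tcpInDataInorderBytes', $secs)
                  + rate($now, $then, 'tcpInDataUnorderBytes', $secs);
    my $out_bytes = rate($now, $then, 'tcpOutDataBytes', $secs);
    my $retrans   = rate($now, $then, 'tcpRetransBytes', $secs);
    my $dup_bytes = rate($now, $then, 'tcpInDataDupBytes', $secs)
                  + rate($now, $then, 'tcpInDataPartDupBytes', $secs);

    # Percentages guard against division by zero, as tcpmon does.
    $m{tcpRetransPercent} = $out_bytes ? 100.0 * $retrans / $out_bytes : 0.0;
    $m{tcpInDupPercent}   = $in_bytes  ? 100.0 * $dup_bytes / $in_bytes : 0.0;
    $m{$_} = rate($now, $then, $_, $secs)
        for qw(tcpOutRsts tcpAttemptFails tcpListenDrop
               tcpListenDropQ0 tcpHalfOpenDrop);
    return \%m;
}

# Made-up counter snapshots, five seconds apart, for illustration only.
my @keys = qw(tcpInDataInorderBytes tcpInDataUnorderBytes tcpOutDataBytes
              tcpRetransBytes tcpInDataDupBytes tcpInDataPartDupBytes
              tcpOutRsts tcpAttemptFails tcpListenDrop tcpListenDropQ0
              tcpHalfOpenDrop);
my %then = map { $_ => 0 } @keys;
my %now  = (%then,
            tcpInDataInorderBytes => 9000, tcpInDataUnorderBytes => 1000,
            tcpOutDataBytes => 20000,      tcpRetransBytes => 1000,
            tcpInDataDupBytes => 400,      tcpInDataPartDupBytes => 100,
            tcpOutRsts => 50);
my $m = tcp_measures(\%now, \%then, 5);
printf("%-18s %8.2f\n", $_, $m->{$_}) for sort keys %$m;
```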

Once all measures are calculated, the program saves the current TCP statistics and the value of the high-resolution timer for the next iteration of the while loop (line 68) and then carries out a series of checks (lines 70 through 82).

First, the program compares the retransmission percentage against the predefined threshold value. Older releases of Solaris (prior to Solaris 2.6) had problems with their TCP retransmission algorithms, so high retransmission percentages seen on these systems may disappear once all the necessary TCP patches are applied. On newer systems, however, a high retransmission percentage usually means that some network hardware is faulty and dropping packets.

The next two checks are, perhaps, the most interesting and have more to do with intrusion detection than with performance monitoring. To fully understand what is going on here, one must understand how TCP establishes connections. The 3-way handshake connection establishment process [6] assumes that, to initiate a connection, a client application sends a SYN (synchronize sequence numbers) segment, which specifies the server port number to which the client wants to connect and the client's initial sequence number (ISN). The server then replies with a SYN/ACK packet, the segment that contains the server's initial sequence number and the acknowledgement of the client's SYN. Next, the client acknowledges the server's SYN with another ACK segment. However, if a client attempts to connect to a port on which no service is listening, the server replies with an RST (reset) packet.
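The handshake and its failure case can be traced with a small toy model. It is a sketch under simplifying assumptions: handshake is a hypothetical subroutine, the set of listening ports is invented, and sequence numbers are omitted.

```perl
use strict;
use warnings;

# Toy model of TCP connection establishment. Returns the segments
# exchanged and the client's final state.
sub handshake {
    my ($port, $listening) = @_;
    my @segments = ('client -> server: SYN');
    unless ($listening->{$port}) {
        # No service on this port: the server resets the attempt.
        push @segments, 'server -> client: RST';
        return (\@segments, 'CLOSED');
    }
    push @segments, 'server -> client: SYN/ACK',   # server ISN + ack of client SYN
                    'client -> server: ACK';       # client acks the server's SYN
    return (\@segments, 'ESTABLISHED');
}

my %listening = (23 => 1);    # pretend only telnet is listening
for my $port (23, 24) {
    my ($segments, $state) = handshake($port, \%listening);
    print "port $port: ", join('; ', @$segments), " => $state\n";
}
```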

Port Scanning

There are a few different techniques that port scanners use to produce a list of the services running on a target machine. The simplest and most basic form of TCP scanning is the vanilla connect scan. This technique relies on the connect(3SOCKET) system call to open a connection to each port of interest on the target machine. If the connection succeeds, a service is listening; otherwise, the port is unreachable. A TCP connect scan is very "loud", since most systems log the failed connection attempts, and very inefficient, especially over slow connections.

A much better scanning technique is SYN, or half-open, scanning. With this form of scan, the client sends a SYN packet just as it would when initiating a normal connection. If the server replies with SYN/ACK, the port is in service; if RST is received, the port is unreachable. Upon receiving a SYN/ACK from the server, the client immediately sends back an RST packet, tearing down a connection that never reaches the established state. SYN scanning is fairly efficient and significantly less visible, as half-open connection attempts are normally not logged by the target system.

Yet another scanning technique, even more clandestine than SYN scanning, is FIN scanning. In a FIN scan, the client sends a FIN (finish sending data) packet to the server. If an RST reply is received, the port of interest is closed; if the FIN packet is ignored altogether, the port is listening. As we can see, regardless of the scanning technique used, the server will most likely send RST replies when packets arrive on a closed port. Therefore, to detect a port scan in progress, all our program has to do is check the rate of RST packets sent (tcpOutRsts) against a predefined threshold and report a possible port scan if that threshold is exceeded.
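The reasoning behind the RST check can be summarized in a lookup table. The sketch below is illustrative only: port_state, target_rsts, and the reply labels are hypothetical names, and the table encodes just the three techniques described above.

```perl
use strict;
use warnings;

# What a scanner concludes from the target's answer ('none' = no reply).
# A missing reply to a FIN probe can also mean that a firewall silently
# dropped it; this table ignores that case.
my %verdict = (
    connect => { connected => 'open', refused => 'closed' },
    syn     => { 'SYN/ACK' => 'open', RST     => 'closed' },
    fin     => { none      => 'open', RST     => 'closed' },
);

sub port_state {
    my ($scan, $reply) = @_;
    return $verdict{$scan}{$reply} // 'unknown';
}

# Count the RSTs the target itself sends: a refused connect() and an RST
# answer to a SYN or FIN probe all originate on the scanned host, which
# is why its tcpOutRsts counter climbs during a scan.
sub target_rsts {
    my (@probes) = @_;    # list of [technique, reply] pairs
    return scalar grep { $_->[1] eq 'RST' || $_->[1] eq 'refused' } @probes;
}

my @probes = ([syn => 'RST'], [syn => 'SYN/ACK'],
              [connect => 'refused'], [fin => 'none']);
printf("%-7s %-9s => %s\n", @$_, port_state(@$_)) for @probes;
print "RSTs sent by target: ", target_rsts(@probes), "\n";
```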

SYN Flooding

The next check is a bit more complex, as it attempts to detect a possible denial of service (DoS) attack: SYN flooding. Normally, while handling incoming connection requests, TCP queues incomplete connections as well as completed connections that have not yet been accepted by an application process (via the accept(3SOCKET) call). The maximum length of the queue is usually limited to prevent excessive consumption of system memory. Once the limit is reached, TCP silently discards all new incoming connection requests until room becomes available, that is, until some of the pending connections are processed.

When launching a SYN-flooding attack, a client first issues a connection request to the server by sending a packet with the SYN flag set. Unlike a normal SYN packet, however, this one has its source IP address spoofed to be that of an unreachable host. In an attempt to complete the 3-way handshake, the server keeps trying to send a SYN/ACK packet to this unreachable host until a timeout interval expires. If the attacking host sends enough of these SYN requests to a particular port on the target host (for instance, the telnet port 23), the backlog queue fills up with pending connections to the point where the server starts dropping all new incoming connection requests. The server then remains practically unusable until it finishes handling the outstanding connections on its backlog queue; it is, in effect, flooded.

The tcpmon program, therefore, monitors the total number of connections dropped from the backlog queue (tcpListenDrop + tcpListenDropQ0) over a period of time, trying to determine whether the backlog limit has been reached. Backlog queue drops alone may simply mean that the server application is slow to accept connections. However, when paired with an excessive number of half-open connection drops (tcpHalfOpenDrop), they may indicate a SYN-flooding attack in progress.
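The heuristic above can be sketched as a function over two successive samples of the three Solaris TCP counters. This is a hypothetical illustration rather than the tcpmon code: the subroutine name, the hash-of-counters input format, and both thresholds are assumptions to be tuned for each site.

```perl
use strict;
use warnings;

# Hypothetical SYN-flood check. $prev and $curr are hash refs holding
# two samples of tcpListenDrop, tcpListenDropQ0 and tcpHalfOpenDrop;
# the two limits are assumed per-interval thresholds.
sub syn_flood_suspected {
    my ($prev, $curr, $listen_limit, $halfopen_limit) = @_;
    # Per-interval growth of one counter, ignoring counter resets.
    my $delta = sub {
        my $name = shift;
        my $v = $curr->{$name} - $prev->{$name};
        return $v < 0 ? 0 : $v;
    };
    my $listen_drops   = $delta->('tcpListenDrop') + $delta->('tcpListenDropQ0');
    my $halfopen_drops = $delta->('tcpHalfOpenDrop');
    # Backlog drops alone may only mean a slow accept() loop, so require
    # half-open drops as well before flagging a possible SYN flood.
    return ($listen_drops > $listen_limit && $halfopen_drops > $halfopen_limit) ? 1 : 0;
}

my %t0 = (tcpListenDrop => 10, tcpListenDropQ0 => 0,   tcpHalfOpenDrop => 0);
my %t1 = (tcpListenDrop => 60, tcpListenDropQ0 => 200, tcpHalfOpenDrop => 150);
print syn_flood_suspected(\%t0, \%t1, 50, 50) ? "possible SYN flood\n" : "ok\n";
```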

Recent releases of Solaris are quite resilient to SYN flooding. Instead of a single backlog queue, Solaris systems maintain two. The first is the complete connections queue, which holds connections for which the 3-way handshake has been completed but accept(3SOCKET) has not yet been called. The second is the incomplete connections queue (or Queue 0), which holds one entry for every SYN packet that has arrived but whose handshake is still pending. Once the server receives the final ACK from the client, the connection moves from the incomplete queue to the complete queue. The size limit for the incomplete connections queue is typically quite large, which makes the server more resistant to SYN flooding: an attacker must send far more spoofed SYNs to fill it.
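To make the two-queue arrangement concrete, here is a toy model. It is not Solaris kernel code: the subroutine name, the event format, and the queue limits are hypothetical. A SYN takes a slot in Queue 0 if one is free; the matching ACK moves that connection to the complete queue. Twenty spoofed SYNs that are never acknowledged are followed by one legitimate handshake.

```perl
use strict;
use warnings;

# Toy model of the two listen queues. Events are [type, connection id].
sub simulate_listener {
    my ($q0_max, $q_max, @events) = @_;
    my (%q0, @q);                              # incomplete and complete queues
    my %stats = (dropped_q0 => 0, dropped_q => 0);
    for my $ev (@events) {
        my ($type, $id) = @$ev;
        if ($type eq 'SYN') {
            # New half-open connection: keep it only if Queue 0 has room.
            if (scalar(keys %q0) < $q0_max) { $q0{$id} = 1 } else { $stats{dropped_q0}++ }
        } elsif ($type eq 'ACK' && delete $q0{$id}) {
            # Handshake finished: move to the complete queue if it has room.
            if (@q < $q_max) { push @q, $id } else { $stats{dropped_q}++ }
        }
    }
    return { %stats, q0 => scalar(keys %q0), q => [@q] };
}

my @events;
push @events, ['SYN', "spoof$_"] for 1 .. 20;        # spoofed SYNs, never ACKed
push @events, ['SYN', 'client'], ['ACK', 'client'];  # one legitimate handshake

for my $q0_max (8, 1024) {
    my $r = simulate_listener($q0_max, 5, @events);
    printf "q0 limit %4d: %2d SYNs dropped, accept queue holds [%s]\n",
        $q0_max, $r->{dropped_q0}, join(',', @{ $r->{q} });
}
```

With the small Queue 0 limit the legitimate client's SYN is dropped along with the overflow; with the large limit the spoofed entries sit harmlessly in Queue 0 while the real connection reaches the accept queue.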

In fact, the size limits of both queues, as well as another parameter -- the connection timeout (which affects how long the server keeps trying to contact an unreachable host in our SYN-flooding scenario) -- can be tuned further to maximize the server's resistance to SYN floods. Perhaps the easiest way to view or modify these parameters is the ndd(1M) command. The following are the variable names that ndd(1M) uses to retrieve or set these tunables:

  • tcp_conn_req_max_q -- Maximum number of completed connections waiting for an accept(3SOCKET) call.
  • tcp_conn_req_max_q0 -- Maximum number of connections for which the 3-way handshake has not yet been completed.
  • tcp_time_wait_interval -- Maximum amount of time a TCP socket will remain in TIME_WAIT state.

For example, to read the size limit of the complete connections queue, execute the following command:

ndd /dev/tcp tcp_conn_req_max_q
For the adventurous who want complete programmatic control over the TCP/IP tunable parameters, however, we created another Perl module, called Solaris::NDDI [9]. This module does essentially the same thing as ndd(1M) (although it does not call ndd(1M) internally, relying instead on some convoluted C code), and it can easily be used from a regular Perl script. For instance, to read the value of the same tcp_conn_req_max_q variable, use the following code:

use strict;
use warnings;
use Solaris::NDDI;

# Open the TCP driver and read one tunable by name
my $ndd = Solaris::NDDI->new("/dev/tcp");
print $ndd->{tcp_conn_req_max_q}, "\n";
Having finished with intrusion detection, our tcpmon program checks two other very simple conditions: the duplicate input percentage (a high value indicates excessive retransmissions by remote hosts) and the number of failed attempts to connect to remote applications. This simple monitor thus packs a lot of useful functionality into fewer than a hundred lines of code. To verify that the program actually does its job, we launched our favorite port scanner, nmap, in SYN-scan mode from a remote host:
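Both remaining checks reduce to simple arithmetic on per-interval counter deltas. The sketch below assumes the caller has already computed those deltas. The subroutine names and thresholds are hypothetical, and the choice of counters (duplicate versus total received data for the percentage, and a failed-connection-attempt counter such as MIB-II's tcpAttemptFails) is an assumption, since the article does not spell out which ones tcpmon reads.

```perl
use strict;
use warnings;

# Hypothetical: share of received data that was duplicate during one
# sampling interval, as a percentage.
sub dup_input_pct {
    my ($dup_delta, $total_delta) = @_;
    return 0 if $total_delta <= 0;    # nothing received, nothing duplicated
    return 100 * $dup_delta / $total_delta;
}

# Hypothetical: too many failed connection attempts in one interval?
sub too_many_failed_connects {
    my ($attempt_fails_delta, $threshold) = @_;
    return $attempt_fails_delta > $threshold ? 1 : 0;
}

printf "duplicate input: %.1f%%\n", dup_input_pct(300, 12_000);
print "failed connects: ", (too_many_failed_connects(12, 10) ? "high" : "normal"), "\n";
```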

nmap -sS sun3
tcpmon immediately starts printing the following message:

"Incoming connections refused: port scanner attack."
Although the example programs described in this article are fairly rudimentary and lack the robustness expected of a production application, I hope enough background information has been presented to demonstrate the simple yet powerful functionality afforded by the Solaris::MIB2 module. I also hope this article whets the reader's appetite for building lightweight, flexible, custom network monitors, and that the techniques outlined here can be used to solve some challenging network-related problems.

References

1. RFC 1155. Structure and Identification of Management Information for TCP/IP-based Internets.

2. RFC 1213. Management Information Base for Network Management of TCP/IP-based Internets: MIB-II.

3. Sun Microsystems, Inc. STREAMS Programming Guide. Part Number 805-7478-10.

4. Net::SNMP by David M. Town. www.perl.com/CPAN-local, CPAN directory DTOWN, Net-SNMP-4.0.1.tar.gz

5. Solaris::MIB2 by Alexander Golomshtok. www.perl.com/CPAN-local, CPAN directory AGOLOMSH, Solaris-MIB2-0.01.tar.gz

6. TCP/IP Illustrated, Volume 1. W. Richard Stevens. Addison-Wesley Publishing Company, 1994. ISBN 0-201-63346-9.

7. Time::HR by Alexander Golomshtok. www.perl.com/CPAN-local, CPAN directory AGOLOMSH, Time-HR-0.01.tar.gz

8. SE Performance Monitoring Toolkit. Adrian Cockcroft, Richard Pettit. www.setoolkit.com.

9. Solaris::NDDI by Alexander Golomshtok. www.perl.com/CPAN-local, CPAN directory AGOLOMSH, Solaris-NDDI-0.01.tar.gz

Alexander Golomshtok is a project manager and technology specialist at JP Morgan Chase. He can be reached at: golomshtok_alexander@jpmorgan.com.