Network Management with Overlapping IP Address Ranges

Scott Kirkwood

Every UNIX systems administrator has some knowledge of networks and routing, and most have basic experience with network devices. Configuring NFS, DNS, or NIS is a common network configuration task for a systems administrator, but the network infrastructure does not typically require more than an Ethernet cable, an IP address, and an entry in the DNS server for each new UNIX system. There are, however, certain implementations in which the network infrastructure must be designed as an integral component of the systems architecture. This article details just such a scenario and shows how expanding your knowledge of network infrastructure configuration can make the difference between a dead project and a functional architecture.

The Scenario

Service providers have always been a necessary part of any enterprise WAN, supplying the telecommunications circuits and WAN bandwidth needed to operate a business network. However, advances in transmission technology, along with increased competition in the telecom sector, have resulted in a glut of bandwidth available to enterprise clients at ever-decreasing cost. A service provider can no longer sell simple T1, T3, and OC3 service to companies and expect to stay in business.

Most service providers have looked deeper into the enterprise network for additional services to sell, and network management is an obvious target. The result is managed network services from telecommunications carriers. The enterprise client benefits by reducing its need for expensive WAN-savvy employees, and gains the peace of mind that the service provider has a cutting-edge Network Management System (NMS) monitoring its WAN 24x7. That’s how the marketing brochures read, and it sounds simple until a UNIX administrator actually tries to implement the network management systems. This case study outlines the technological pitfalls inherent in such a systems design and the network solutions that can make it all work the way the marketing brochures advertise.

The Problem

The problem of managing hundreds of separate networks from a single NMS point appears to be one of scale and security. The most likely issue seems to be getting a system with enough power to monitor hundreds of thousands of network devices in real time. However, a bit of research reveals a greater problem with the configuration, one that cannot be overcome with more processing power or extra memory.

Most enterprise companies are not in possession of large blocks of public IP addresses, and have implemented RFC 1918 private IP addressing (e.g., 10.0.0.0/8 or 192.168.0.0/16) on their internal corporate networks. This solution, along with Network Address Translation or proxy service for Internet access, works fine for a single enterprise company. However, the service provider supplying NMS service may have hundreds of companies with overlapping IP address ranges, and hundreds of devices to manage with identical IP addresses.
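To make the overlap concrete, here is a minimal Python sketch that compares the private ranges of several customers. The customer names and address ranges are hypothetical and serve only as an illustration; they are not taken from the actual deployment.

```python
import ipaddress
from itertools import combinations

# Hypothetical customer inventory; names and ranges are illustrative only.
CUSTOMER_NETWORKS = {
    "customer-a": ipaddress.ip_network("10.0.0.0/8"),
    "customer-b": ipaddress.ip_network("10.1.0.0/16"),
    "customer-c": ipaddress.ip_network("192.168.0.0/16"),
    "customer-d": ipaddress.ip_network("192.168.0.0/16"),
}


def overlapping_pairs(networks):
    """Return pairs of customer names whose address ranges overlap."""
    pairs = []
    for (name_a, net_a), (name_b, net_b) in combinations(sorted(networks.items()), 2):
        if net_a.overlaps(net_b):
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    for first, second in overlapping_pairs(CUSTOMER_NETWORKS):
        print(f"{first} and {second} use overlapping address space")
```

Any pair reported here contains addresses that a single routing table cannot tell apart by destination alone.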

This results in a fundamental problem for the systems administrator, and one that cannot be overcome with additional software or hardware. The issue is actually external to the UNIX system, but will impact the systems design if not properly corrected. Neither the network architect nor the systems architect can provide a solution to this problem alone. It takes a combination of the two, and the UNIX administrator will likely need to drive the effort to completion.

The problem is not one of network management, because most modern NMS software actually references devices in its database by the hostname collected during SNMP polling. The NMS software does not care whether there are hundreds of devices with identical IP addresses, because the devices are being indexed in a database by other means. The issue here is one of basic IP routing, as depicted by Figure 1.

The basic questions being asked in this configuration are: How do you get network packets to find the proper destination when there may be 200 devices with the same IP address, and how do the response packets from these devices find their way back to the NMS system?

Network Address Translation (NAT) would be the logical solution, but the nature of the Simple Network Management Protocol (SNMP) makes that solution impossible. SNMP carries the IP addresses of managed devices as payload data within its packets, and NAT cannot effectively perform payload translation for SNMP. At the time of this implementation, there were a few proprietary solutions in existence that could perform this task, but none were available in a commercially viable product.
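The following Python sketch illustrates the payload problem under a simplifying assumption: the packet is reduced to two fields, the source address in the IP header and the agent-addr field that an SNMPv1 Trap PDU carries in its payload. The class, the function, and the addresses are hypothetical and model only a plain header-rewriting NAT device.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrapPacket:
    ip_source: str   # source address in the IP header
    agent_addr: str  # agent-addr field inside the SNMPv1 Trap PDU (payload)


def header_only_nat(packet, translations):
    """Rewrite only the IP header source, leaving the payload untouched."""
    new_source = translations.get(packet.ip_source, packet.ip_source)
    return replace(packet, ip_source=new_source)


if __name__ == "__main__":
    original = TrapPacket(ip_source="10.1.1.1", agent_addr="10.1.1.1")
    translated = header_only_nat(original, {"10.1.1.1": "172.20.5.1"})
    # The header now carries the translated address, but the payload still
    # reports the customer's local (possibly duplicated) address.
    print(translated)
```

After translation the two fields disagree, which is the inconsistency that header-only NAT leaves for the NMS to sort out.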

So the task is this: Design a UNIX system and network architecture that will allow NMS software to perform SNMP monitoring of hundreds of networks with overlapping IP addresses, and correlate all the information into a single display for centralized support. The following requirements were made of the design, because of corporate hardware standards and platform specifications of the NMS Software:

Sun hardware for UNIX systems
Solaris 8 operating system
Cisco routers and switches
SMARTS In-Charge NMS software
Complete separation of customer network traffic for security

The Solution

This problem cannot be solved in the network alone or in the systems configuration alone. It requires a combination of complex routing techniques within the network infrastructure, along with a corresponding configuration in the UNIX system. As in that overused business mantra, you have to think outside the box (the UNIX box, that is).

Conquering this problem involves taking the overlapping IP addresses out of the general mix. The solution is possible due mainly to the multi-tiered architecture of the NMS software being deployed. Most modern NMS software allows for a multi-tiered installation with multiple SNMP collection systems forwarding information to a central correlation and display engine. An example of this type of installation is shown in Figure 2.

In such a deployment, the Tier-1 NMS systems perform active SNMP polling of managed devices and systems within a given area of the network. This includes periodic polling of set parameters, such as interface status and performance metrics, as well as the receipt of asynchronous SNMP traps sent by the devices whenever a fault or preset event occurs. The Tier-2 NMS system consolidates information from all of the Tier-1 systems for event correlation and eventual display in a single management console.

This design is intended to allow for a more scalable implementation of NMS software by distributing polling and correlation duties across multiple systems, as well as reducing SNMP traffic on WAN links. However, when combined with an appropriate network configuration, this design can be modified to create a workaround for the overlapping IP address issue.

In the overlapping IP address scenario, the multi-tiered NMS architecture described above allows a local polling station to perform active SNMP polling of a particular client network while passing events back to the central reporting and display station. Events passed back to the central station identify managed devices by fully qualified domain names, thus eliminating IP addresses from the relationship. This means that the customer networks can be monitored with SNMP using local IP addresses, while any event notifications forwarded to the central monitoring station carry hostname information rather than IP addresses.
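The idea of keying events by hostname rather than by address can be sketched in a few lines of Python. The classes below are hypothetical stand-ins for the Tier-1 pollers and the Tier-2 console, not the interface of any real NMS product, and the hostnames are invented.

```python
from collections import defaultdict


class CentralConsole:
    """Toy stand-in for the Tier-2 correlation and display engine."""

    def __init__(self):
        self.events_by_host = defaultdict(list)

    def receive(self, fqdn, message):
        # Events are indexed by fully qualified domain name only.
        self.events_by_host[fqdn].append(message)


class LocalPoller:
    """Toy stand-in for a Tier-1 poller that sees one customer network."""

    def __init__(self, console, hostnames_by_ip):
        self.console = console
        self.hostnames_by_ip = hostnames_by_ip

    def report(self, device_ip, message):
        # The local IP address never leaves the poller.
        self.console.receive(self.hostnames_by_ip[device_ip], message)


if __name__ == "__main__":
    console = CentralConsole()
    poller_a = LocalPoller(console, {"10.1.1.1": "core1.customer-a.example"})
    poller_b = LocalPoller(console, {"10.1.1.1": "core1.customer-b.example"})
    poller_a.report("10.1.1.1", "linkDown")
    poller_b.report("10.1.1.1", "linkDown")
    for host, events in sorted(console.events_by_host.items()):
        print(host, events)
```

Both devices share the address 10.1.1.1, yet the console keeps them apart because it never sees that address.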

In short, each network can have its own NMS system for local monitoring, while a centralized system consolidates and displays the information to a centralized support group within the managed services provider. The only drawback of this solution is that every single customer network to be monitored requires its own dedicated SNMP polling station as depicted in Figure 3.

Although the solution is workable, it is far from scalable and the cost of hundreds of customer monitoring stations (to fulfill the requirements of this particular case) quickly outweighs any potential profit in the business case for such a setup. Thus, it is necessary to reduce the number of polling stations in the design to attain a viable solution.

SNMP Monitoring

The specific SNMP monitoring product selected for this implementation provided a portion of the solution through its basic design. This product, known as SMARTS In-Charge, allows the running of multiple instances of its collection engine on a single UNIX system. Each instance requires its own virtual IP interface from which to send and receive SNMP requests. Each virtual interface is assigned a unique IP address, which forms the basis for routing packets despite overlapping IP addresses. Most NMS software, including several freeware packages, has followed this model to allow for multiple instances of software on a single system, so the solution is reasonably software independent, assuming this requirement is met.

Fulfilling the requirement for a virtual interface for each instance of the NMS software is nothing new to most systems administrators. Admins have been using virtual interfaces, or multiple IP addresses per interface, for many years. It is a common configuration for Web servers and shared LAN scenarios. However, simply using additional IP addresses per physical interface would not fulfill the requirements of this design for security. Since each customer requires complete separation of network traffic, it is necessary to keep all traffic segregated by customer until it reaches the NMS software on the system. Within the UNIX system itself, customer data can be secured by running each instance of NMS software as its own UID and by using standard UNIX security to prevent information leaks. However, on the network, the data cannot share a common LAN or security could be compromised.

VLANs

Implementing a combination of policy-based routing and Virtual LANs (VLAN) within the network architecture (and extending this into the polling systems by means of 802.1q-compliant network interface cards) handles the requirement of an individual interface for each instance of SMARTS In-Charge, while keeping customer networks completely segregated for security reasons.

VLAN technology allows multiple Ethernet networks to be carried across a single cable with true frame-level separation. It requires a network interface capable of supporting the 802.1q standard as well as connection to an 802.1q-compliant switch. Configuration of this component of the UNIX system is described later in the article. The result is similar to that pictured in Figure 4.

In this scenario, policy-based routing is implemented in the access router, which allows the network architect to pre-define a set of routing policies based on source interface or IP address. IP datagrams sent by the monitoring systems are inspected at the access router for their source interface. Policies defined within the router correlate a source interface (from a given instance of the NMS software) to an interface on the router that leads to a particular customer network. Thus, two packets with identical destination IP addresses can be routed to the proper outgoing interface based on pre-defined parameters. Figure 5 shows the policy defined within the access router.
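As a rough model of that forwarding decision, the Python sketch below selects the outgoing interface from the incoming interface alone and deliberately ignores the destination address. The interface names and the policy table are hypothetical; the real policy lives in the router configuration.

```python
# Hypothetical policy table: incoming VLAN subinterface -> customer-facing interface.
POLICY = {
    "GigabitEthernet0/0.101": "Serial1/0",
    "GigabitEthernet0/0.102": "Serial1/1",
}


def policy_route(in_interface, dst_ip):
    """Pick the outgoing interface from the incoming interface alone.

    dst_ip is accepted only to show that it plays no part in the decision.
    """
    try:
        return POLICY[in_interface]
    except KeyError:
        raise LookupError(f"no policy defined for {in_interface}") from None


if __name__ == "__main__":
    # Same destination address, different customers, different exits.
    print(policy_route("GigabitEthernet0/0.101", "10.1.1.1"))
    print(policy_route("GigabitEthernet0/0.102", "10.1.1.1"))
```

Because the destination never enters the lookup, duplicate customer addresses cannot cause ambiguity on the outbound path.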

The problem of outgoing packets sent to duplicate IP addresses is now solved. Response packets (that is, packets sent by the monitored device back to the Tier-1 NMS system) are not a problem because their destination is the unique IP address of their particular instance of the NMS software. Thus, the access routers can simply forward these packets back to the proper virtual interface based on standard IP routing and existing VLANs within the network architecture. The same principle applies to SNMP trap packets that are occasionally sent by the devices.

The resulting solution requires a combination of systems and network configurations to circumvent the overlapping IP address issue. Despite the apparent complexity of the design, the implementation is straightforward, and requires very little specialized network knowledge. However, given the correlation between virtual interface, router interfaces, and customer networks, a strict change management process is essential to avoid configuration and security issues once the architecture is implemented.

The design is capable of supporting hundreds of overlapping IP addresses within a single NMS system. The only limiting component of the architecture is the maximum of 64 VLANs per interface on a Sun system. Based on loading estimates and managed node counts, it was determined that a maximum of 20 client networks could be managed by a single Sun E420R system. This particular case required an initial installation of 14 Tier-1 NMS systems to handle the customer load. Two Tier-2 NMS systems (Sun Enterprise 4500) were installed in a high-availability configuration to handle event correlation. Additionally, 4 display servers (Sun Netra T1) were installed for accessing the network information from the operations centers via a Web interface. Figure 6 shows the final systems design and layout.
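The sizing arithmetic can be checked with a short Python sketch. The two limits come from the figures quoted above; the function name and the way the limits are combined are assumptions made for illustration.

```python
import math

MAX_VLANS_PER_INTERFACE = 64  # Sun adapter limit quoted above
MAX_CLIENTS_PER_TIER1 = 20    # loading estimate for one Sun E420R


def tier1_systems_needed(client_networks):
    """Number of Tier-1 systems required for a given count of client networks."""
    per_system = min(MAX_VLANS_PER_INTERFACE, MAX_CLIENTS_PER_TIER1)
    return math.ceil(client_networks / per_system)


if __name__ == "__main__":
    for clients in (20, 21, 280):
        print(clients, "client networks ->", tier1_systems_needed(clients), "systems")
```

Under these estimates the NMS load, not the VLAN limit, sets the per-system ceiling, and 14 Tier-1 systems provide room for up to 280 client networks.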

The design called for a centralized storage system using a Fibre Channel-based Storage Area Network (SAN), so no NFS configuration was necessary. For centralized administration, the systems were configured in an NIS+ domain with NIS+ access specifically denied from the customer networks. This last configuration note, along with packet filtering at the access routers (deny all except SNMP and SNMP-Trap) on each customer-facing interface could have removed the need for expensive firewalls in the design. However, this design ultimately did include firewalls between the access routers and NMS systems for additional inspection and logging of network traffic.

Configuring the Systems

The first requirement of the UNIX systems in this configuration is to ensure that the systems being used include network interfaces capable of supporting VLANs and the 802.1q standards. When using 802.1q VLANs, every frame transmitted on an interface has a four-byte “tag” added to the header to identify the VLAN to which the frame corresponds. Support for the tagging of frames must be contained within the driver for the network interface as these additional four bytes can push an Ethernet frame beyond the standard maximum frame length, thereby causing the interface to report an error on VLAN-tagged frames.
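The four-byte tag can be built by hand to see where the extra length comes from. This Python sketch packs the standard tag layout: a 16-bit tag protocol identifier of 0x8100 followed by 16 bits of tag control information holding a 3-bit priority and a 12-bit VLAN ID. The function name is hypothetical, and the sketch only checks that the values fit their fields.

```python
import struct

TPID_8021Q = 0x8100        # EtherType value that marks an 802.1Q tag
MAX_UNTAGGED_FRAME = 1518  # classic Ethernet maximum frame length, including FCS


def build_vlan_tag(vlan_id, priority=0):
    """Return the four tag bytes for the given VLAN ID and priority."""
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority < 8:
        raise ValueError("priority must fit in 3 bits")
    tci = (priority << 13) | vlan_id  # the bit between the two fields stays 0
    return struct.pack("!HH", TPID_8021Q, tci)


if __name__ == "__main__":
    tag = build_vlan_tag(100)
    print(tag.hex(), len(tag), "bytes")
    print("largest tagged frame:", MAX_UNTAGGED_FRAME + len(tag), "bytes")
```

A full-size frame grows from 1518 to 1522 bytes once tagged, which is why a driver without 802.1q support reports such frames as oversized.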

In this scenario, the E420R systems were equipped with standard Sun Fast Ethernet adapters, which do not support 802.1q tagging. Because our solution required VLANs only on the customer network facing side of the Tier-1 NMS systems, the standard Fast Ethernet interfaces were used for connection to the internal Tier-2 NMS LAN. The Tier-1 systems each had a Sun Gigabit Ethernet adapter with 802.1q support installed for connection to the customer access network. Currently, most Sun Gigabit Ethernet adapters are 802.1q capable.

Configuration of the VLANs on the Solaris systems is surprisingly straightforward. It involves creating a file named /etc/hostname.vge<n> for each VLAN interface. This is a flat text file that contains a hostname for that VLAN interface. An IP address for this hostname must be present in the /etc/hosts file. The 802.1q standard supports up to 4094 VLANs per interface (the VLAN ID is a 12-bit field, and two of its values are reserved). However, Sun adapters are currently limited to 64 VLANs per interface. The number of VLANs on a system can be increased beyond 64 by using multiple Gigabit Ethernet adapters. In our scenario, the limiting factor is the number of NMS instances that the system can support, which is far less than the 64-VLAN limit for a single interface. Numbering of the /etc/hostname.vge<n> files is shown in Figure 7.

VLAN tag numbers must then be assigned to each of the VLAN interfaces configured. The 64 VLANs that can be configured on a Sun adapter can correspond to any of the 4094 VLANs that are possible in 802.1q. Therefore, it is necessary to specifically assign the VLAN tag numbers to each VLAN interface configured. This is done by creating files named /etc/vlan.vge<n> for each VLAN interface. The numbering scheme for these files is identical to the hostname.vge files. Each /etc/vlan.vge<n> file must contain the VLAN tag number in decimal, octal, or hexadecimal format with no additional characters, lines, or spaces.

Once the /etc/hostname.vge<n> and /etc/vlan.vge<n> files are created, a simple reboot of the system will activate the VLAN configuration. It is important to note that on most hardware, configuring even one VLAN turns on VLAN tagging for that entire physical interface. At that point, the system must be connected to an 802.1q-compliant device with tagging enabled in order to communicate on the network. Connecting a system using VLAN tagging to a standard network hub or switch port will result in a complete inability to communicate with the network because all packets leaving the physical interface will have a four-byte VLAN header that can only be interpreted by an 802.1q-compliant device.
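The pair of files needed per VLAN interface can be generated with a small Python sketch. It writes into a scratch directory rather than /etc, and the instance numbers, hostnames, and tag values are placeholders: the real numbering of the vge<n> instances follows Figure 7, which this sketch does not attempt to reproduce. Writing the tag as bare decimal digits with no trailing newline is one reading of the "no additional characters" rule and should be verified against the adapter documentation.

```python
import tempfile
from pathlib import Path


def write_vlan_files(etc_dir, interfaces):
    """Create hostname.vge<n> and vlan.vge<n> for each entry.

    interfaces maps an instance number <n> to a (hostname, vlan_tag) pair.
    """
    written = []
    for n, (hostname, vlan_tag) in sorted(interfaces.items()):
        host_file = Path(etc_dir) / f"hostname.vge{n}"
        vlan_file = Path(etc_dir) / f"vlan.vge{n}"
        host_file.write_text(hostname + "\n")  # hostname must also be in /etc/hosts
        vlan_file.write_text(str(vlan_tag))    # tag number only, in decimal
        written += [host_file, vlan_file]
    return written


if __name__ == "__main__":
    # Placeholder instance numbers, hostnames, and tags.
    example = {0: ("nms-cust-a", 101), 1: ("nms-cust-b", 102)}
    with tempfile.TemporaryDirectory() as scratch:
        for path in write_vlan_files(scratch, example):
            print(path.name, repr(path.read_text()))
```

Generating the files from one table helps keep the hostname and tag files for each interface in step.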

Network configuration of the NMS software requires the hostname and IP address of a valid VLAN interface to be entered into the startup configuration files for each instance of the NMS software. Every NMS package has different configuration specifications for this information, depending on the software installation instructions. Configuration simply requires that the IP address, VLAN tag and NMS instance correspond correctly to the customer network to be monitored, and that this information correspond to the policy-based routing configuration in the access routers.
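Because every customer ties together an NMS instance, an IP address, and a VLAN tag, a simple consistency check before each change can catch clashes early. The Python sketch below is one possible check; the record layout and all sample values are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Binding(NamedTuple):
    customer: str
    nms_instance: str
    poller_ip: str
    vlan_tag: int


def find_conflicts(bindings):
    """Report any value that is shared by more than one binding."""
    problems = []
    for field in Binding._fields:
        counts = Counter(getattr(b, field) for b in bindings)
        for value, count in sorted(counts.items()):
            if count > 1:
                problems.append(f"{field} {value!r} is used {count} times")
    return problems


if __name__ == "__main__":
    table = [
        Binding("customer-a", "incharge-a", "172.20.101.10", 101),
        Binding("customer-b", "incharge-b", "172.20.102.10", 101),  # tag reused
    ]
    for problem in find_conflicts(table):
        print(problem)
```

The same table can then be compared by hand against the policy-based routing entries in the access routers.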

Configuring the Network Devices

There are two components of the network devices that must be configured to match this system configuration: the Ethernet switches and the customer access routers. The switches used in this scenario were Cisco Catalyst 5000 series switches running CatOS 6.1(1) software, with blades for Gigabit and Fast Ethernet interfaces installed. Configuration of these switches involved simply establishing the VLANs within the switch and enabling trunking on the ports to which the Tier-1 NMS servers attach. Listing 1 shows a sample configuration with the VLANs defined and trunking enabled on the Tier-1 NMS ports. Note that this is not a complete configuration file for the switch; it shows only the portions related to VLAN configuration.

This configuration excerpt shows configuration of the VLANs that correspond to VLANs established on the Solaris systems. The “vtp mode transparent” command refers to VLAN trunking between switches and simply indicates that no database-defined VLANs are in use within the architecture. Each of the “set vlan” commands establishes a VLAN and its corresponding tag number and VLAN name. Note that, in this example, Cisco switches utilize VLAN tag number 1 for a default VLAN within the switch. It is therefore not possible to assign VLAN 1 for the purposes described in this case study. A common workaround to this situation is to start the VLAN numbering at VLAN 100 so that VLAN tag numbers can correspond more directly with customer numbers.

The “set trunk” command enables VLAN trunking on a particular switch port; in this case, Gigabit Ethernet ports 1 and 2 in module 3 are enabled to use 802.1q trunking for connection to Tier-1 NMS systems or the access routers. Cisco also allows the use of its proprietary ISL (Inter-Switch Link) trunking for VLAN trunk ports, but this is not compliant with 802.1q and will not communicate properly with the Sun adapters.

Configuration of the access routers is significantly more complex because it contains the policy-based routing that circumvents the duplicate IP address issue. Listing 2 shows a sample configuration for a customer access router.

Although this configuration file is quite complex, it is still only a scaled-down version of the full configuration presented to demonstrate the policy-based routing technique used for this case study. Complete configuration requires significant experience configuring these devices. Two different components comprise the routing used here.

Each of the incoming VLAN interfaces references a policy route-map that is applied to all packets arriving on that interface. Each customer will have one VLAN interface (denoted by the GigabitEthernet0/0.<n> definitions) and one corresponding route-map reference. This ensures that all packets arriving on a VLAN interface bypass the normal destination-based routing lookup and are forwarded according to the defined policy. The route maps referenced simply point all packets to the correct outgoing interface for the customer.

Packets arriving on the customer network interfaces (serial interfaces) can be routed using standard IP routing tables because the Tier-1 NMS systems each have unique addresses. The IP routing table maintained by the router will point these packets directly to the proper VLAN interface without policy intervention.
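The return path needs nothing more than an ordinary longest-prefix lookup on the destination address, which the Python sketch below models. The subnets and interface names are hypothetical; the only property that matters is that each Tier-1 VLAN subnet is unique within the provider's own address plan.

```python
import ipaddress

# Hypothetical routes: one unique subnet per Tier-1 VLAN interface.
ROUTES = {
    ipaddress.ip_network("172.20.101.0/24"): "GigabitEthernet0/0.101",
    ipaddress.ip_network("172.20.102.0/24"): "GigabitEthernet0/0.102",
}


def route_by_destination(dst_ip):
    """Return the outgoing interface using a longest-prefix match."""
    address = ipaddress.ip_address(dst_ip)
    matches = [network for network in ROUTES if address in network]
    if not matches:
        raise LookupError(f"no route to {dst_ip}")
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTES[best]


if __name__ == "__main__":
    # Replies from two customers' devices find their own poller interfaces.
    print(route_by_destination("172.20.101.10"))
    print(route_by_destination("172.20.102.10"))
```

The customer device's own address may be duplicated elsewhere, but it appears only as the source of the reply and so never affects this lookup.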

Conclusion

The example presented here is specific to the network management scenario discussed, but the solution can be applied to numerous other situations that a systems administrator may face. One of the most common applications for this type of network configuration is for vendor or third-party access to internal corporate systems. This typically involves a huge outlay for firewall equipment, NAT engines, and dedicated vendor servers to access corporate application systems such as inventory databases or customer information databases.

In such a scenario, the systems administrator faces the same problem seen here: access to a single system from multiple networks that may include overlapping IP address ranges. NAT is a useful tool, but not always a perfect solution to this scenario. Through careful application of a network configuration in cooperation with the systems configuration, these problems can be overcome in a scalable and affordable manner. When faced with a complex network application issue such as this, extend your administration and architecture knowledge beyond the UNIX box and into the network to find your solution.

Scott Kirkwood specializes in designing and implementing network and systems operations centers for enterprise and service provider companies. His previous positions have included: Network Architect, Unix Engineer, IT Business Analyst and Jazz Musician. All of these skills (except the Jazz Musician) are combined in his current role as a business consultant to the IT industry. He currently works for International Network Services, and can be reached at: scott.kirkwood@ins.com.