Cover V09, I09

FreeBSD’s sysctl Interface

Michael Lucas

BSD 4.4’s sysctl interface captures and sets kernel state information. This gives sys admins the ability to change the behavior of a running kernel, without a recompile or even a reboot. This ability is invaluable in systems where uptime is vital. The sysctl interface is a powerful but shadowy corner of systems administration.

In this article, I will focus on the sysctl interface as found in FreeBSD 4-stable. The interface also exists in BSD/OS, NetBSD, OpenBSD, SunOS, and any other BSD 4.4-Lite-based UNIX. You will want to check the documentation for your operating system to see which of the MIBs discussed below apply to your system.

Warning

sysctl is a very powerful tool. You can use it to send performance through the roof and save the day. You can also badly maim your system. sysctl modifies a running kernel, and if you kick a program’s legs out from under it, you won’t be happy. Test all changes on a non-production system first.

Retrieving sysctl Information

The exact sysctls available on your system, and their values, depend on your kernel configuration and any kernel modules loaded. To get a snapshot of all system sysctls, run:

# sysctl -a
This produces a lot of information; you might want to pipe this to your favorite pager or redirect it to a file.

The sysctl Management Information Bases (or MIBs) are similar to SNMP MIBs — arranged in trees, with broad root categories. Each subcategory is separated by periods. The system is deliberately extensible. On a fairly typical kernel, the root trees are as follows:

kern: Controls core kernel functions
vm: Virtual memory functions
vfs: Filesystem information
net: Network functions
debug: Kernel debugging controls
hw: Hardware descriptions
machdep: Machine-dependent variables
user: Userland interface features
p1003_1b: POSIX features
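
To see how the MIBs on a particular machine are spread across these root trees, the snapshot can be summarized with a few lines of shell. The following is only a sketch: the function name count_roots is made up for illustration, and it assumes the "name: value" output format shown in this article.

```shell
#!/bin/sh
# count_roots: read "sysctl -a" style output on stdin and print
# "<count> <root>" for each root tree, sorted by root name.
# Hypothetical helper, not part of FreeBSD.
count_roots() {
    # Count only lines that look like "mib.name: value", so that
    # continuation lines of multi-line values are ignored.
    awk -F'[.:]' '/^[A-Za-z0-9_]+[.][^ ]*: / { n[$1]++ }
         END { for (r in n) print n[r], r }' | sort -k2
}

# Typical use on a live system:
#   sysctl -a | count_roots
```

Running this before and after loading a kernel module is one quick way to see which trees the module extends.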

MIBs have fairly familiar values: some are integers, others are strings, some are binary. Binary MIBs are generally switches, turning particular behavior on (1) or off (0). String and integer MIBs have a variety of possible meanings, depending on their function.

Some MIBs, and some entire MIB trees, are read-only. The hw tree, for example, is hardware-dependent. These won’t change unless you can change your hardware on the fly.

You can read individual MIB values with sysctl mibname, for example:

# sysctl hw.usermem
hw.usermem: 148332544
#
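This tells you how much physical memory is available to user processes: total physical memory minus the memory the kernel has wired down. It is not a measure of free memory.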
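
Raw byte counts such as the one above are easier to read in megabytes. A minimal sketch follows, assuming the value fits in the shell's integer arithmetic; the function name bytes_to_mb is hypothetical.

```shell
#!/bin/sh
# bytes_to_mb: convert a byte count to whole megabytes (rounded down).
# Hypothetical helper for illustration only.
bytes_to_mb() {
    echo $(( $1 / 1048576 ))
}

# Typical use on a live system; "sysctl -n" prints the value without
# the "name: " prefix:
#   bytes_to_mb "$(sysctl -n hw.usermem)"
```

The -n flag is what makes sysctl output convenient to use in scripts, since no parsing of the "name: value" line is needed.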

Writing sysctls

Read/write values can be set with sysctl -w mibname=value. Let’s look at some common systems administration issues, and how they can be addressed with sysctl MIBs.

IP Networking

If you’re running a FreeBSD router or firewall, enabling and disabling IP forwarding at will is useful. You can control this with the binary MIB net.inet.ip.forwarding. To disable IP forwarding, do:

# sysctl -w net.inet.ip.forwarding=0
FreeBSD’s TCP/IP stack defaults to not using the RFC 1323 and RFC 1644 TCP extensions. You can easily enable these on the fly with:

# sysctl -w net.inet.tcp.rfc1323=1
# sysctl -w net.inet.tcp.rfc1644=1
If you’re having problems communicating with another host, you might adjust these MIBs and see what happens. As an example, try using ftp to ftp.netscape.com with these extensions both on and off; the connection only works with the extensions off.

Another default in the GENERIC kernel is ICMP bandwidth limiting, which rate-limits replies to bad requests. On a busy server, you might find that the default limit is too low and want to raise it:

# sysctl -w net.inet.icmp.icmplim=400
You might find that you need to accept source-routed packets for a particular application:

# sysctl -w net.inet.ip.accept_sourceroute=1
Some daemons leave TCP connections open for too long or fail to close them properly. This can soak up system resources or network sockets. You can either wait for these idle connections to time out, or you can enable TCP keepalives and adjust the idle connection deletion time:

# sysctl -w net.inet.tcp.always_keepalive=1
# sysctl -w net.inet.tcp.keepintvl=60
# sysctl -w net.inet.tcp.keepinit=60
# sysctl -w net.inet.tcp.keepidle=300
You will want to experiment with the exact keepalive times needed in your situation.
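
When experimenting like this, it helps to record each old value before you change it, so that you can put things back. The following sketch does that. The function name set_with_undo and the undo file path are hypothetical, and it assumes simple numeric values with no embedded spaces.

```shell
#!/bin/sh
# set_with_undo: change a sysctl and append the command that restores
# the previous value to an undo file. Hypothetical helper.
#   $1 = MIB name, $2 = new value, $3 = undo file
set_with_undo() {
    # Read the current value first; give up if the MIB does not exist.
    old=$(sysctl -n "$1") || return 1
    echo "sysctl -w $1=$old" >> "$3"
    sysctl -w "$1=$2"
}

# Typical use (as root):
#   set_with_undo net.inet.tcp.keepidle 300 /root/sysctl.undo
#   sh /root/sysctl.undo      # put the old values back
```

Because the undo file is a plain list of sysctl commands, you can read it before running it.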

Even if you have various network daemons logging every connection attempt to your machine, your machine won’t log attempts to connect to ports you have nothing listening on. You can log these attempts with net.inet.tcp.log_in_vain and net.inet.udp.log_in_vain:

# sysctl -w net.inet.tcp.log_in_vain=1
# sysctl -w net.inet.udp.log_in_vain=1
You will get messages like this:

Oct 14 10:57:20 moneysink /kernel: \
  Connection attempt to TCP \
  209.69.69.85:8080 from 204.71.200.67:1634
This can be invaluable for diagnosing connection problems.

If you have a busy server exposed to the Internet, you probably don’t want to leave this enabled continuously unless you have a lot of logging capacity. Even so, you might find it instructive to enable it for short periods on such a server while watching the log with tail -f /var/log/messages | grep kernel.

Using log_in_vain on a machine in your corporate DMZ can also help justify increasing your security budget. Actual portscan and intrusion attempt data will convince almost anyone of the importance of a reliable firewall.
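
To turn the raw log_in_vain messages into that kind of summary, you can count connection attempts per source address. This is a sketch only: the function name count_sources is hypothetical, and it assumes each message appears in the log as a single line in the format shown earlier (the sample above is wrapped for print).

```shell
#!/bin/sh
# count_sources: read syslog lines on stdin and print
# "<attempts> <source address>", busiest source first.
# Hypothetical helper for illustration only.
count_sources() {
    awk '/Connection attempt to/ {
             # The word after "from" is "address:port"; keep the address.
             for (i = 1; i < NF; i++)
                 if ($i == "from") { split($(i + 1), a, ":"); n[a[1]]++ }
         }
         END { for (ip in n) print n[ip], ip }' | sort -rn
}

# Typical use:
#   count_sources < /var/log/messages
```

Sources with the same number of attempts may appear in any order relative to each other.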

File and Process Control

Some systems, such as news servers, have a high number of open files. The FreeBSD kernel computes the maximum number of files it can open at one time from the value of the MAXUSERS kernel config variable. Other systems vary. On such a busy server, it’s quite possible that you will exceed the maximum number of open files, unless you set MAXUSERS to an insanely high value. Every system has days at the far end of the bell curve, after all.

If you’re out of file descriptors, your /var/log/messages will start filling with messages like:

/kernel: file: table is full
The kern.maxfiles MIB contains the maximum number of open files on the system. Read the current value, and then reset it to an appropriately higher value:

# sysctl kern.maxfiles
kern.maxfiles: 1064
# sysctl -w kern.maxfiles=2128
kern.maxfiles: 1064 -> 2128
#
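
It is better to notice a filling file table before the kernel starts complaining. The sketch below computes how full a table is as a percentage. The function name table_pct is hypothetical, and the usage comment assumes your kernel exports the current open-file count as kern.openfiles; check the output of sysctl -a on your own system.

```shell
#!/bin/sh
# table_pct: print how full a table is, as a whole percentage
# (rounded down). Hypothetical helper for illustration only.
#   $1 = entries in use, $2 = maximum entries
table_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Typical use, assuming kern.openfiles exists on your system:
#   table_pct "$(sysctl -n kern.openfiles)" "$(sysctl -n kern.maxfiles)"
```

A cron job that mails you when the figure passes some threshold of your choosing is one way to put this to work.
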
You can set similar resource-related limits with the MIBs:

kern.maxproc: Maximum number of system processes
kern.maxfilesperproc: Maximum number of files per process
kern.maxprocperuid: Maximum number of processes per uid

sysctl can be useful, but it might not always be the best solution. If you want to vary user resources by user ID, you might find /etc/login.conf more appropriate.

System Security

The kern.securelevel MIB was covered in great detail in my article in the June 2000 issue of Sys Admin, so I won’t talk about it much here. Check that article or the init(8) man page. One word of warning on this MIB, however: once you raise a system’s securelevel, you cannot reduce it without rebooting the system. The protections inherent in securelevel can interfere with some system operations. A high securelevel will prevent X from starting, for example. Do not raise kern.securelevel on a production system unless you understand exactly what it will do, or are prepared to reboot the system.

Other sysctl values can be used to enhance system security. If you don’t want any one user running more than a certain number of processes simultaneously, adjust kern.maxprocperuid.

Making sysctl Values Permanent

sysctl changes are only written into the running kernel, and are not saved. If you want them to be permanent, you have two choices. Some sysctl behavior can be controlled by a kernel compile. Check /sys/i386/conf/LINT for kernel options that control this behavior. In some cases, this is not desirable. For example, a high MAXUSERS value will set many system defaults very high. On a news server, all you really care about is kern.maxfiles; cranking up MAXUSERS just to raise it is a waste of system resources.

Other sysctls, especially network-related ones, can only be set via sysctl. Some have hooks in /etc/rc.conf. Others can be set in /etc/sysctl.conf in a simple mib=value format:

# more /etc/sysctl.conf
kern.maxfiles=4000
#
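
Before trusting a sysctl.conf to the next reboot, it can be worth listing what it would actually set. The following sketch is a dry run only: it prints the command each line corresponds to and runs nothing. The function name check_sysctl_conf is hypothetical, and it assumes blank lines and lines starting with "#" are comments to be skipped.

```shell
#!/bin/sh
# check_sysctl_conf: read a sysctl.conf-style file and print the
# command each setting corresponds to, without running anything.
# Lines that are not blank, a comment, or "mib=value" are reported
# on stderr. Hypothetical helper for illustration only.
#   $1 = path to the file
check_sysctl_conf() {
    while IFS= read -r line; do
        case $line in
            ''|'#'*) continue ;;
            *=*)     echo "sysctl -w $line" ;;
            *)       echo "bad line: $line" >&2 ;;
        esac
    done < "$1"
}

# Typical use:
#   check_sysctl_conf /etc/sysctl.conf
```
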
Conclusion

The sysctl interface gives the administrator control over many aspects of a running kernel’s behavior, without a recompile or a reboot. Used carefully, it lets you tune a system closely to your needs.

About the Author

Michael Lucas is a networking and FreeBSD consultant working for the Great Lakes Technologies Group. He lives in Detroit, Michigan with his wife Liz, four gerbils, and assorted fish. He can be reached at: mwlucas@exceptionet.com.