FreeBSD's sysctl Interface
Michael Lucas
BSD 4.4's sysctl interface captures and sets kernel state information. This gives sys admins the ability to change the behavior of a running kernel, without a recompile or even a reboot. This ability is invaluable in systems where uptime is vital. The sysctl interface is a powerful but shadowy corner of systems administration.
In this article, I will focus on the sysctl interface as found in FreeBSD 4-stable. The interface also exists in BSD/OS, NetBSD, OpenBSD, SunOS, and any other BSD 4.4-Lite-based UNIX. You will want to check the documentation for your operating system to see which of the MIBs discussed below apply to your system.
Warning
sysctl is a very powerful tool. You can use it to send performance through the roof and save the day. You can also badly maim your system. sysctl modifies a running kernel, and if you kick a program's legs out from under it, you won't be happy. Test all changes on a non-production system first.
Retrieving sysctl Information
The exact sysctls available on your system, and their values, depend on your kernel configuration and any kernel modules loaded. To get a snapshot of all system sysctls, run:
# sysctl -a
This produces a lot of information; you might want to pipe this to your favorite pager or redirect it to a file.
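Redirecting the output to a file also makes it possible to compare the kernel state before and after a change. The following is only a sketch: the function name sysctl_changed and the file names are made up for illustration, and it assumes both snapshots were saved with sysctl -a.

```shell
# sysctl_changed BEFORE AFTER
# Print the lines that appear in only one of two saved snapshots.
# Lines starting with "<" come from BEFORE, ">" from AFTER.
sysctl_changed() {
    diff "$1" "$2" | grep '^[<>]'
}

# Typical use (file names are illustrative):
#   sysctl -a > /tmp/sysctl.before
#   ... make a change ...
#   sysctl -a > /tmp/sysctl.after
#   sysctl_changed /tmp/sysctl.before /tmp/sysctl.after
```

Expect some noise in the result, since statistics kept in the tree can change on their own between snapshots.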
The sysctl Management Information Bases (or MIBs) are similar to SNMP MIBs: they are arranged in trees, with broad root categories, and each subcategory is separated by a period. The system is deliberately extensible. On a fairly typical kernel, the root trees are as follows:
kern: Controls core kernel functions
vm: Virtual memory functions
vfs: Filesystem information
net: Network functions
debug: Kernel debugging controls
hw: Hardware descriptions
machdep: Machine-dependent variables
user: Userland interface features
p1003_1b: POSIX features
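To see how the MIBs on a given machine are spread across these root trees, you can count the first component of each name. This is a rough sketch; count_roots is a hypothetical helper name, and it assumes each MIB is printed as "name: value" at the start of a line.

```shell
# count_roots: read "sysctl -a" style output on stdin and print
# how many MIBs fall under each root tree, e.g. "kern 2".
count_roots() {
    awk -F'[.:]' '
        # Only count lines that begin with a dotted MIB name; this
        # skips most continuation lines of multi-line values.
        /^[a-z][a-z0-9_]*\./ { n[$1]++ }
        END { for (r in n) print r, n[r] }
    ' | sort
}

# Typical use:
#   sysctl -a | count_roots
```

The roots reported will depend on your kernel configuration and loaded modules, so the list may not match the one above exactly.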
MIBs have fairly familiar values: some are integers, others are strings, some are binary. Binary MIBs are generally switches, turning particular behavior on (1) or off (0). String and integer MIBs have a variety of possible meanings, depending on their function.
Some MIBs, and some entire MIB trees, are read-only. The hw tree, for example, is hardware-dependent. These won't change unless you can change your hardware on the fly.
You can read individual MIB values with sysctl mibname, for example:
# sysctl hw.usermem
hw.usermem: 148332544
#
This tells you, in bytes, how much physical memory is available to user processes, that is, memory the kernel has not wired down for its own use.
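Byte counts of this size are easier to read in megabytes. A small sketch, assuming the "name: value" output format shown above; bytes_to_mb is a hypothetical helper, not part of sysctl:

```shell
# bytes_to_mb: read "name: bytes" lines on stdin and print each
# value in whole megabytes (truncating division by 1024*1024).
bytes_to_mb() {
    awk -F': ' '{ printf "%s: %d MB\n", $1, $2 / 1048576 }'
}

# Typical use:
#   sysctl hw.usermem | bytes_to_mb
```

For the value shown above, this prints hw.usermem: 141 MB.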
Writing sysctls
Read/write values can be set with sysctl -w mibname=value. Let's look at some common systems administration issues, and how they can be addressed with sysctl MIBs.
IP Networking
If you're running a FreeBSD router or firewall, enabling and disabling IP forwarding at will is useful. You can control this with the binary MIB net.inet.ip.forwarding. To disable IP forwarding, do:
# sysctl -w net.inet.ip.forwarding=0
FreeBSD's TCP/IP stack defaults to not using the RFC 1323 and RFC 1644 TCP extensions. You can easily enable these on the fly with:
# sysctl -w net.inet.tcp.rfc1323=1
# sysctl -w net.inet.tcp.rfc1644=1
If you're having problems communicating with another host, you might adjust these MIBs and see what happens. As an example, try using ftp to ftp.netscape.com with these extensions both on and off; the connection only works with the extensions off.
Another default in the GENERIC kernel is the ICMP bandwidth limit, which rate-limits replies to bad requests. On a busy server, you might find that the default limit is too low:
# sysctl -w net.inet.icmp.icmplim=400
You might find that you need to accept source-routed packets for a particular application:
# sysctl -w net.inet.ip.accept_sourceroute=1
Some daemons leave TCP connections open for too long or fail to close them properly. This can soak up system resources or network sockets. You can either wait for these idle connections to time out, or you can enable TCP keepalives and adjust the idle connection deletion time:
# sysctl -w net.inet.tcp.always_keepalive=1
# sysctl -w net.inet.tcp.keepintvl=60
# sysctl -w net.inet.tcp.keepinit=60
# sysctl -w net.inet.tcp.keepidle=300
You will want to experiment with the exact keepalive times needed in your situation.
Even if you have various network daemons logging every connection attempt to your machine, your machine won't log attempts to connect to ports you have nothing listening on. You can log these attempts with net.inet.tcp.log_in_vain and net.inet.udp.log_in_vain:
# sysctl -w net.inet.tcp.log_in_vain=1
# sysctl -w net.inet.udp.log_in_vain=1
You will get messages like this:
Oct 14 10:57:20 moneysink /kernel: \
Connection attempt to TCP \
209.69.69.85:8080 from 204.71.200.67:1634
This can be invaluable for diagnosing connection problems.
If you have a busy server exposed to the Internet, you probably don't want to leave this enabled continually unless you have a lot of logging capacity. Even on such a server, however, it can be instructive to enable it for short periods and watch the log with tail -f /var/log/messages | grep kernel.
Using log_in_vain on a machine in your corporate DMZ can also help justify increasing your security budget. Actual portscan and intrusion attempt data will convince almost anyone of the importance of a reliable firewall.
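To turn raw log_in_vain messages into that kind of data, you can count attempts per source address. This is a sketch under two assumptions: each message sits on a single line in the log (the sample above is wrapped only to fit the page), and the source appears as address:port right after the word "from". The name attempts_by_source is made up for illustration.

```shell
# attempts_by_source: read syslog lines on stdin and print a count
# of logged connection attempts per source address, busiest first.
attempts_by_source() {
    awk '/Connection attempt to/ {
        for (i = 1; i < NF; i++)
            if ($i == "from") {
                split($(i + 1), a, ":")   # drop the source port
                print a[1]
            }
    }' | sort | uniq -c | sort -rn
}

# Typical use:
#   attempts_by_source < /var/log/messages
```

Lines that are not connection-attempt messages are ignored, so the whole log file can be fed in unfiltered.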
File and Process Control
Some systems, such as news servers, have a high number of open files. The FreeBSD kernel computes the maximum number of files it can open at one time from the value of the MAXUSERS kernel config variable. Other systems vary. On such a busy server, it's quite possible that you will exceed the maximum number of open files, unless you set MAXUSERS to an insanely high value. Every system has days at the far end of the bell curve, after all.
If you're out of file descriptors, your /var/log/messages will start filling with messages like:
/kernel: file: table is full
The kern.maxfiles MIB contains the maximum number of open files on the system. Read the current value, and then reset it to an appropriately higher value:
# sysctl kern.maxfiles
kern.maxfiles: 1064
# sysctl -w kern.maxfiles=2128
kern.maxfiles: 1064 -> 2128
#
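The read-then-raise step can be wrapped in a small helper that prints the command for review rather than running it. This is a sketch only: double_mib_cmd is a hypothetical name, doubling simply mirrors the example above and is not a recommendation, and the "name: value" output format is assumed.

```shell
# double_mib_cmd NAME: read the current value of an integer MIB and
# print (without running) the command that would double it.
double_mib_cmd() {
    cur=$(sysctl "$1" | awk -F': ' '{ print $2 }')
    case $cur in
        ''|*[!0-9]*)
            echo "not an integer MIB: $1" >&2
            return 1 ;;
    esac
    echo "sysctl -w $1=$((cur * 2))"
}

# Typical use:
#   double_mib_cmd kern.maxfiles
```

Printing the command instead of executing it leaves the final decision, and the chance to pick a different value, with the administrator.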
You can set similar resource-related limits with the MIBs:
kern.maxproc: Maximum number of system processes
kern.maxfilesperproc: Maximum number of files per process
kern.maxprocperuid: Maximum number of processes per uid
sysctl can be useful, but it might not always be the best solution. If you want to vary user resources by user ID, you might find /etc/login.conf more appropriate.
System Security
The kern.securelevel MIB was covered in great detail in my article in the June 2000 issue of Sys Admin, so I won't talk about it much here. Check that article, or man 8 init. One word of warning on this MIB, however: once you raise a system's securelevel, you cannot reduce it without rebooting the system. The protections inherent in securelevel can interfere with some system operations. A high securelevel will prevent X from starting, for example. Do not raise kern.securelevel on a production system unless you understand exactly what it will do, or are prepared to reboot the system.
Other sysctl values can be used to enhance system security. If you don't want users running more than X processes simultaneously, adjust kern.maxprocperuid.
Making sysctl Values Permanent
sysctl changes are only written into the running kernel, and are not saved. If you want them to be permanent, you have two choices. Some sysctl behavior can be controlled by a kernel compile. Check /sys/i386/conf/LINT for kernel options that control this behavior. In some cases, this is not desirable. For example, a high MAXUSERS value will set many system defaults very high. On a news server, all you really care about is kern.maxfiles; cranking MAXUSERS is a waste of system resources.
Other sysctls, especially network-related ones, can only be set via sysctl. Some have hooks in /etc/rc.conf. Others can be set in /etc/sysctl.conf in a simple mib=value format:
# more /etc/sysctl.conf
kern.maxfiles=4000
#
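Before relying on a sysctl.conf file at boot, it can help to preview what it would set. The following sketch turns mib=value lines into the equivalent sysctl -w commands without running them. The name sysctl_conf_cmds is hypothetical, and skipping blank lines and lines starting with "#" is an assumption about the file format; check your system's documentation.

```shell
# sysctl_conf_cmds: read mib=value lines on stdin and print the
# matching "sysctl -w" commands. Blank lines and lines starting
# with "#" are skipped; other lines without "=" are reported.
sysctl_conf_cmds() {
    while IFS= read -r line; do
        case $line in
            ''|'#'*) continue ;;
            *=*)     echo "sysctl -w $line" ;;
            *)       echo "skipping malformed line: $line" >&2 ;;
        esac
    done
}

# Typical use:
#   sysctl_conf_cmds < /etc/sysctl.conf
```

Reviewing the printed commands on a test machine first follows the same caution given in the warning at the start of this article.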
Conclusion
The sysctl interface allows the administrator to control almost every aspect of a running kernel's behavior, so you can tune your system to your precise needs.
About the Author
Michael Lucas is a networking and FreeBSD consultant working for the Great Lakes Technologies Group. He lives in Detroit, Michigan with his wife Liz, four gerbils, and assorted fish. He can be reached at: mwlucas@exceptionet.com.