Writing Linux Device Drivers
Arthur Donkers
One of the early design advantages of UNIX subsequently copied by other operating systems, such as MS-DOS, was the manner in which UNIX treats devices. UNIX essentially abstracts devices out of the kernel proper and deals with them as files through code called device drivers. The device drivers needed for a particular system are then linked back into the kernel, either dynamically in some UNIX implementations or statically at kernel build time. Due to the fact that device drivers need to deal with complex, low-level hardware issues such as interrupts and synchronization, device driver code was traditionally written by highly experienced C programmers who worked for organizations that had access to UNIX source code. Thus, the ability to write a UNIX device driver was once considered a measure of a UNIX programmer's guru-hood.
However, things have changed. Since the rise of Linux many more people have the oppportunity to play with the source code of an inexpensive version of Unix. A primary benefit of Linux has been, and hopefully will continue to be, the availability of the source code. The source code provides numerous examples of how to write your own device driver for that gizmo you bought at your local surplus store.
This article offers pointers for writing your own device driver. However, there is no generic way of doing this, as the functionality of each device dictates the requirements of the associated driver.
Tip #1: Study examples of drivers written by others for similar devices. Solutions for one device can often be adapted to other purposes.
In the most basic sense, device drivers are those pieces of Unix code that provide the connection between the operating system, applications, and the hardware (the device itself). Thus, device drivers have to, among other things, deal with the asynchronous character of the hardware. This means that the hardware can, and in most cases will, generate events that are completely unrelated to the current state of the kernel. The kernel is therefore interrupted to do something on behalf of the device. An example of this can be a driver for a hard disk that will interrupt the kernel once a block of data has been transferred from the disk to the memory of the machine.
For the kernel to handle all of these different devices, a generic driver interface is available. All device drivers must conform to this generic interface to be able to work. This interface enables a process to open and close a device, read data from it, write data to it, query its status, and set special operating conditions. Not all of these functions need to be implemented for any given device - sometimes the nature of the device excludes one or more of these functions.
Generally speaking, a device driver can be considered a dedicated piece of software that will translate the device-specific interface into a generic interface that can be used by the Unix kernel and various applications. Figure 1 should make this a bit more clear.
One important thing to remember is that a device driver usually operates on a group of similar devices, such as SCSI tape drives, not on one particular brand and model of that device. The serial ports of a PC are another example. Normally, there are two or more of these ports available. Linux has only one device driver for the serial port, and that driver can distinguish for which of the available ports it is currently working. How the device driver makes this distinction is one of the topics I will discuss in the following paragraphs.
Block and Character Devices
I mentioned the generic kernel interface for device drivers previously. Note, however, that there are so many different devices available that not all can used by the same interface. Therefore, the devices have been divided into two groups:
block devices
character devices
This division is based on how a device processes its data, either as a block (usually 512 bytes in length) or as a stream of bytes, one character at a time.
An example of a block device is a hard disk. Although the data comes out of, or goes into, the disk itself as a stream of bytes, for the computer these data are transferred in blocks (i.e., of 512 bytes). This last feature is accomplished by using a technique called DMA (Direct Memory Access). This means that the CPU is not involved directly in transferring the data, but the disk controller itself stores the blocks of data in, or reads them directly from the memory. After the transfer has been completed, the disk controller will interrupt the CPU to indicate this completion. Therefore, it is called a block-based device.
Normally, these block transfers are used with high-speed devices that need to transfer large amounts of data. In this way, all of these transfers can be done with a minimum of overhead and CPU utilization.
All other devices that do not transfer data in blocks are called character devices. These devices often have a much lower transfer rate than the block devices, and they typically do not support DMA. However, they tend to use interrupts more frequently than do block devices. Under normal circumstances, a character device will interrupt the CPU for each byte of data is has prepared or needs. Therefore a character device tends to generate more CPU overhead than a block device. This is just a rule of thumb, there are, of course, exceptions.
If you want to know which of your devices is a block device and which is a character device, you can call up a directory listing of the /dev directory. An example is shown below.
brw-rw---- 1 root disk 3, 0 Sep 7 1994 hda
brw-rw---- 1 root disk 3, 1 Sep 7 1994 hda1
brw-rw---- 1 root disk 3, 10 Sep 7 1994 hda10
brw-rw---- 1 root disk 3, 11 Sep 7 1994 hda11
brw-rw---- 1 root disk 3, 12 Sep 7 1994 hda12
brw-rw---- 1 root disk 3, 13 Sep 7 1994 hda13
brw-rw---- 1 root disk 3, 14 Sep 7 1994 hda14
brw-rw---- 1 root disk 3, 15 Sep 7 1994 hda15
brw-rw---- 1 root disk 3, 16 Aug 2 22:05 hda16
crw-rw---- 1 root daemon 6, 0 Jan 1 1980 lp0
crw-rw---- 1 root daemon 6, 1 Jan 1 1980 lp1
crw-rw---- 1 root daemon 6, 2 Jan 1 1980 lp2
This is the output of an ls -l command in the /dev directory. If you look at the protection bits of each special device file, you will see either a b or a c. A b means a block device (in this case, the first (E)IDE hard disk, /dev/hda, and a number of partitions on this hard disk), and a c means a character device (in this example, a number of parallel printer ports).
Major and Minor Device Numbers
The numbers in the center columns of the listing (e.g., 3, 0 and 6, 0) represent major and minor device numbers. As mentioned in one of the previous paragraphs, a device driver normally controls all similar devices in one computer. The question is, how does the kernel know which type of device driver to use? The kernel does that by looking at the major number of the device when it is opened. This major number is the first number of the number pair that you see when you do an ls -l in the /dev directory. In the previous example, these major numbers would be 3 for the hda device and 6 for the lp device. By means of this number, the kernel can distinguish the different general types of devices and use the appropriate driver.
The second number of these pairs is called the minor number. This number is used to distinguish the different instances of devices within the same class. You can see this if you look at the serial devices in your computer:
crw-r--r-- 1 root root 4, 64 Jan 1 1980 ttyS0
crw-r--r-- 1 root root 4, 65 May 29 1995 ttyS1
crw-r--r-- 1 root root 4, 66 Jan 1 1980 ttyS2
crw-r--r-- 1 root root 4, 67 Jan 1 1980 ttyS3
All of these devices have major number 4, which in Linux means a dial-in serial port. Within this same group, the different serial ports each have their own minor number. In this case, ttyS0 (which is COM1: in DOS-speak) has minor number 64.
Loadable or Static?
One of the features of Linux (and several other Unix implementations) is "loadable modules." In a classic Unix kernel, all devices must be configured into the kernel. Once that has been done, the kernel can be relinked and all specified devices are included in the standalone kernel program. In such a system, all device code is always present in memory, whether you actually need it or not.
To save on the memory footprint of the kernel, Linux uses the concept of loadable modules. These are special object files that can be added to the kernel at runtime. When they are no longer needed, they can be unloaded to free the occupied memory. In this way, the kernel normally uses less memory than when all device drivers are statically linked. Another advantage is that when a new device is added to the computer only the new module needs to be loaded without recompiling the complete kernel.
When you are a device driver developer, loadable modules have an advantage too. You do not need to relink the kernel each time you change your device driver code. This can save you valuable time. However, do not think that by using a loadable module you will prevent your system from hanging when your driver code fails.
Tip #2: Whenever you have the choice, build your device driver as a loadable module.
Note that a loadable module "belongs" to one specific kernel version. This involves the way the module resolves the links to the internal kernel functions it uses. Whenever the internals of the kernel change, so does the list of kernel symbols and thus a loadable module of version x.y.w will probably not work with kernel version x.y.z. To protect yourself from freezing your kernel when loading a non-matching module, you can compile the kernel with the option:
Set version information on all symbols for modules
(CONFIG_MODVERSIONS) [Y/n/?]
Those who would like to learn more about the internals of a Unix kernel should read Maurice J. Bach's book, The Design of the UNIX Operating System.
An Example
Here is an example, with all of its intrinsics, of a real Linux device driver. The driver shown has not been written by me, but it serves as a good example. All device driver code for Linux can be found in the Linux source tree, under the subdirectory drivers. If you take a look at this directory you will see:
-rw-r--r-- 1 root root 1210 Apr 22 1996 Makefile
drwxr-xr-x 2 root root 1024 Oct 21 13:58 block
drwxr-xr-x 2 root root 1024 Oct 21 13:55 cdrom
drwxr-xr-x 3 root root 2048 Oct 21 13:59 char
drwxr-xr-x 5 root root 1024 Oct 21 13:55 isdn
drwxr-xr-x 2 root root 3072 Oct 21 13:59 net
drwxr-xr-x 2 root root 1024 Oct 21 13:59 pci
drwxr-xr-x 3 root root 1024 Oct 21 13:54 sbus
drwxr-xr-x 2 root root 3072 Oct 21 13:54 scsi
drwxr-sr-x 3 root root 2048 Oct 21 13:54 sound
There is no strict distinction between block and character devices although most do adhere to this distinction. I will use the driver for a Cyclades Muli serial board as an example. This serial board is a dedicated piece of hardware with 8 serial lines on board and its own I/O processor. It uses just one interrupt and a piece of shared memory as a buffer. The driver can be statically linked into the kernel or built as a loadable module.
A lot of the code in the device driver is used for autodetecting the different settings for I/O address, shared memory address, and IRQ number. This probing is done by trying out a number of predefined values. I will get back to this later.
By the way, all code referred to in this article can be found in kernel version 2.0.18 as included in the Red Hat 4 distribution.
The Header of the File
The code for this driver can be found in the char subdirectory of the drivers directory in the Linux source tree. The first part of this source file contains a long comment about the changes made to the code during its development. You can read it to get an idea of how this code developed.
The next part is the list of files that are included. One of these files is called module.h. You will need to include this file if you want your driver to be built as a loadable module.
The special macro small_delay needs a bit of explanation. Under normal circumstances you must be very careful in your device driver. You cannot disable interrupt for too long or your machine will become unresponsive or miss data. You cannot enable something like the shell's sleep function. This will completely block your machine and turn it into interPASSIVE mode instead of interACTIVE mode. So, if you need a small delay (or the order of a few hundred microseconds), you need to do a busy wait by using a delay loop.
Interrupts and Stuff
When writing a device driver, the most difficult parts are often the interrupt handlers. The remarks about avoiding excessively long delays are especially true in connection with interrupt handlers. When processing an interrupt, the CPU will mask off the lower priority interrupts. Thus, as long it is handling that interrupt, other interrupts remain blocked.
To prevent loss of data from a serial interrupt when processing a hard disk interrupt, Linux uses a special trick for the serial driver. A serial driver is split up into a "bottom" half and a "top" half. Both halves communicate using a special task queue and a scheduler.
The top half will handle all of the serial interrupts and try to process them as quickly as possible. The top half therefore builds a special request block that it will send to the task queue. Once it has done that, it will return.
The task request will then be scheduled by the scheduler for the bottom half of the serial driver. This bottom half does not run as an interrupt service routine, but will run with all interrupts enabled. It can therefore take a bit more time for processing without blocking out the rest of the devices and interrupts.
The Module Side of Things
When the driver is built as a module, two special routines are enabled that are called whenever a module is loaded or unloaded. The names of these modules are:
init_module(void)
cleanup_module(void)
The init_module routine is called when the module is loaded. This routine will initialize the driver internals and the hardware of the board(s) that have been autodetected. It does this by calling the routine cy_init. If the driver is statically linked into the kernel, the cy_init routine will be called when the system is started.
The cleanup_module routine is called when the module is unloaded from memory. It will unregister the serial devices, both the dial-in devices and the callout devices. Unregistering means the routine will tell the kernel that the major and minor numbers associated with the device are no longer serviced. The kernel can then update its table of active major numbers. After that, all references to that major number will generate a runtime error.
Once the module has successfully unregistered itself, it will free the interrupt number(s) it has allocated. These interrupts can then be used by another module.
Initializing the Driver
Now, let's look at the cy_init routine for this driver. If you read the comment, you will see that the minor number associated with this device is allocated at loadtime. The cy_init code will probe the cards it has found and give each port on a card a minor number.
The main function of this cy_init routine is to fill out two special structures, called cy_serial_driver and cy_callout_driver. Both of these structures are of type tty_driver, which is defined in the file /usr/include/linux/tty_driver.h. It is a generic tty driver structure that contains pointers to the different entry points of a serial I/O driver. If you look at the tty_driver.h file you will see in the comments what each entry point means.
Listing 1 is an extract of the include file with the most important entry points. It is good to keep this information at hand when examining the code in the driver.
From the code in cy_init, you will see the driver fills in its own major number (field cy_serial_driver.major). This number is a compile time constant and must be unique among all other configured drivers. Even better, this number should be unique among all known Linux drivers. The code in cy_init will also state the starting minor number of the devices it will handle (field cy_serial_driver.minor_start).
The next few fields are serial driver specific. These fields specify how many serial ports the driver will handle and the initial settings of the ports.
The next item is the list of entry points that are used by the kernel whenever it needs to communicate with the driver. The open and close routines are used to open and close a particular port. The ioctl routine is used to set device specific options. The write routine is used to send data to the serial device. Note that a read routine is missing from this list. How the kernel reads characters from a serial device will be explained later.
Once the cy_init routine has filled the cy_serial_driver structure, it will copy the data to the cy_callout_driver structure. In this callout structure, the routine will change the major number and subtype of the serial port. The reason for this is simple. Linux recognizes two kinds of (virtual) serial devices:
dial-in
dial-out (callout)
The difference here has to do with the way a serial device handles the Carrier Detect (CD) signal. A callout (or dial-out) port does not need to have an active Carrier Detect signal to be opened. A dial-in device, however, needs an active CD signal to be able to be opened. If CD is not active, the open call will block until it has become active. A getty process blocks during the open call until the modem raises its CD signal to indicate that it has made a connection with another modem. As soon as CD is raised, the getty process will spawn a login shell.
The callout device is normally used for outgoing serial connections (e.g., for PPP or UUCP). For an outgoing connection, the CD is normally inactive until a connection is established with a modem at the remote end. As soon as both structures are filled, both drivers (actually one driver with two major numbers) are registered. From that moment on, the kernel makes the two major numbers active and all operations on a device with one of these two major numbers are routed to this driver.
After this init phase, the driver determines the I/O address, shared memory address, and interrupt number for each board. The driver will first detect ISA boards in the function cy_detect_isa(). This routine will return the number of ISA boards it has found in the system. The function cy_detect_pci() will do the same for the PCI boards. After detecting the boards, the driver will fill in the appropriate internal structures, and set all unused data to invalid.
Autodetecting
One of Linux's strong points is its ability to autodetect a wide array of different hardware. It will probe, and under normal circumstances, return the I/O address, IRQ number, and other information about a board. Next, I'll discuss how this works with the example driver for ISA boards.
The driver attempts to detect, at a number of predefined addresses, whether a board is present at that particular address. The driver checks for a particular "signature" (i.e., bit or byte pattern) that is supposed to be unique for the hardware.
In this example, that means the driver will probe a number of predefined shared memory addresses. For each of these addresses, the driver will try to initialize the serial board that is supposed to be located there. When this initialization fails, obviously there is no board present at that location. The initialization is done by the routine cy_init_card(cy_isa_address,0). The driver will try to reset the controller chip at a particular address and see if it returns an expected result. If the expected result is not returned, the driver will return a 0 (zero) indicating that it has not found any controller. If however the expected result is found, the driver will update the number of boards it has found. Once all addresses have been probed, it will return the total number of boards found.
If the driver encountered a board at the address tested, it will then try to autodetect the IRQ number that the board is using. This is done in the do_auto_irq routine. The driver will allocate all IRQs it expects the boards to use, then enable these interrupts and send a command to the serial controller on the board to generate an interrupt. The interrupt service routine that is triggered will write its number in a variable called cy_irq_triggered. This variable is then read by the autoprobe routine to detect the IRQ number. As the comment says, this process is not foolproof, but it works most of the time. After detecting the interrupt, all IRQs are deallocated and the routine returns. This technique is not only used by this driver, but by various other drivers as well.
After detecting the real IRQ number, the interrupt is allocated by the request_irq kernel routine. When allocating, the IRQ service routine, cy_interrupt, is specified as well. This routine will be called each time an interrupt occurs on the allocated IRQ number.
Note that there are hidden dangers when autodetecting hardware. Under some circumstances trying to detect one piece of hardware can seriously confuse other hardware. This is particularly true when a lot of different hardware uses the same address space. This can happen with some network adapters, SCSI cards, and sound cards in one system. Thus, while autodetect works normally under most conditions, it is not a failsafe mechanism.
The Interrupt Service Routine
When an interrupt occurs, the interrupt service routine is called. In this case, the routine is cy_interrupt. An interrupt service routine is a very dedicated piece of software. It often reads hardware registers for status information and does I/O to the hardware for reading and writing data. There is no generic way of writing an interrupt service routine. All information in this paragraph should be considered an example.
First, the routine will get the information from the kernel IRQ table to see whether it was a spurious interrupt or not. When the interrupt is real, the service routine will scan all serial chips on the board to see if they have work to do. The routine will then see whether the interrupt was generated for receiving data or sending data. It can do this by checking the status of each serial controller on the card.
If there is input data present, it will first check for errors, and other problems that may have occurred during reception. If there were no errors, the data are stored in the input buffer of the tty structure. This is a generic kernel structure, which explains the missing read function from the initialization routine. All data are directly read by the kernel - not through the device driver itself. The reason for this has to do with the fact that a serial line may change its line discipline to PPP. If so, the kernel needs direct access to the data buffer as it contains PPP packets that need to be handed over to the network layer.
If the interrupt occurred because an output buffer was empty, the interrupt service routine tries to fill the output buffer again and send the data away.
Another source of interrupt has to do with the modem control signals. If the routine detects a change on one of the relevant modem signals, it will schedule a request for the bottom half of the driver. This bottom half will then handle that request. Such a request might be caused by the hardware flow control of the modem telling the serial board to stop sending data for a while so it can flush its buffers. This bottom-half processing is done in the routine do_softint(). The different modem-related events are handled here.
The Rest
I've covered most of the driver code for the Cyclades board, but there are a number of routines I have not mentioned. These are generally used for very specific things like communicating with the hardware on the board and similar functions. Such functions are so board specific that they have no real value as examples.
Conclusions
Writing a device driver is still not a trivial task. It requires an in-depth understanding of both the hardware involved and the internals of Linux. However, one advantage to Linux is that a large number of drivers are available in source form in the kernel tree. This can serve as a very good starting point for your own device driver.
About the Author
Arthur Donkers graduated from the Delft University of Technology with a degree in Electrical Engineering and a major in Computer Architecture. Since then he has worked for a number of software houses in the Netherlands and participated in several major projects. His primary field of interest in these projects has been, and still is, datacommunications, especially the integration of multivendor network systems. For the past 5 years, he worked as a consultant for his own company, Le Reseau (French for "The Network"). Le Reseau now focuses on network security-related projects and consultancy.
|