Getting UNIX and NT on Speaking Terms
These days modern business networks comprise different machines, some running UNIX and others running NT. Traditional applications for UNIX systems are database-related, resulting in these machines gathering large amounts of corporate data. It is not always possible or desirable to migrate this data from UNIX to another platform (be it NT or something else). Due to the ubiquity of Microsoft Windows on the desktop, however, Windows NT is often the preferred platform for server and middleware applications. These facts result in a need for communication between Windows NT and UNIX at the programmatic level. One solution for this need is Remote Procedure Calls (RPCs). This article will show how you can link a Windows NT application to a UNIX system (and the data stored in the UNIX database) using RPCs.
Why use RPCs?
There are various compelling reasons to use an RPC interfac instead of an open or proprietary "standard" like ODBC (Open DataBase Connect) or SQLnet.
One reason may be the fact that the application you want to connect to does not offer any of these standard interfaces. Imagine a legacy application that has been developed over the years, is written in a standard programming language like C and runs on one or more UNIX systems. It is not possible or feasible to add an ODBC interface to this application, so you have to find another way. An RPC interface can be integrated into your application quite easily and offers the connection you need.
If your server application does offer an ODBC or other "standard" interface, using an RPC interface offers a few advantages. Imagine that an ODBC interface offers clients the ability to execute any query they like against your database. ODBC in itself protects you from unwanted updates by means of the username/password that a client must specify before connecting to the database. However there is no easy way to restrict the client to one or more queries. On the other hand, if you use an RPC interface, you may have each produre execute one specific query. The results of this query are then returned to the client. In this way, you can restrict the client to a set of very specific queries and thus protect your database from unwanted snooping.
Another reason for using an RPC interface may be that this will better integrate into the client environment. If you write your client following specific rules, or you need to present your data in a specific format to your client, using an RPC can make the process a lot easier. Also, before returning the data to your client, you can do postprocessing in your server application, so your client can better handle the data your are returning. Before delving into the use of PRCs, however, I will briefly explore other methods that have been used to move data between systems.
Linking a new system with an old system can be done in many ways. One of the simplest ways is to have one system (the master) send data to the other system (the slave) by means of files. The data needed is stored in one or more files that are copied to the slave. These copy actions can be done immediately or on a regular basis. When a large amount of data is involved, these copy actions are best done at night, so the new data is available the next day for processing. Changes made to the data on the slave system are then copied back to the master the following night. Although this method may appear awkward or ancient, it is used often, especially in batch-oriented environments involving large amounts of data. The actual copying can be done over the network (e.g., by using ftp or NFS) or by means of tapes. This again depends on the amount of data involved. This link could be considered a data-oriented link because no actual applications are involved, just the data managed by the applications.
If a more direct kind of link is needed, another technique is required. This need may arise because of an interactive client requiring immediate access to the data. Another reason might be a newly designed graphical interface running on another machine that is connected to the server via a network. This is the situation that I will examine in this article.
Application Links with RPC
In this example, the UNIX server is running what is often referred to as a legacy database application. This application is written in C and has provided users with a character-based interface displayed on traditional terminals connected to the host over serial lines. Due to the increase of PCs running Microsoft Windows, the need for a windows-based user interface grew, and now that Windows NT is also available, this need has become more acute. To alleviate the need for connectivity software on each of the PCs, and to provide a degree of isolation for the primary UNIX database system, the plan is to install an intermediate server running Windows NT. The NT machine will be used as a server for PCs running either Windows 3.1 or Windows NT. However, the NT system will also be a client for the UNIX database server. It will query the database and use the results in one or more NT applications. The UNIX server and Windows NT client will be connected through a TCP/IP-based network.
The link between the NT applications (the client) and the UNIX application (the server) is based on a technique called Remote Procedure Calls (RPCs). By using RPCs, the client is capable of directly accessing the data in the server without the hassle of a normal network connection (at least to the application programmers).
In a normal UNIX network connection, unless a higher level API is used, an application programmer must create a socket (communication endpoint), open it either as server or client, send and receive data though the socket, and finally close it. This all involves a lot of programming code, especially when error detection and recovery are needed. When using RPCs, a programmer can call a (generated) function just like any other function. This generated function will take care of all of the lower-level network communication and housekeeping chores without additional programming by the application developer. I'll discuss this more in a later section.
How RPC Works
To be able to maintain an RPC-based link, you need to know how it works and what different components are involved. Because of its simpler nature, and for reasons that will become clear later, I will base my example on a Sun RPC-derived standard.
So, here is the breakdown of an RPC-based link. This breakdown is based on the flow of control and information.
The first step in this cycle is taken when the server process is started. When the process starts, it will create a dynamic network port that will listen for requests from the client. To enable the client to find this portnumber, a server registers itself with a special process called the portmapper. This portmapper maintains a table that links the server to its currently allocated portnumber. The portmapper listens on a predefined network port (111 to be exact, both for the TCP and UDP based RPCs). Once the server is started and registered, it is able to process client requests. You can look at the registered servers of a portmapper by using this command:
rpcinfo -p localhost
This will query the portmapper and have it list all the registered servers. On a solitary Solaris machine, the output looks like this:
program vers proto port service
100000 4 tcp 111 rpcbind
100000 3 tcp 111 rpcbind
100000 2 tcp 111 rpcbind
100000 4 udp 111 rpcbind
100000 3 udp 111 rpcbind
100000 2 udp 111 rpcbind
100068 2 udp 32772
100068 3 udp 32772
100068 4 udp 32772
100068 5 udp 32772
100003 2 udp 2049 nfs
100005 1 udp 32775 mountd
100005 2 udp 32775 mountd
100005 1 tcp 32772 mountd
100005 2 tcp 32772 mountd
You can always use this rpcinfo command to check whether a server has started correctly. If the unique server number (the first column) is not present in the list, the server has not started.
The flow starts at the client, which takes the initiative to query the server. The first thing a client does is call the portmapper to ask it for the portnumber of the server. The client does this by sending the portmapper three keys that will uniquely identify the server it is looking for. These keys are:
Both the programnumber and versionnumber are generated when the RPC is generated. These numbers should be unique, at least across your local network since they are the only way to identify a server. The protocol identification specifies whether TCP or UDP should be used for communications. This is all done invisibly to the programmer.
Note that to communicate with the portmapper, the client uses another RPC, but because the portmapper listens on a predefined portnumber, the client does not need to query a portmapper-mapper.
Once the client has the portnumber of the server, it will transform the input parameters of the RPC into a network-friendly format that can be transmitted across the wire (a process called marshalling). The reason for this is simple. Suppose you have a financial application running on a SPARC machine. This machine stores its binary values (e.g., the amount of money in your bank account) in its native format. This native format is normally not compatible with the native format of another machine of a different architecture. So, if you didn't transform the input (and output!) data, your 1 million dollars on the server may appear like 1 dollar on the client. Once it has converted its input arguments, the client will send the query to the server.
After sending the data to the server, the client will wait for an answer. Once the client has received an answer, it will convert the output it received back into the native format of the machine. Once the client has done that, it can return the data to the application, which finishes the cycle.
Other RPC Mechanisms
The basic flow of an RPC call is simple: a client calls a server. In this setup, a client has to wait until a server has sent an answer. In most cases, the server cannot do anything else until it has completed the client request.
In a large networking environment with a large number of clients, this is a problem. So, to make the client and server less intertwined, you can use callback RPCs. This involves a lot of extra programming effort, both on the client and server side.
What happens in a callback RPC is that a client adds some information to the request it sends to the server. This extra information enables the server to call the client once it has finished processing the request. So, after sending the request, the client does not wait for an answer from the server but goes about its business. Once the server has completed the request, it calls the client and executes an RPC to store the result. In effect, the server becomes a client, and the client becomes a server.
Building the Application
The description of the RPC-based link is not specific for UNIX, but can be applied to other platforms as well. So, if you want to connect your NT application with a UNIX application, you can use the same techniques.
As mentioned earlier, most of the network-related stuff is hidden inside functions that perform an RPC to the server. These functions do not come out of thin air, but are generated by a code generator. Here is the catch - standard code generators for UNIX do not work on NT and vice versa.
Using the native generators of each platform and trying to match their respective views of the RPC world would take too much time. So for this article, I used a product that is available for both NT and UNIX and that can generate RPC code for each of these platforms based on the common input file. This product is called EzRPC by NobleNet. Similar products are also available from other vendors. EzRPC is a mixture of both Sun RPC and DCE RPC. Its communication is roughly based on the Sun standard, but it has a number of extensions. It does support threads (both on NT and UNIX), and you can specify the way a parameter is used (as input only, output only, or both).
The first step is to generate a description file that contains all the information on the different RPCs, their parameters and protocols. This description file can be generated with a wizard running on NT. The screen in Figure 1 shows the opening screen of this wizard. In this screen, you can give your application a symbolic name and have it generate a unique application number.
Once you have walked through a number of other screens, you can specify the parameters of a remote procedure call. This step is very important and determines the actual interface between your application and the RPC environment. From your application, you need to call the function you have specified. Figure 2 shows an example.
Once you have finished specifying your functions and parameters, that information can be saved in a special file that will be used as input for the code generator. An example of this input file is shown in Listing 1. This file is generated on the last screen (Figure 3). On this screen, you can generate both server-side and client-side code.
Now you can use this input file both on NT and UNIX to generate the code. This code generation is done with the EZrpc code generator that recognizes the different settings in the input file. The generator will generate the .c source files, the .h include files, and a makefile so you can directly build your application. This process is accomplished by running the generator at the command line with the name of the RPC parameters input file as the argument. Listing 2 shows an example of running the generator. The process is identical under NT.
After these steps are completed, you have the basis to which you can add your application specific code. This application specific code does not need to include network details or converting data to and from the network-friendly format. This is all done by the generated code and its associated link library.
Once you have generated the code with the generator, you need to add the application specific calls. These calls may for example execute a query on your database or start up a dedicated program.
It would be inconvenient to add your code all through the different generated files. Therefore most generators, EzRPC included, generate two specific files to which you can add your code. One file is specific for the server, and the other for the client.
On the server side, this file is called <application>_lib.c, and it contains a function skeleton for each RPC specified in the definition file. The parameters of each function correspond to the ones you specified in the definition file, too. The type of these parameters is also set according to the definition file, and you do not need to convert these parameters to or from their network format.
The <application> tag should be replaced with the name of your application.
An example of a generated server side file is shown below.
/* RPC Library File : C:\TEMP\test_lib.c. Generated by EZ-Wizard.*/
int Rpc_Test ( int inputpar , char * outputpar )
/* Add your application specific code here */
sprintf( retbuf, TS%dCC, inputpar );
outputpar = retbuf;
/* End of application specific additions */
On the client side, things are a bit different. You do not need to add code to the generated files, but you need to call the generated function from your client part. So, all client versions of the generated functions are contained in a special file called <application>_stub.c. The functions can be called from your application, and you should link your application code with the client modules generated by the generator.
Configuring Your System and Running Your Application
After building either the client or server part (depending on the platform), your application is ready to be used. But before you can use it, you must configure a few things. On the NT side, which will be used as a client, you do not have to configure anything. You just need to tell your application the name of the host that the server is running on.
On the UNIX side, you need to start the portmapper process. This will enable the server to register itself and make its portnumber known. If your application needs specific settings (e.g., a database engine), you must make sure it is running as well. That is all there is to configuring the server side. You can now start your server (preferably as a daemon).
When linking two applications, or opening up your applications to clients on your network, using an RPC-based system has a number of advantages. First of all, you can open up your existing legacy applications without the overhead of an open standard. Sometimes these standards are not available, in other cases these standards might be a bit overdone.
A second advantage is the fact that you can exactly model the interface to your needs. You might even combine your existing application with a new application and keep the interface to your clients simple. You can restrict your interface to only the queries you need, and thoroughly check the input parameters before executing the query.
A possible problem is the complexity of handcrafting an RPC interface, both on the server and client side. This is where code generators come to the rescue. By means of an interface specification, you can generate most of your code. The only thing you need to do is add your dedicated application routines. Generating the code also has an added benefit; you can be sure that both the client and server use the same interface definition. Most of the tedious work is done for you, so you can concentrate on the real issues at hand.
About the Author
Arthur Donkers graduated from the Delft University of Technology with a degree in Electrical Engineering and a major in Computer Architecture. Since then he has worked for a number of software houses in the Netherlands and participated in several major projects. His primary field of interest in these projects has been, and still is, datacommunications, especially the security aspects involved. Because of this, his own company, Le Reseau, focuses on network and system security-related projects and consultancy, both for UNIX and NT. For the past four years, he has worked as an independent consultant for Le Reseau (French for "The Network").