Cover V07, I05

Web-Enabled man Pages

Steven G. Isaacson

There are several obvious benefits to Web-enabled man pages: (1) the Web browser interface makes it easy to scroll back and forth; (2) you can access man pages on different systems without having to open another session and log in to that machine; and (3) you can see more of a page at a time by changing the browser's font size.

But the real advantage to Web-enabling (actually HTTP-enabling) is that when you're reading the man page and it says, for example, see chown(2), all you have to do is click there, and, presto, there's the chown(2) man page. Then, the browser's Back button will take you back to the previous page.

This feature is particularly useful when you don't know exactly what you're looking for. Most man pages contain a SEE ALSO section at the end that lists related items of interest, such as system calls and companion programs. With a Web-enabled man page, you simply click on the item you want to see.

This article briefly discusses the creation of man.sh, a shell script designed to serve up Web-enabled man pages (Listing 1).

Dynamic Web Pages

There are two types of Web pages: static and dynamic. Static Web pages are simply files that get sent to the browser. Dynamic Web pages also get sent to the browser, but they are generated anew for each request. That is, they're dynamically created. The results you might see from a Yahoo search request are an example of a dynamically created Web page.

I'll do the same with the man pages I create, using a Common Gateway Interface (CGI) script. Generally, CGI programs go in the cgi-bin subdirectory of your Web server's $SERVER_ROOT directory. If your program is called prog, you would access it from your Web browser like this: http://[your_server]/cgi-bin/prog. Here's a simple three-line CGI program, a shell script that displays the system date:

#!/bin/ksh
echo "Content-type: text/html\n"
date

Markup Tags

Obviously the output from any command can be displayed in this way. However, as the output gets more complicated, problems crop up. To see why, I'll make our dynamically created Web page more sophisticated, by adding a title ("YOUR TITLE GOES HERE"), a heading ("HEADING CENTERED"), and distinguishing between the header and body sections. This is done by adding markup tags (as in HyperText Markup Language), which are simply keywords enclosed in less-than and greater-than signs. The keyword marks the start of the section, and slash-keyword marks the end of the section.

#!/bin/ksh
echo "Content-type: text/html\n
<html>
<head>
<title> YOUR TITLE GOES HERE </title>
</head>
<body>
<h1 align=center> HEADING CENTERED </h1>
<pre> `command` </pre>
</body>
</html>"

The <html> and </html> tags specify where the HTML document begins and ends. The "Content-type" line, echoed before our document begins and followed by a blank line, is the header that tells the browser, by way of the HyperText Transfer Protocol, what kind of document follows. The <head> and <body> tags separate the header portion of the document from the body of the document. The header portion holds information about the document, in this case its title. The body portion holds what is displayed on the page: here, a centered heading followed by the output of the command.

In the first example, we displayed the output from the date command. In this more sophisticated example, we display the output from command. The <pre> and </pre> tags wrapped around command indicate that everything in between is understood to be preformatted. This solves a display problem as described below.

Preformatting

Suppose that instead of getting output from the date command you want to see all activity on the system. To do this you could display output from ps, for example, ps -ef. What you expect to see is a listing of all of the processes currently running on the system. But that's not what you get. What you get - in the browser - is one long line of output. Newlines, indentation, and whitespace are all ignored. When displaying the output from a simple command, such as date, the problem is masked. There is no leading whitespace and only one newline, so the output looks fine. But the output from ps -ef is not that simple.

To preserve the display of whitespace and newlines, you must tell the browser that the following text (i.e., everything between <pre> and </pre>) is preformatted, which is what we did above:

<pre> `command` </pre>

But there is another problem that must be solved.

Embedded Markup Tags

What if the text you're displaying itself contains embedded markup tags? For example, suppose you want to display the output from man 3s printf. The top of the man page shows the required C preprocessor include line:

#include <stdio.h>

To your browser, everything between a less-than and a greater-than sign is a markup tag. So, <stdio.h> looks like a markup tag, but it isn't. What does the browser do with unknown markup tags? Nothing. It simply throws them away. The #include line from the printf(3S) man page looks like this when displayed in your browser:

#include

Fortunately, once you understand what's happening, the problem is easily solved. To display literal less-than and greater-than signs, you must use character entities: &lt; for less-than and &gt; for greater-than. The trailing semicolon is part of the entity. Your CGI script must filter the output from your program before sending it to the browser, substituting &lt; for < and &gt; for >. Because the ampersand now introduces an entity, a literal & in the output should likewise be sent as &amp;. To display the correct #include line in a browser, the output from your CGI script would look like this:

#include &lt;stdio.h&gt;
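
The substitution can be done with a small sed filter. What follows is only a sketch, not the code from Listing 1: the function name html_escape is made up for illustration, and the ampersand rule is an addition to the two substitutions described above.

```shell
#!/bin/sh
# html_escape: read text on stdin, write HTML-safe text on stdout.
# (Hypothetical helper name; not taken from Listing 1.)
# The ampersand must be replaced first; otherwise the "&" in the
# freshly inserted "&lt;" and "&gt;" would be escaped a second time.
html_escape() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g'
}

# Example: filter one line of a man page.
printf '%s\n' '#include <stdio.h>' | html_escape
```

The example line prints #include &lt;stdio.h&gt;, which the browser then renders with the angle brackets intact.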

Command-Line Arguments

Now that we know how to display the output from man printf, we also want to be able to display the output from man anything. How do you tell the man.sh script which man page you'd like to see?

When man.sh is called without any arguments, it generates a form that allows you to type in the name of the man page you'd like to see. When the form is submitted, the arguments (i.e., the requested man page and section) are passed to man.sh. The form is also displayed if the requested man page does not exist, so that you can try again.
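
A form along the following lines would be enough to collect the request. This is a sketch under assumptions: the function name print_form, the field name man, and the wording of the prompt are illustrative, and the form in Listing 1 may look different.

```shell
#!/bin/sh
# print_form: emit a minimal HTML form with a single text field named
# "man". Because the form uses the GET method, the browser appends the
# field to the URL and encodes spaces as "+".
# (Hypothetical markup; the form in Listing 1 may differ.)
print_form() {
    cat <<'EOF'
<form method="get" action="/cgi-bin/man.sh">
Man page (optional section first, e.g. 2 chown):
<input type="text" name="man">
<input type="submit" value="Show">
</form>
EOF
}

# Show the form when the script was called without a query string.
if [ -z "${QUERY_STRING:-}" ]; then
    print_form
fi
```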

How are the arguments passed? A longer discussion of forms and the CGI interface is beyond the scope of this article, but basically all you have to do is extract the arguments from the $QUERY_STRING environment variable, which the Web server sets automatically before it runs man.sh.

For example, to see the man page for chown(2), this URL is generated by the form:

http://[your_server]/cgi-bin/man.sh?man=2+chown

man.sh parses the QUERY_STRING, changing + into a space, and then runs the man program with 2 chown on the command line. This URL could also be typed directly into your browser, or embedded as a link in some other Web page.
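
The parsing step might be sketched as follows. The function name query_to_args is hypothetical, and the character filter is an assumption of mine rather than something taken from Listing 1; it is there because the query string comes from the outside world and ends up on a command line.

```shell
#!/bin/sh
# query_to_args: turn a query string such as "man=2+chown" into the
# argument list "2 chown". (Hypothetical helper, not from Listing 1.)
query_to_args() {
    query=$1
    case $query in
        man=*) value=${query#man=} ;;   # keep what follows "man="
        *)     return 1 ;;              # no "man" field: caller shows the form
    esac
    # "+" encodes a space. Then delete every character that is not a
    # letter, digit, dot, underscore, hyphen, space, or newline, so
    # that nothing in the request can be interpreted by the shell.
    printf '%s\n' "$value" | tr '+' ' ' | tr -cd 'A-Za-z0-9._ \n-'
}

# Example: the URL shown above.
query_to_args 'man=2+chown'
```

The example prints 2 chown. This sketch does not decode %xx escapes; since every character that survives the filter is sent unencoded by browsers, that is sufficient for ordinary man page names.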

See Also

Now that we know how to pass arguments on the URL command line, it's possible to embed links to other man pages in the current man page. First, we run man to get the requested information. Second, we filter the output from man, so that it can be displayed in a Web browser. And third, if there are references to other man pages within the page we're about to display, we embed links to those other pages.

To find out if there are references to other man pages, watch for strings of text like grep(1), lang(5), and catman(1M). When one of those strings of text is found, it can be converted to a link back to man.sh with the appropriate command line. For example, a string of text like this:

chown(2)

gets converted to this:

<a href="http://[your_server]/cgi-bin/man.sh?man=2+chown">chown(2)</a>
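
One way to perform this conversion is a single sed substitution. The sketch below rests on assumptions: the names MAN_URL and linkify are made up for illustration, and the pattern (a name followed by a parenthesized section that starts with a digit) is my approximation of what a man page reference looks like, not necessarily the test used in Listing 1.

```shell
#!/bin/sh
# Assumed location of the script; substitute your own server name.
MAN_URL='http://[your_server]/cgi-bin/man.sh'

# linkify: wrap every reference of the form name(section), such as
# chown(2) or catman(1M), in a link back to man.sh.
# In a basic regular expression, \( \) capture text while bare
# parentheses match themselves: \1 is the name, \2 the section.
linkify() {
    sed -e 's|\([A-Za-z][A-Za-z0-9._-]*\)(\([0-9][A-Za-z]*\))|<a href="'"$MAN_URL"'?man=\2+\1">\1(\2)</a>|g'
}

# Example: a line from a SEE ALSO section.
printf '%s\n' 'See chown(2) and catman(1M).' | linkify
```

This filter has to run after the less-than and greater-than signs in the man page text have been replaced; otherwise the angle brackets of the new <a> tags would be replaced as well.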

Having embedded links allows you to click from man page to man page, as needed. The "man" man page, for example, contains 32 references to other man pages.

man.sh

Installing man.sh is simple. Download the program (man.sh), make it executable, and move it to your $SERVER_ROOT/cgi-bin directory. Then, in your browser, type http://[your_server]/cgi-bin/man.sh. That's all there is to it.

Further Information

For more information on the Hypertext Transfer Protocol (HTTP/1.0), see: http://www.ics.uci.edu/pub/ietf/http/rfc1945.html

For more information on the HyperText Markup Language (HTML - 4.0 Working Draft Release), see: http://www.w3.org/TR/WD-html40-970708/

For more information on the Common Gateway Interface (CGI), see: http://hoohoo.ncsa.uiuc.edu/cgi/

About the Author

Steven G. Isaacson currently works with the porting group at Endura Software Corporation (http://www.enduracorp.com). He may be reached via email at steven.isaacson@enduracorp.com.