A Perl of a Site Map

David Sweet

Every MIS manager I have worked for has had some version of a blank photocopied map of his building's cubes and offices. He'd fill in all the little squares, rectangles, and polygons with user names, IPs, or hostnames. When I needed to find out where a particular user or piece of equipment was located, he'd go through his pile of photocopies until he found the right one. Then I'd replace the equipment or solve the user's problem.

At one of my first job sites, I felt there had to be a better way. I was just learning how to use Perl for CGI programming, but I was able to quickly write a couple of scripts and scan in a blank map. Thus, my first clickable Web-based site map was born. I showed it to my manager, but he was unimpressed. Apparently he was very fond of those dog-eared, penciled-in photocopies.

At my last job, I discovered that my manager didn't even have blank maps for reference. She simply tried to remember the location of equipment and people. Since she had not become attached to any other method, I felt this site was a prime candidate for using my scripts, so I installed them. My manager liked them, but had a few suggestions. After another five weeks of intensive development, I had the final product. I called it, simply, Site-map.

Site-map incorporates these features:

• A searchable Web-based graphical site map
• Clickable hot spots correspond to cubes and offices (referred to as “areas”)
• Information fields about areas are revealed by clicking
• A single configuration file for controlling the number and type of information fields
• A report generator
• A cookie-based preference for controlling visible fields and report styles
• Security features that password protect the editing features

You need to understand the inner workings of your Web server in order to install the scripts, files, and directories that comprise the Site-map software. You also need to have Perl installed. I have installed Site-map many times on Solaris systems, and once on a Linux box. The distribution is UNIX-centric, but I actually developed it on a Macintosh, and it works great on that platform. Although I haven't tried it yet, I'm confident it will work on Windows systems, also.

Installing Site-map

The first step in installing and configuring the software is to download the compressed tar file from the Sys Admin Web site. When Site-map is uncompressed and untarred, you will have created the file hierarchy diagrammed in Figure 1. Permissions are important. They can be looser than specified, but no tighter; looser permissions, however, may weaken security. The group owner shown in Figure 1 is http. Use whatever group your Web server uses to execute CGI scripts instead of http. If your Web server is Apache, you can determine the group identity by viewing Apache's configuration file and noting what the Group directive is set to. chown and chgrp the files as appropriate, then proceed.

If the Web server you are using is Apache and its share directory has both a htdocs and a cgi-bin directory below it, then you may untar the Site-map distribution from within the share directory. If this is not the case, you will have to move the Site-map directory to your Web server's document root, and move the contents of cgi-bin into the directory where your Web server keeps its CGI scripts. Activating a portion of your Web server's directory structure as a CGI repository is beyond the scope of this article.

Each of the Perl scripts in cgi-bin starts with the line #!/usr/local/bin/perl. If /usr/local/bin/perl is not the path to your Web server's copy of Perl, then you will have to change this line in the nine .pl files.
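One way to make that change in all nine scripts at once is a short Perl helper. The following is only a minimal sketch and is not part of the Site-map distribution; the function name set_shebang and the command-line usage are assumptions made for illustration.

```perl
# Hypothetical helper (not part of the Site-map distribution): rewrite the
# "#!" line of each script named on the command line.
# Usage: perl fix-shebang.pl /usr/bin/perl cgi-bin/*.pl
use strict;
use warnings;

# Return $text with its first line replaced by "#!$perl_path" if that
# first line is a Perl "#!" line; otherwise return $text unchanged.
sub set_shebang {
    my ($text, $perl_path) = @_;
    $text =~ s{\A#![^\n]*perl[^\n]*}{#!$perl_path};
    return $text;
}

# Only touch files when a Perl path and at least one file are given.
if (@ARGV >= 2) {
    my ($perl_path, @files) = @ARGV;
    for my $file (@files) {
        open(my $in, '<', $file) or die "Cannot read $file: $!";
        my $text = do { local $/; <$in> } // '';
        close($in);

        open(my $out, '>', $file) or die "Cannot write $file: $!";
        print {$out} set_shebang($text, $perl_path);
        close($out);
    }
}
```

Note that the whole first line is replaced, so any switches that followed the old path are dropped. Keep a copy of the original scripts until you have confirmed the Web server can still run them.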

The map-lib.pl file will have to be modified to work with your Web server. The following global variables that you will find set at the beginning of map-lib.pl affect all of the scripts that make up Site-map:

$METHOD -- Should be set to “post” so that when form requests are made to Site-map's CGI scripts, none of the CGI parameters will be displayed in the URL field of the user's Web browser. I recommend you keep $METHOD set to “post” unless you are doing testing. The alternative setting is “get”. Note that some Web browsers and Web servers cannot handle the very long URLs created by using “get” as your form method.

$url -- Should be set to the top level of Site-map. I use /site-map/ because, as instructed above, I placed the distribution's site-map directory in my Web server's document root directory.

$homeUrl -- Should be set to whatever you consider Site-map's parent Web page. I use / because the top level Web page for the Web server on which I installed this example was my IT group's home page.

$searchUrl -- The URL path to the Site-map's static html page, which passes search requests off to the search engine on your Web server. I have provided a simple html page (search.html) and a Perl script (map-search.pl) that will work with Swish-E. If you are using a different search engine, then you will need to develop both of these on your own.

$getUrl, $editUrl, $updateUrl, $searchUrl, $reportUrl, and $prefUrl -- Should all be set to a path relative to your server's document root. By default, Apache uses /cgi-bin for its CGI scripts. So an Apache example for $getUrl would be /cgi-bin/getarea.pl. Later, you may want to change the path to some of these scripts to password protect the editing portion of Site-map. I'll discuss implementing that feature later in this article.

Listing 1 shows what the above variables should be set to if Apache is your Web server, its install base is /usr/local/apache, and its cgi-bin and htdocs directories reside in Apache's share directory.
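For orientation, the settings might look roughly like the sketch below. This is not a copy of Listing 1; it is assembled from the values and script names mentioned in the text. Every value marked "assumed" is a guess by analogy and should be checked against the distribution.

```perl
# Sketch of the settings near the top of map-lib.pl for the Apache layout
# described above. Values marked "assumed" are not stated in the text.
$METHOD    = "post";                     # use "get" only while testing
$url       = "/site-map/";               # top level of Site-map
$homeUrl   = "/";                        # parent Web page of Site-map
$searchUrl = "/site-map/search.html";    # assumed: the static search page
$getUrl    = "/cgi-bin/getarea.pl";      # example given in the text
$editUrl   = "/cgi-bin/editarea.pl";     # assumed, by analogy with $getUrl
$updateUrl = "/cgi-bin/updatearea.pl";   # assumed
$reportUrl = "/cgi-bin/map-report.pl";   # assumed
$prefUrl   = "/cgi-bin/map-pref.pl";     # assumed
```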

The last step necessary to get a simple example of Site-map running on your Web server is to modify the static Web page index.html. The lines of index.html that may need to be modified are shown in Figure 2. If you want the top level of your Web server to be the home page for Site-map, and you did not move map.gif and search.html out of the directory where index.html resides, and your Web server uses /cgi-bin for CGI scripts, then you do not need to change any of these lines. Otherwise, make changes as appropriate.

Once you have Site-map working, you can see how it all fits together. By pointing your Web browser at the top level of Site-map, you should see a page like the one in the top center of Figure 3. The URL will be something like http://your_web_server/site-map. Clicking one of the navigation links will take you to the appropriate script or html that will display the indicated Web page. Figure 3 shows what each Web page does, but it doesn't explain the purposes of all the files in the distribution:

• map.gif is an image with a square labeled “Area 1” in the middle, and you will modify this image to look like the floor plan of the site you are tracking with Site-map. I will describe how to do this later.

• map.conf is used by all of the scripts to determine the nature of the site database. Without this file, none of the scripts know how to interpret the data in the area files.

• map_conf_api.txt is not used by Site-map. It is a reference for properly modifying the map.conf file.

• The data directory holds all the area data files. For now, only 1.txt resides there.

• updatearea.pl resolves editarea.pl submissions. Because it sends a user back to index.html, it cannot be displayed in Figure 3.

• map-search.pl, similarly, resolves search requests initiated by search.html. map-search.pl will give you errors until you configure your system to use Swish-E. This is explained in the next few paragraphs.

• map-lib.pl, cgi-lib.pl, and cookie.lib are libraries that all the other scripts reference for shared code. cgi-lib.pl was written by Steven E. Brenner. In accordance with his copyright notice, I have modified it for my needs. cookie.lib was written by Matthew M. Wright, and he has given permission to distribute it with Site-map.

Configuring the Search Capabilities

search.html and map-search.pl were written to use Swish-E. Swish-E is a Simple Web Indexing System for Humans, originally written by Kevin Hughes. The current, enhanced version is being maintained by the Swish-E team. To use a different search engine, you will have to configure Site-map and modify search.html and map-search.pl to use it. If you want to use Swish-E, follow the link found on search.html to the Swish-E distribution. Download and compile it, unless you find a copy already compiled for your Web server's architecture.

Index the site map before you run your first query. To do this, create an index configuration file for Site-map. Use the example swish.conf file, which comes with the Swish-E distribution, as a template. The following swish.conf lines will have to be written or modified to work with your system:

IndexDir /usr/local/apache/share/htdocs/site-map/data
IndexFile /usr/local/apache/share/htdocs/site-map/swish.index
ReplaceRules replace /usr/local/apache/share/htdocs/site-map/data/ /cgi-bin/getarea.pl?filename=

To actually create an index, run the following command:

/usr/local/bin/swish-e -c \
   /usr/local/apache/share/htdocs/site-map/swish.conf

Modify the above paths as appropriate.

Unless you put everything where I do, you will have to modify search.html and map-search.pl in order for your search queries to go through the Site-map interface. The lines of search.html that may need to be modified are shown in Figure 2. Notice that, like index.html, the top and bottom of the file have identical lines that need to be modified the same way because the navigation links appear at the top and the bottom of the page. These are the map-search.pl lines that may need to be modified:

$swishcmd='/usr/local/bin/swish-e';
$swishindex='/usr/local/apache/share/htdocs/site-map/swish.index';
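To show where these two settings come into play, here is a rough sketch of how a CGI script might hand a query to Swish-E. It is not the code in map-search.pl. The function names swish_args and swish_search are invented for illustration, and the sketch assumes the -w (search words) and -f (index file) switches of the swish-e command.

```perl
use strict;
use warnings;

# Hypothetical sketch of calling Swish-E with the two settings above;
# map-search.pl itself may do this differently.
our $swishcmd   = '/usr/local/bin/swish-e';
our $swishindex = '/usr/local/apache/share/htdocs/site-map/swish.index';

# Build the argument list: -w gives the search words, -f the index file.
sub swish_args {
    my ($query) = @_;
    return ($swishcmd, '-w', $query, '-f', $swishindex);
}

# Run the search without a shell and return Swish-E's raw output lines.
sub swish_search {
    my ($query) = @_;
    open(my $pipe, '-|', swish_args($query))
        or die "Cannot run $swishcmd: $!";
    my @lines = <$pipe>;
    close($pipe);
    return @lines;
}

# Show the command that a search for "sparc" would run.
print join(' ', swish_args('sparc')), "\n";
```

Passing the command as a list keeps the user's query away from the shell, which matters because the query arrives from a Web form.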

When you have made the appropriate modifications, the search feature should work. However, unless you update the index regularly, your search queries will provide less and less accurate results. Therefore, you will probably want to make a crontab entry similar to the following one to ensure the index is kept up to date:

15 4 * * 1,5 /usr/local/bin/swish-e -c /usr/local/apache/share/htdocs/site-map/swish.conf > /usr/local/apache/share/htdocs/site-map/swish.log

Editing the Site-map Configuration File

The next step is to start modifying the Site-map configuration file to meet your needs. If you don't modify map.conf, this Web-based site map will be no better than my old MIS manager's photocopies, since it has only one field to start with. The example shown in Figure 4 illustrates the power and use of this file. Figure 4 also shows an edit details page of an area whose Site-map is using the example map.conf. For a nuts-and-bolts description of the map.conf file, consult map_conf_api.txt.

The first line of the example map.conf is a comment. A line beginning with white space or a pound (#) sign is considered a comment line.

The bulk of the lines shown in Figure 4 are nickname descriptions. A nickname description has four elements. The first is the nickname itself. This must be unique and must also start the line. I always make nicknames short and in all capital letters, but they don't have to be. Since white space delineates nickname description elements, a nickname cannot have white space within it, and it can't start with a pound sign either. The nickname is for internal use and will not be displayed through the Site-map interface.

The next element is the label of the nickname; it is surrounded by double-quotes (“”). Since it is within quotes, it may have white space but cannot have double-quotes embedded in it. The label is displayed to the left of a field when viewing an area through the Site-map interface.

The third element is the nickname type. There are three types of nicknames: OPEN, SELECT, and LIST. They must be spelled with capital letters. If your type is OPEN, your nickname needs no more information beyond this line. If your type is SELECT or LIST, then a SELECT or LIST description must appear elsewhere in the file:

• OPEN refers to an open-ended field. A user, through the Site-map interface, can type anything he wants into a field that is OPEN.

• SELECT allows users to select options from a pull-down menu.

• LIST types are made up of other nicknames. A LIST can be made up of nested LISTs.

The last element determines whether there will be only one value for the nickname (SINGLE) or possibly many for a given area (MULTI). In the example, IP is a MULTI because a given network port can have more than one IP address.
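The four elements of a nickname description can be picked apart with a single regular expression. The sketch below is not Site-map's own parser; the function name parse_nickname_line and the sample label "IP Address" are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical parser for one nickname description line, for example:
#   IP "IP Address" OPEN MULTI
# Returns a hash reference, or nothing if the line is a comment or does
# not have the four elements described above.
sub parse_nickname_line {
    my ($line) = @_;
    return if $line =~ /^(?:\s|#)/;    # comment line
    if ($line =~ /^(\S+)\s+"([^"]*)"\s+(OPEN|SELECT|LIST)\s+(SINGLE|MULTI)\s*$/) {
        return { nickname => $1, label => $2, type => $3, count => $4 };
    }
    return;
}

my $field = parse_nickname_line('IP "IP Address" OPEN MULTI');
print "$field->{nickname}: $field->{label} ($field->{type}, $field->{count})\n";
```

Because the type and count keywords are matched literally, a lowercase "open" or "multi" is rejected, which mirrors the rule that they must be spelled with capital letters.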

The rest of the lines of the map.conf are SELECT descriptions, or lines that start or end a list description. SELECT descriptions start with the nickname and are followed by white space delineated, double-quoted values. SELECT descriptions must be described outside any LIST descriptions. The last line of the example map.conf is a SELECT description.

Your map.conf has to have at least one LIST description. Like all LIST descriptions, it starts with its nickname on a line by itself. The mandatory LIST nickname is MAIN. Every line between the line that reads MAIN and the line that reads END constitutes your top-level nickname descriptions. Any SELECT or LIST nickname descriptions that you place within MAIN need to be described elsewhere. COMP, NETWORK, and PORT are examples of additional LIST descriptions.
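To make the overall layout concrete, the sketch below reads a small map.conf-style text and separates LIST descriptions from SELECT descriptions. It is an illustration of the rules just described, not the code Site-map uses. The function name read_map_conf and the sample configuration are hypothetical, so consult map_conf_api.txt for the authoritative format.

```perl
use strict;
use warnings;

# Hypothetical reader for the overall map.conf layout described above.
# Returns (\%lists, \%selects):
#   $lists{NAME}   = [ nickname description lines between NAME and END ]
#   $selects{NAME} = [ values of a SELECT description ]
sub read_map_conf {
    my (@lines) = @_;
    my (%lists, %selects, $current);
    for my $line (@lines) {
        $line =~ s/\s+$//;                               # trailing space
        next if $line eq '' or $line =~ /^(?:\s|#)/;     # blank or comment
        if (defined $current) {
            if ($line eq 'END') { undef $current }
            else                { push @{ $lists{$current} }, $line }
        }
        elsif ($line =~ /^(\S+)$/) {                     # starts a LIST
            $current = $1;
            $lists{$current} = [];
        }
        elsif ($line =~ /^(\S+)\s+(".*")$/) {            # SELECT description
            my ($name, $values) = ($1, $2);
            $selects{$name} = [ $values =~ /"([^"]*)"/g ];
        }
        else { die "Unrecognized map.conf line: $line\n" }
    }
    die "map.conf needs a MAIN list\n" unless exists $lists{MAIN};
    return (\%lists, \%selects);
}

my ($lists, $selects) = read_map_conf(split /\n/, <<'CONF');
# Hypothetical example configuration
MAIN
USER "User Name" OPEN SINGLE
OS "Operating System" SELECT SINGLE
NETWORK "Network" LIST MULTI
END
NETWORK
IP "IP Address" OPEN MULTI
END
OS "Solaris" "Linux" "Mac OS"
CONF
print "MAIN fields: ", scalar @{ $lists->{MAIN} }, "\n";
print "OS choices: ", join(', ', @{ $selects->{OS} }), "\n";
```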

Ensure the fields are formatted the way you want them by going to an edit details page and viewing the layout. You can make changes to the map.conf file after the database is populated, so it doesn't have to be perfect now. If you do this, however, the data stored in the different fields can change in surprising ways. To prevent unpredicted behavior when you change the type of a nickname description from OPEN/SELECT to LIST, or LIST to OPEN/SELECT, also change the nickname. Because only the label is seen through the Site-map interface, this won't confuse you or the other Site-map users.

Creating the Image Map

Now that Site-map is working and the database is laid out the way you want it, you'll have to modify the map.gif and index.html for your site unless you are blessed with having only one cube in your building. You'll need a drawing program to modify the map.gif. You can draw a simple floor plan from scratch (see Figure 5). For a more complicated corporate environment (Figure 6), you will need a jumping off point. Scan a dog-eared photocopy or the office blueprints. In Figure 6, I scanned the blueprints in little by little. Then I put the pieces together with a drawing program and scaled it so it works well on a Web page. No matter how you scan it in, you'll want to clean it up in your drawing program.

Next, author the image map definition that will reside in the index.html file. I created the image map in the distribution without the aid of an html authoring program, since it is so simple. The area I wanted a hot spot on was 100x100 pixels and started 50x50 pixels from the top left corner, so I only needed these lines:

<AREA SHAPE=RECT COORDS="50,50,150,150"
HREF="/cgi-bin/getarea.pl?filename=1.txt">

When a Site-map user moves his mouse over a hot spot like the one above, his cursor will change. If he clicks in the hot spot, the Web browser will be redirected to the Web page that HREF is set to. In this case, the Web browser runs the CGI script getarea.pl with the parameter filename set to 1.txt. You'll have to author lines for each area on your map.gif. You can use your drawing program to find out the XY coordinates for the top left and bottom right corners of each cube and office, but I usually use an html authoring program to define the hot spots. I keep an early version of Pagemill on my laptop for this, but any of the modern WYSIWYG html authoring programs can do it.
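If your cubes are regular rectangles, the AREA lines can also be generated rather than typed. The following sketch assumes rectangular hot spots and the default /cgi-bin/getarea.pl location; the function name area_tag is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the AREA tag for a rectangular hot spot from
# its top left corner ($x, $y), its size in pixels, and its data file.
sub area_tag {
    my ($x, $y, $width, $height, $filename) = @_;
    my $coords = join ',', $x, $y, $x + $width, $y + $height;
    return qq{<AREA SHAPE=RECT COORDS="$coords"\n}
         . qq{HREF="/cgi-bin/getarea.pl?filename=$filename">};
}

# The 100x100 hot spot from the distribution, 50 pixels in from each edge.
print area_tag(50, 50, 100, 100, '1.txt'), "\n";
```

The bottom right corner is simply the top left corner plus the width and height, which is how 50,50 and 100x100 become 50,50,150,150.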

getarea.pl requires “filename” to be set to something in order to know which file in the data directory to parse. For security reasons, Site-map doesn't create any area files in the data directory. For each hot spot, choose names and create area files with the UNIX touch command. Each one must be readable and writeable by your Web server. I create each file with .txt extension so I can point my Web browser at the raw data file and it'll be able to display it. The .txt is stripped off by all the scripts when they display the area's name. You can use any filename you want for the area data files. To keep things simple, I just use cube or office numbers, such as 001.txt and 002.txt. The nice thing about using numbers is that you can easily script area data file creation (see Listing 2).
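As an illustration of scripting that step, the sketch below creates a run of numbered, empty area files. It is not Listing 2. The function name create_area_files, the three-digit naming, and the 0660 mode are assumptions; pick whatever ownership and mode let your Web server's group read and write the files.

```perl
use strict;
use warnings;

# Hypothetical stand-in for scripted area file creation: create empty
# files 001.txt, 002.txt, ... in $dir. Returns the newly created files.
sub create_area_files {
    my ($dir, $first, $last) = @_;
    my @created;
    for my $n ($first .. $last) {
        my $file = sprintf '%s/%03d.txt', $dir, $n;
        next if -e $file;                  # never clobber existing data
        open(my $fh, '>', $file) or die "Cannot create $file: $!";
        close($fh);
        # Assumed mode: read and write for owner and group only.
        chmod 0660, $file or die "Cannot chmod $file: $!";
        push @created, $file;
    }
    return @created;
}

# Usage: perl make-areas.pl /path/to/site-map/data 1 40
if (@ARGV == 3) {
    print "$_\n" for create_area_files(@ARGV);
}
```

Existing files are skipped, so the script can be rerun safely after you add more cubes to the map.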

Password Protection

I mentioned that the editing portions of Site-map could be password protected. I have been avoiding this topic since this is a function of your Web server, and Web server management is beyond the scope of this article, but there are some specifics that need to be addressed when password protecting any portion of Site-map.

There are three steps to password protecting the editing portions of Site-map. First, move editarea.pl and updatearea.pl into a directory you have established as password protected. Second, edit these so that the “require” lines at the top point to the location of the map-lib.pl file. It initially reads:

require('map-lib.pl');

If a sub-directory of cgi-bin is password protected and you have moved them into it, then you only need to modify the line to read:

require('../map-lib.pl');

Third, modify the map-lib.pl to set $editUrl and $updateUrl to the new location. In the case of Apache, if you have established a script alias named /secured for the password protected portion of your Web site, then these lines would read:

$editUrl="/secured/editarea.pl";
$updateUrl="/secured/updatearea.pl";

Modifying the Site-map's Functionality

There are modifications a Perl-savvy administrator might like to make to the distribution to meet his or her needs. All of these I have done, or will do, to solve different problems. Since I am a minimalist when it comes to Web page design, you'll note the absence of colored backgrounds, flashy navigation bars, or animated gifs. If the look and feel doesn't match your company's intranet Web site design, you only need to modify portions of cgi-lib.pl. As I mentioned, I didn't write cgi-lib.pl, but I have modified it a great deal to work for me. I encourage you to do the same. The sub-routine that can control the overall look of a Web page is printTitle, but you will also need to modify the static html pages index.html and search.html to get a uniform look and feel.

The style of my navigation links is also spartan and reflects the needs of the Web sites I have worked on. The sub-routine that controls this is in map-lib.pl and is called printNav. The scripts that use this sub-routine are getarea.pl, editarea.pl, map-report.pl, and map-pref.pl. Also note that the array @urls defined at the beginning of each of these scripts determines which of the navigation links will be displayed by printNav. Like the first suggestion, the static html pages will also have to be modified.

Without the index.html and map.gif files, Site-map is still a simple and powerful database, but it lacks an initial interface for a user to access it. Because of this, when asked to create a subscription database for a Web site, I simply modified Site-map. Obviously, I had to change the script names, the titles of the Web pages, and the navigation links to reflect the new purpose, but that was surprisingly simple. The new initial interface was an html report generated by map-report.pl with the “show filename” preference turned on. I didn't use the .txt extension on the area data files, because they were now subscriber data files. The filename became the user's login name. My point is that you don't need to use Site-map as a site map. You can, without any changes, track just about any kind of data with it.

You can track more than one site with a single installation of Site-map, as long as each area data file has a unique name. You only need to have two Web pages like the present index.html, but problems arise. Reports include information about both sites and when you return to the index.html, it may not be the map you were just dealing with, but these are minor annoyances. Without major changes, you can avoid these by just installing the distribution more than once. To do this, you will either have to rename the scripts, or put them in a different CGI directory. You might have to modify the map-lib.pl, all “require” lines, and the static Web pages, but you'd be up and running with two site maps in no time.

Alternatively, you can create two data directories and introduce a new CGI parameter, “database”. I have done this with one install, and it went smoothly. The filename parameter goes through a lot of testing to ensure a user is not trying to get out of the data directory to read a file such as /etc/passwd. You'd have to do the same with database, but if it passes the tests, then you'd introduce it into the $datapath variable. So, most of the modification would occur in map-lib.pl, but the static Web pages would also have to be modified. Only someone who really needs this sort of functionality should attempt this modification, since it does require making a basic concept change to the distribution.
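The kind of test involved can be sketched as follows. This is not the checking code in the distribution, which may be stricter or looser. The function names safe_name and data_path are hypothetical, and the allowed character set is an assumption.

```perl
use strict;
use warnings;

# Hypothetical check in the spirit of the filename tests described above.
# Accept only a plain name made of letters, digits, dots, underscores, and
# hyphens that does not start with a dot, so values such as
# "../../etc/passwd" or "/etc/passwd" are rejected.
sub safe_name {
    my ($name) = @_;
    return 0 unless defined $name;
    return $name =~ /\A[A-Za-z0-9_-][A-Za-z0-9._-]*\z/ ? 1 : 0;
}

# Build the path to an area data file only when both CGI parameters pass.
sub data_path {
    my ($root, $database, $filename) = @_;
    return unless safe_name($database) && safe_name($filename);
    return "$root/$database/$filename";
}

my $path = data_path('/usr/local/apache/share/htdocs/site-map',
                     'building2', '001.txt');
print defined $path ? "$path\n" : "rejected\n";
```

Listing the characters that are allowed, instead of trying to list the ones that are dangerous, means a slash can never reach the path in the first place.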

A final modification suggestion is regarding the introduction of a new data type. I haven't done this yet, but I worked it out in my head, and the next time I implement Site-map for a company I will probably include it. I'd call the type REALTIME and, instead of consulting the data file for the field values, it would run something on the command line and place the results into the field. This would slow down display time, but it would make Site-map more powerful. A nickname description for it in map.conf could look something like this:

PING "Ping Results" REALTIME SINGLE

Its corresponding REALTIME description, which would reside outside the parent LIST description in the map.conf, could look like this:

PING "ping $VALUE"

When you go to the details page for an area, the field would display the results, and it would tell you whether there was “no answer”, or whether it “is alive”. The value for $VALUE would be set in the edit details page. This would allow your site to implement system monitoring without the cost and complexity of installing HP's OpenView or Sun's SyMon. Anything you can run from the command line could be used, including scripts you've written yourself. This would constitute a security threat, but as long as only experienced and trusted systems administrators are given access to the map.conf file and to the edit scripts, the risk would be minimal. Like the last possible modification, this is a serious conceptual change and should not be attempted carelessly.
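Since REALTIME does not exist yet, the following is purely a sketch of how the idea could be made less risky. The function names realtime_command and realtime_field are hypothetical, and so is the rule that $VALUE must look like a hostname or IP address.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed REALTIME type: turn a description
# such as 'ping $VALUE' into an argument list. $VALUE is accepted only if
# it looks like a hostname or IP address; anything else yields nothing.
sub realtime_command {
    my ($template, $value) = @_;
    return unless defined $value
        && $value =~ /\A[A-Za-z0-9][A-Za-z0-9.-]*\z/;
    return map { $_ eq '$VALUE' ? $value : $_ } split ' ', $template;
}

# Run the command without a shell and return its output, or a short note
# when the value is rejected or the command cannot be started.
sub realtime_field {
    my @cmd = realtime_command(@_) or return 'invalid value';
    open(my $out, '-|', @cmd) or return "cannot run $cmd[0]: $!";
    local $/;
    my $text = <$out>;
    close($out);
    return defined $text ? $text : '';
}

# Show the command that would be run for one stored value.
print join(' ', realtime_command('ping $VALUE', 'www.example.com')), "\n";
```

Running the command as a list, with no shell involved, means a value typed into the edit details page cannot append a second command. The risk from what is written in map.conf itself remains, as noted above.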

I hope I have inspired you to sneak into your boss's office and throw away all the photocopies of your site. Site-map is really a better way to do things.

About the Author

David Sweet is presently an independent contractor working primarily in the Silicon Valley. He received a history degree from UCB and teaching credentials from SFSU. After deciding substitute teaching wasn't for him, he began his five-year career as a Solaris systems administrator. He can be reached at: dsweet@tgd-inc.com.