Cover V09, I05
Article

may2000.tar


Simplifying Web Production

Reinhard Voglmaier

In the beginning of the era of Web servers, everything was easy. A few people, mostly in their spare time, put documents on a Web server to make them available to the world. Now, however, Web production has become highly specialized. IT professionals don't have time to participate in the production of every Web page and, for security reasons, it is unwise to let everyone who is working on a Web page have the access privileges necessary to put that page into production.

Our organization uses a methodical Web production process that provides clear lines of authority, lets Web designers operate in a test environment without security restrictions, and minimizes the number of users who can access the production systems. We divide the production process into the following roles:

  • Web editors design and create Web pages.
  • Publishers oversee the content of a Web site.
  • Web managers hold together the whole intranet project and decide on the software to be used, the administrative procedures, and the overall Web design.

This design addresses very good security requirements: The owner of the entire Web site (normally the Webmaster) is the only person that can access the production site directly. He does not need to be the systems administrator.

The single Web editor instead can only put his documents on the test site. For this work, he doesn't need to have a userid with special access rights. The production site normally is accessed only by the automatic procedures running with the userid of the Web owner (normally the Webmaster).

This article describes our process for bringing Web pages into production. This process is largely automated. I will describe the scripts we use for automating each step of the production process. For my purposes here, the Web site is an entity consisting of a collection of documents produced by one group and headed by a publisher, regardless of whether this entity is located on a server of its own or shares a Web server with other sites. The scripts described in this article are available from Sys Admin at: http://www.sysadminmag.com and also from my company at: http://www.GlaxoWellcome.it/Webmaster/Sysadmin.

The Design of the Publishing System

The intranet is divided into test and production stages. Publishing a site refers to the transition from test into production. The publishing system is designed using the previously described organization. The Web editors put their pages on an area on the Test Web server mounted via the SMB protocol (Samba file server). There may be several people putting documents on the same Web site. Every Web site corresponds to one file server. Once the Web editors have finished, the publisher declares the site frozen for inspection and examines the new or modified site.

The publisher then reviews the site. This review covers style, layout, and most importantly, the content. If the publisher is happy with the documents, he opens an application (called P2P) in his Web browser that helps move his site into production. This application contains a list with all Web sites he supervises. Clicking on the Web site will show the last date of publication of the site and who executed the action. The publisher then only needs to submit the form. The procedure P2P actually does very few things. It writes a file containing the Web site to be put in production, the userid and name of the publisher who made the request, and the timestamp when this request was generated. This file is written in a directory that holds all the requests (details on the directory structure later). After generating this file, the procedure prompts for confirm the execution. The real work is done much later on.

Automatic procedures now start. There's a cron-controlled procedure springing into life every five minutes, and it refers to the order book. This procedure makes a photograph of the Web site for each job. This means that the process produces a compressed tar of the Web site and puts the results in a convenient place (see below). After this, it sends email to the publisher informing him that the test Web site can be accessed again. A shell script to be executed for the update of the Web site is generated at the same time. In case the publisher decides to update further pages, it is sufficient to launch the P2P application again. The photograph and the entry in the order book will be overwritten.

The real work will be done late in the night, when the server will be less stressed. A job scheduled by cron wakes up and looks in the directory where the shell-scripts are kept. Every script is executed, and the entry in the order book is cancelled.

The Directory Structure

Several directories are used to hold information and data and every request is describe by a one-line file: <User Name>,<User Id>,<Date>,<Location of WebSite>.

  • The request directory holds all requests made by the publishers. These requests are generated by the P2P application as descibed above.
  • The scheduled directory holds information about the requests scheduled: <User Name>,<User Id>,<Date>,<Location of compressed Data File>
  • The data directory contains the data to be transferred in compressed format.
  • The old data directory stores a compressed tar of the old Web site to eventually restore the previous situation.
  • The database directory contains specifications for who can put each Web site into production.

The Database

The heart of the publishing system is the database that holds the information specifying who is allowed to do what. The database used is DBM distributed in various flavors with the UNIX operating system. DBM is a collection of key value pairs stored in tables. It doesn't offer tools like the big ones as Informix, Oracle, or Sybase, but it's handy and available on every UNIX operating system. Three tables exist:

  • Access list: Describes who can access what.
    userId1 /dir1;;/dir2;;/dir3
    userId2 /dir5;;/dir4;;
    userId1 can put in production dir1, dir2, dir3.
  • Inodes: Lists all Web sites.
    /dir1 Area;;Tue Feb 8 17:00:00 ;; /usr2
    dir1 is an area (for compatibility only, not used) has been updated Tue 8 Feb by usr2.
  • User list: Contains the userid, the user name, and the role.
    userId1 Name1;admin
    Association between userid, name, and function. The administrators have admin after the name.

The Programs

The application is made up of several parts:

Library

  • A library containing all functions to access the database scripts.
  • CGI-script P2P (Put into production) -- The publishers use this script to make requests for putting their sites in production.
  • CGI-script for administration -- The administrator uses this script to define who can put which Web site into production.
  • cron-controlled publish script -- This script is really a cron job, reading the request book every five minutes and making the photo of the site to put into production.
  • The cron job that executes the automatically generated scripts.

Automatic Generated Scripts

  • The scripts that put the single sites in production

The library contains a number of convenient functions to insert, update, and delete data in the database.

The CGI script is located in a protected directory and the Web server delivers the userid (of the user who launched the request) in the form of an environment variable. The CGI script looks in the DBM database for what sites the user can put in production, who last updated it, and when. It generates the static HTML code and the object and constructor definitions of the JavaScript. The JavaScript handles all the dynamic features of the HTML object. With this information, it constructs the dynamic part of the JavaScript.

#!/usr/bin/perl -w
require "DBMUtil.pm";   # Utility functions to use the DBM database

use CGI ;               # The CGI Library from Lincoln Stein

##################################################################
# S e t   P a r a m e t e r s
#

my $DirRichieste = "/d01/Web/transfer/accepted/" ;
my $ConfigFile = "Editors.conf" ;
. . .

# Create CGI Object

$query = new CGI ;
print $query->header() ;
print $query->start_html(-title=>'Web Site Publisher',
                   -onLoad=>'InitForm()',
                   -script=>{ -language => 'JavaScript',
                              -src=>'./admin.js' },
                   -author=>'WebAdmin@Intranet.GlaxoWellcome.it');

&open_databases($DataBase);
&ReadData($Publisher) ;
($NamePublisher,$Role) = &getUserInfo($Publisher) ;
&close_databases();


##################################################################


##################################################################

print $query->start_multipart_form(-name=>'Administration',
                                   -target=>'ProdWindow',
                                   -onSubmit=>'PutInProd.cgi'
                                   );


. . .
# Here we create a scrolling list that contains only a placeholder
# to keep sufficient place for the entries still to come.
print $query->scrolling_list(-name=>   'PathNames',
                -values=> ['DUMMYDUMMYDUMMYDUMMYDUMMYDUMMY'],
                -onChange=> 'ShowProperties(PathNames)',
                -size=>   10 ) ;
. . .
The ReadData function actually generates this JavaScript:

function InitForm() {
         InitDirList();
         AddEntry($Directory,$Type,$Date,$UserId,$UserName) ;
         . . . .
}
The AddEntry function is repeated for every Web site the publisher can put into production. The JavaScript program included in the HTML page contains all object declaration and all function definitions such as InitDirList() or AddEntry(). If the user submits the form, the script creates a file in the requests directory and prompts a confirm message. The file contains the necessary information to put the Web site in production: user name, userid, date, and Web site.

The Administration Utility

Everything is defined in a database. For user and directory administration, there is a Web-enabled application that allows updates to the database. The administration utility is written in the same way as the procedure to put a Web site into production. The CGI script generates static HTML code and the object definitions of JavaScript. The JavaScript objects are then filled with data from the database using the previously defined constructors. The administration can be delegated, because the user table defines whether the user is a publisher or an administrator. The form of the P2P application is generated on the fly by a CGI script based on the information found in the database. If the user is defined as an administrator, the CGI script generates a button to permit entrance into the admin mode.

The cron Job

The cron job is executed every five minutes. It reads the requests directory, produces a compressed tar file of the Web site to be put into production, cancels the file from the request directory, generates a file in the scheduled directory, and sends email to the publisher indicating that the photo is taken from the test site. The file in the scheduled directory contains the information of where the compressed tar file is kept and again userid, username, date, and Web site. At the end, it generates the shell script that really puts in production the Web site.

Automatically Generated Shell Script

The shell script that is generated automatically is very simple. Here is an example:

#!/bin/sh
cd /htdocs/ProdServer/Research/Microbiology
tar cvf - . | gzip > /d01/Hobbit/OldData/Research_Microbiology.tgz
mkdir /htdocs/ProdServer/Research/Microbiology_new_<JobNr>
cd /htdocs/ProdServer/Research/Microbiology_new_<JobNr>
gzip -cd /d01/Hobbit/Data/Research_Microbiology.tgz | tar xvf -
mv /htdocs/ProdServer/Research/Microbiology
/htdocs/ProdServer/Research/Microbiology_old
mv /htdocs/ProdServer/Research/Microbiology_new_<JobNr>
/htdocs/ProdServer/Research/Microbiology
rm -rf /htdocs/ProdServer/Research/Microbiology_old
Since the shell script generated contains all variables substituted with their actual values, it's very easy to debug. The new site is populated while the users can see the old site in production. The time when the state of the production site is not defined is limited to the execution of the last to move instructions.

Last but not Least, the cron Job to Launch the Shell Scripts

The last thing to do is to launch these shell scripts. The cron job looks in the directory of the scheduled jobs, launches the automatic generated script for every job, sends an email confirmation to the publisher, and cancels the entry in the scheduled jobs directory.

Extending for Urgent Updates

There are some pages that need several updates a day, so it's not possible to wait until night to update the site. However, this can be easily achieved. Since this regards only single pages, it's sufficient that we substitute the procedure that makes the photo (compressed tar) with a procedure that copies the relevant file from test into production. Obviously, there are no further steps needed. This means that we have two independent applications -- every application using its own set of scripts and its own database.

Future Development

The current solution still has some weak points. The first one is the way the pages are put on the test server. In order for every Web site to be independent from the others (and still allow sharing with other users), it must be a file server of its own. With the rapid growth of the Web sites grows administration overhead. If we take into account that there are even Web sites dismissed, we can imagine what will happen in the next 12 months. One solution is that Samba will support access control lists. This will allow having a few independent file servers and to delegate the administration of access rights to the admin responsible for this site.

Another problem is the great waste of disk space. If Web sites become larger and larger, it's no longer possible to use this procedure. For example, an 800-Meg Web site containing many images and pdf files:

  • The Web site (test and production) -- 800 Meg + 800 Meg = 1.6 GB.

Temporary Files

  • The photo taken of the Web site -- 600 Meg.
  • The saved image of the previous Web site -- 600 Meg.
  • The duplicated Web site in the production process -- 800 Meg, resulting in an overhead of 2 GB temporary files.

Now, it's the publishers job to divide his Web site into manageable portions.

Conclusion

In this article, I presented a system that gives the possibility to move documents from a test to a production server in a multi-user environment. The procedure to put the pages in production and the administration utility are both Web enabled. The access rights are defined in a database. The Web server does not have to run under a particular userid, because it writes in a staging area under no one's account. This area then reads the process running under a privileged userid. This system offers the advantage that there is only one entry point in the production system, which allows registration of the documents (e.g., in a database connected with a search engine), or assures putting in production only documents that respect the company standards, such as naming conventions. The system has its limits regarding large numbers of editors and a huge amount of space is handled. A new project will overcome these limitations.

About the Author

Reinhard Voglmaier studied physics at the University of Munich in Germany and graduated from Max Planck Institute for Astrophysics and Extraterrestrial Physics in Munich. After working in the IT department at the German University of the Army in the field of computer architecture, he was employed as a Specialist for Automation in Honeywell and then as a UNIX Systems Specialist for performance questions in database/network installations in Siemens Nixdorf. Currently, he is the Internet and Intranet Manager at GlaxoWellcome, Italy. He can be reached at: rv33100@GlaxoWellcome.co.uk.