Web Publishing
with Perl Objects
Reinhard Voglmaier
The Internet has grown exponentially in the past ten years, and
along with technical improvements have come changes to the way we
work. At first, a few people may have worked on an entire Web site;
now, a whole team of Web editors may work on only a small part of
an enterprise portal.
A standard life cycle for Web pages has evolved in our company.
Web pages initially live only on a developer's desktop and
are eventually added to a test site. After the updated pages are
approved, they're put into production and the whole site is
then in a constant state of change. This cycle of Web page development
-- from test to production -- continues as long as the site
is on the Internet.
In our environment, many Web editors produce material for our
Web sites, and it would be unwise to give them all full access to
our Apache servers. Furthermore, our internal review process for
Web pages requires an official (a publisher) to approve every page
before it goes online. The sheer volume of material flowing to the
site requires some degree of automation so that the few full-access
Web managers do not have to manually post every new page.
We have developed the following system to help meet the requirements
of our process:
1. Web editors post their completed pages to a test Web site.
2. Supervisory Web publishers review and edit the completed pages
at the test site. When the page meets the publisher's approval,
he or she executes a script that places the page in the queue for
the next production update.
3. An automated script with full access to the Apache environment
posts the approved pages to a production Web server at regular intervals.
I described this environment in a previous article ("Simplifying
Web Production", Sys Admin magazine, May 2000: http://www.sysadminmag.com/documents/sam0005h/).
Since then, the network has become much more complicated, and our
original approach proved too simplistic. Some of the changes we've
made to the system are:
1. The sites hosted on one of the test servers can be transferred
to different production servers.
2. The server receiving the site does not have to be a Web server
-- it could be a staging server.
3. A number of different test servers can put their pages on one
production server.
4. The system can work across firewalls.
In this article, I will describe these changes to our original
system, particularly the framework for moving Web sites from a test
environment to a production server. Note that our site has grown
considerably since we originally envisioned the system. We now have
several different test or staging servers on an intranet and also
on the Internet. One requirement of the system is that it now must
be able to move production pages to any server inside or outside
the firewall.
The Basic Idea
The centerpiece of this system is the code that transfers Web
content from a directory on a testing server to a directory on one
of the production servers. The idea behind the content transfer
between the server directories is to create an agreement between
the receiving and the sending server that states exactly what is
going to be transferred, how the data was produced, and where it
has to go. It models the same process our Webmaster previously
performed by hand -- the supplier of the Web site sends the pages
he produces, and the Webmaster installs them. The server producing
the data sends this information along with the data to be installed.
With this information at hand, the receiving server knows the site
for which the data is intended, where to install it, and how to
install it.
I initially had only a test and production environment in mind,
but I am now also using this solution for other applications (e.g.,
replication servers or staging servers designed to collect parts
of Web sites produced from different suppliers).
After we send data and instructions from server A to server B,
we can also send instructions to be executed after the page
installation (e.g., instructions to launch the indexing process of
the search engine or other Web site maintenance tools).
The Design
The system consists of three parts:
- A GUI
- A database
- Procedures to let the GUI "speak" with the database
The GUI is the interface to the user, so it is the visible part
of the application. It offers users a list of pages and lets
them choose which site to put into production (Figure 1). The GUI
also offers administration options including the ability to add
users, modify user configuration, and add new sites or servers.
The database holds all data about users, sites, and permissions,
and the procedures execute the various requests made by the GUI.
The procedures ensure that the data in the database is updated as
needed. See the sidebar "MVC Paradigm".
In our case, the separation developed naturally:
- The presentation happens in the client browser and is produced
by a number of CGI scripts on the server side.
- The data structure is fixed in a database.
- We needed a way to access the database from the CGI scripts,
and this happens via a number of methods offered by a library
of objects.
Perl offers object-oriented features as well as a large number
of libraries ranging from CGI to database management modules. However,
you can rebuild the whole framework in any other language. It would
be very interesting, for example, to rewrite the whole thing in
Java or Python.
DBM, RDBMS, or XML?
The first approach, as described in my previous article, used
DBM as a repository. DBM exists on almost every platform (including
Win32 systems). DBM is available for free from the Free Software
Foundation (http://www.fsf.org/software) or Sleepycat (http://www.sleepycat.com).
An RDBMS would be useful if you handle multiple sites with thousands
of publishers and editors. In the approach described in this article,
I used an XML-based database to hold all the necessary information.
I chose XML because it is ASCII and, as such, human readable. You
can also record new information without using the administrator
interface or, if your database gets corrupted, you can quickly identify
and correct damaged data.
Here is our Web site information:
<WebSite Name="www.gsk.it">
<DocumentRoot>/disk1/htdocs/gsk</DocumentRoot>
<TransferProcedure>TransferServer1</TransferProcedure>
<LastModified>10.10.2001</LastModified>
<Modifier>admin</Modifier>
</WebSite>
Information about our users, their roles, and where they work:
<User UserId="webuser1">
<Roles>
<Role>Administrator</Role>
<Role>Publisher</Role>
</Roles>
<WebSites>
<WebSite>www.gsk.it</WebSite>
<WebSite>www.ricercaesanita.it</WebSite>
<WebSite>www.someinternalesite.italy</WebSite>
</WebSites>
</User>
Information about the Web sites:
<WebSite Name="/medical/projects">
<ServerName>www.gsk.it</ServerName>
<LastModified>01.08.2001</LastModified>
<Modifier>user22</Modifier>
</WebSite>
We also need tools to access this data. I wrote a public interface
describing the necessary functionality and then the private access
functions. The application doesn't know about the private methods,
so they can be changed without changing the application.
Here is the user object:
1 sub new {
2 my ($class,%args) = @_ ;
3 my $self = {
4 UserId => $args{UserId},
5 CName => $args{CName},
6 SurName => $args{SurName},
7 Roles => $args{Roles},
8 WebSites => $args{WebSites}
9 };
10 bless($self,$class);
11 return($self);
12 }
Line 2 puts the first argument, the class name, into the variable
$class and the rest of the arguments into a hash called %args.
The hash is used from line 3 to line 9 to initialize the object's
data. The variable $self is a reference to a hash containing
the object's data. We then bless the reference into the class
(line 10) and return it to the calling procedure (line 11).
Look at the public interface:
sub save {
    my $this = shift;
    my $User = $this->_save();
    # .... some housekeeping can go on here anyway
    return($User);
}
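The private counterpart is hidden from the application. As a minimal
sketch, assuming XML::Simple and an illustrative Users configuration
directory, _save() could serialize the object's hash to an XML file:

use XML::Simple;

sub _save {
    my $this = shift;
    # Serialize the object's data to an XML file named after the UserId.
    # The directory is an assumption; NoAttr writes each field as an element.
    my $file = "/WSP01Configuration/Users/" . $this->{UserId} . ".xml";
    my $xs   = XML::Simple->new( RootName => 'User', NoAttr => 1 );
    open( my $fh, '>', $file ) or die "Cannot write $file: $!";
    print $fh $xs->XMLout( { %$this } );    # the copy unblesses the hash
    close($fh);
    return $this;
}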
To create a new user:
$user = new User( UserId   => "Admin001",
                  CName    => "Reinhard Erich",
                  SurName  => "Voglmaier",
                  Roles    => "Administrator",
                  WebSites => "www.gsk.it",
                );
$user->save();
GUI -- The Cool Part
Remember when you wrote code for Motif or Visual C++ with hundreds
of lines of code for one single widget? The combination of HTML
and Perl gives you the same power, but with a few lines of code
plus the ability to run anywhere:
1 #!/usr/bin/perl -w
2 use WSP;
3 use CGI;
4 use Template;
5
6 my $Provider = new WSP();
7 my $Widget = new CGI();
8 my $Template = new Template("/d01/web/WSP/templates/Hello.html");
9
10 my $UserId = $ENV{'REMOTE_USER'};
11 my $User = $Provider->User($UserId);
12
13 $Template->display();
14 if ( ! init_Template($User) ) {
15 die "Could not load application";
16 }
17
18
19 sub init_Template {
20 for my $Sites ( $User->webSites() ) {
21 # Generates JavaScript to put the User Data into the form
22 . . . .
We initialize our objects in the first lines -- Provider (the main
object containing all configuration data), Widget (holding the CGI
object), and Template (a facility to read a template HTML file).
Line 10 gets the UserId, and line 11 gets the User object via this
UserId. Line 13 displays the empty template, and the function
init_Template generates JavaScript that fills the user data into
the HTML file. There's a submit button in the HTML file that calls
a procedure that posts the request (more about this later).
This is simplified -- the real application allows you to choose
how to proceed with the site prior to production. You could put
into production only the files changed since the last site update,
or choose a method to update the database of a search engine, and
so on. There are also scripts for the administration of users,
sites, and requests; the logic is still the same: we create the
HTML parts via CGI, we display the template, and we put the data
into the template using JavaScript.
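To make that last step concrete, the JavaScript generation inside
init_Template might look roughly like this (the form and field names
are illustrative, not the real template's):

sub init_Template {
    my ($User) = @_;
    # Emit JavaScript that fills the site list of the already
    # displayed form; "SiteList" is an assumed field name.
    print qq{<script type="text/javascript">\n};
    print qq{var list = document.forms[0].SiteList;\n};
    for my $Site ( $User->webSites() ) {
        print qq{list.options[list.length] = new Option("$Site", "$Site");\n};
    }
    print qq{</script>\n};
    return 1;
}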
The Glue -- The Application Logic
The system knows of the existence of four directories:
- Accepted
- Scheduled
- Executed
- Data
When a publisher decides that it's time to update the site, the
system creates a file in "accepted". This file contains
enough information to permit the system to begin the update process
(i.e., the directory to be put into production, the site the
directory lives on, the person requesting the update, the type of
update, and the date of the request). Because the Web server
normally runs at the lowest privilege level (let's assume as
"nobody"), the directory "accepted" must be writable by the user "nobody".
There's a process that runs under the UserId of the Web site
owner and looks into the "accepted" directory for instructions.
When it finds a file, it writes the file into "scheduled" and
puts the data produced into "data" when it's finished. The process
then notifies the person who requested the update; additional
site updates can be done without putting them into production.
A cron job then collects all scheduled procedures and executes
them at predefined times.
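As an illustration, the request file the system drops into "accepted"
might be written like this (the file name scheme and the XML layout
are assumptions that mirror the Job object shown in the next section;
$UserId, $JobNumber, and $WebServer are assumed to be set by the caller):

use POSIX qw(strftime);

my $AcceptedDir = "/WSP01jobs/accepted";        # from the WSP configuration
my $JobName     = $UserId . "_" . $JobNumber;   # e.g., user1_13
open( my $fh, '>', "$AcceptedDir/$JobName" ) or die "Cannot queue job: $!";
print $fh "<Job Name=\"$JobName\">\n";
print $fh "<WebServer>$WebServer</WebServer>\n";
print $fh "<UserId>$UserId</UserId>\n";
print $fh "<Date>" . strftime( "%a %b %e %H:%M:%S %Y", localtime ) . "</Date>\n";
print $fh "<Procedure>putInProduction()</Procedure>\n";
print $fh "</Job>\n";
close($fh);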
Controller -- Three Objects
The first object is called "WSP" and not only holds
the configuration files but also administers the other two objects
(Server and Job). It knows where the previously mentioned directories
live and makes this information available via method calls.
As with the database, the information held in the objects is made
persistent via XML files.
Here is the information held in the WSP file:
<WSP Name="MainWebSite">
<ScheduledDir>/WSP01jobs/scheduled</ScheduledDir>
<AcceptedDir>/WSP01jobs/accepted</AcceptedDir>
<ServerConfigDir>/WSP01Configuration/Server</ServerConfigDir>
<ServerDir>/WSP01Configuration/Server</ServerDir>
<DataDir>/WSP01jobs/data</DataDir>
<LogDir>/WSP01logs</LogDir>
</WSP>
All servers are held in the ServerDir and look like this:
<SERVER Name="gsk">
<DocumentRoot>/d01/htdocs</DocumentRoot>
<SendDataProcedure>SendData()</SendDataProcedure>
<ServerName>gsk</ServerName>
<ConfigDir>/home/projects/WSP/Configuration/Server</ConfigDir>
</SERVER>
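As a sketch of how the controller might turn one of these files into
a Server object (XML::Simple and the accessor names are my choices
for illustration):

package Server;
use XML::Simple;

# Build a Server object from its XML description in the ServerDir.
sub new {
    my ( $class, $ConfigDir, $Name ) = @_;
    my $self = XMLin("$ConfigDir/$Name.xml");   # hash of the child elements
    return bless( $self, $class );
}

# Accessors for the fields a transfer needs.
sub documentRoot      { return $_[0]->{DocumentRoot} }
sub serverName        { return $_[0]->{ServerName} }
sub sendDataProcedure { return $_[0]->{SendDataProcedure} }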
These two objects can be considered static, because the information
seldom changes. The SendDataProcedure entry is the name of
the procedure called to transfer the data from one server to another.
The third object is dynamic because it gets generated when a Web
site goes into production:
<Job Name="user1_13">
<WebServer>gsk</WebServer>
<UserId>user1</UserId>
<SName>Voglmaier</SName>
<CName>Reinhard</CName>
<Date>Mon Jul 16 09:11:55 2001</Date>
<Procedure>putInProduction()</Procedure>
</Job>
Here again, all data is self-explanatory, thanks to XML. The user
(UserId, SName, and CName) who wishes to put the site into production,
the date of the request, and the requirements to put the site into
production (in this case, to execute the procedure putInProduction())
are all declared in the job object.
Controller -- The Behavior
The three objects are the heart of WSP inasmuch as they determine
the behavior of the whole application. When the user decides to
transfer the site, he chooses the site to be transferred from a
list of files (Figure 1):
1 my $User = new User( UserId => $UserId );
2 my @SiteList = $User->SiteList();
3 # code to get the data in the HTML Template as described above
4 my $WSP = new WSP();
5 my $Job = $WSP->createJob();
First, we create a new User object from the UserId. This object
contains all the user data, including the list of sites we need in
order to display the sites that can be transferred by this particular
user. In line 4, the controller springs to life and reads the
configuration files to obtain the list of directories it should
know about and the servers available. With this knowledge at hand,
the WSP object creates a new Job object (line 5) and calls the
accepted method of the job, which creates a new file in the
"Accepted Directory". Because these actions are executed under
the UserId the Web server is running as (normally "nobody"), the
"Accepted Directory" must be writable by this user ("nobody").
This is the point at which security must be discussed.
The rest can be executed as a higher privileged user, normally
the owner of the system. Once the job has been accepted, the scheduler
does its job:
1 my $WSP = new WSP();
2 while ( my $Job = $WSP->getNextAcceptedJob() ) {
3 $Job->scheduledJob();
4 if ( my $Server = $Job->transfer() ) {
5 # we transfer the data to $Server
6 ...........
This code creates a new WSP object to get all information about
the system. Line 2 gets the next job from the accepted directory,
and line 3 executes the schedule procedure requested by the user.
The Job object knows what schedule procedures exist and how to
invoke them. Furthermore, it knows to which server the data should
be transferred, opens the corresponding Server object, and uses
the transfer function defined for that server. The information
about the scheduling process is kept in the "scheduled" directory
to permit a control process to understand the state of the job.
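A sketch of this last step -- looking up the transfer procedure the
Server object names and invoking it -- could read as follows (the
dispatch table and the rsync-over-ssh call are assumptions; rsync is
merely one way to satisfy the firewall requirement):

# Map the SendDataProcedure entries of the server files to real code.
my %TransferProcedures = ( 'SendData()' => \&SendData );

sub transfer {
    my ( $Job, $Server, $DataDir ) = @_;
    my $proc = $TransferProcedures{ $Server->sendDataProcedure() }
        or return undef;
    return $proc->( $Server, $DataDir ) ? $Server : undef;
}

sub SendData {
    my ( $Server, $DataDir ) = @_;
    # Copy the job's data into the target DocumentRoot; -a preserves
    # permissions and times, -z compresses on the wire.
    my $target = $Server->serverName() . ":" . $Server->documentRoot();
    return system( "rsync", "-az", "-e", "ssh", "$DataDir/", $target ) == 0;
}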
Everything is now finished on the sending server. However, the
site is still not in production. The same framework is installed
on the receiving server, which could forward the data to another
server, or keep it and work as a staging server. Look at the
"final" receiving server that updates the Web site:
my $WSP = new WSP();
while ( my $Server = $WSP->sourceServers() ) {
    my $Data = $WSP->transferData();
    my $Instructions = $WSP->transferInstructions();
    if ( $Instructions->execute($Data) ) {
        # here we could also execute control functions
        # or security checks
        . . . we're done !!!
It looks very simple because we reused what we constructed for
the sending server. The WSP object knows which servers could have
sent data and how to get the data. This could be as simple as
opening a directory containing the data; however, the data could
also sit on a staging server or still on the sending server (e.g.,
if the administrator of the sending server has decided not to push
the data and to let the receiving server fetch it instead). After
the data is transferred, it is installed using the instructions
from the sending server. The Instructions object is recreated on
the receiving side -- either transferred along with the data or
simply read from a file just transferred. The instructions are
then executed using the data objects previously created.
Current Development
The framework is currently installed and working as described
above. Two important features are now being added: security and
control. All the objects described in this article will get two
additional methods: security() and notification().
These methods are called at certain times in the life of an object
-- when the object is created and when an important action is
executed on behalf of the object (e.g., when a job is transferred
or a new job is created). This will permit callbacks to be executed
in certain situations and in a well-defined context.
When the job object is created, security() is called with the job
object and the reason as arguments. notification() works the same
way: you can register a notification callback that is invoked with
these two arguments and could, for example, send an email to the
user who made the request. If the user contests the request, you
can invoke a method on the job to cancel it. The cancel event can,
in turn, launch a notification asking the administrator for
permission for the action. Updates of these new features will be
uploaded to my home page (http://voglmaier.getmyhomepage.com/).
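Registering such a notification callback might eventually look like
this (all method names and the sendmail path are assumptions, since
the feature is still in development):

# Illustrative only: mail the requesting user on every state change.
$Job->notification( sub {
    my ( $object, $reason ) = @_;
    # userId() is an assumed accessor returning the requester's address
    open( my $mail, '|-', '/usr/sbin/sendmail -t' ) or return;
    print $mail "To: " . $object->userId() . "\n";
    print $mail "Subject: WSP job: $reason\n\n";
    print $mail "Your job changed state: $reason\n";
    close($mail);
} );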
Conclusion
This article described a framework to move Web sites from a test
environment to a production site. It is not limited to this particular
situation, but can be used for a wide range of requirements. Furthermore,
it does not impose a particular server architecture, and it could
even be used if the server is based on an RDBMS (such as Oracle's
WebPortal). Although you have to export and re-import the database
to transport data from one site to another, this works, and you
maintain only one framework. The framework can fit all security
requirements because you determine which security features to
implement in the system.
References
Berkeley DB (Sleepycat Software) -- http://www.sleepycat.com
GDBM (Free Software Foundation) -- http://www.fsf.org/software
Lincoln Stein, CGI.pm -- http://stein.cshl.org/WWW/CGI
Reinhard E. Voglmaier, home page -- http://voglmaier.getmyhomepage.com/
Samba -- http://www.samba.org
Reinhard Voglmaier, "Simplifying Web Production", Sys Admin magazine,
May 2000 -- http://www.sysadminmag.com/documents/sam0005h/
WebDAV -- http://www.webdav.org
Reinhard Voglmaier studied physics at the University of Munich
in Germany and graduated from the Max Planck Institute for Astrophysics
and Extraterrestrial Physics in Munich. After working in the IT
department at the German University of the Army in the field of
computer architecture, he was employed as a specialist for automation
at Honeywell and then as a UNIX systems specialist for performance
questions in database/network installations at Siemens Nixdorf.
Currently, he is the Internet and Intranet Manager at GlaxoWellcome,
Italy. He can be reached at: rv33100@GlaxoWellcome.co.uk.