Article

Web Site Management by Mail

Luca Salvadori

Managing a Web site can be easy, provided you have sufficient bandwidth to have a comfortable telnet shell working on your desktop. This is great when you work at the office and your Web server is on the same LAN, but becomes a nightmare when you're connected through phone lines and must cross several unknown network paths. I figured that email response time would normally be adequate for this purpose, and decided to use email to feed a simple mail message parser that is able to transform raw HTML-formatted text into full-fledged online documents. This article describes the basic implementation of the system through a set of shell scripts that are easily extensible to meet additional requirements.

What I Needed to Do

I wanted to keep the functionality simple: create HTML files and put them in a selected directory. Although the objective seems straightforward, a slow connection can present various logistical difficulties. Thus, I decided to prepare a raw HTML document (i.e., only the body, excluding any header, footer, style, etc.) and send it by email to an alias named mail2html (Listing 1), which pipes the message to a parser coded in shell. The challenge then was to make the parser vary its behavior based on instructions included at the top of the email message.

The Basic Requirement

My Web site is bi-lingual, offering two parallel threads, one in English and the other in my national language, Italian. This means that two different directory trees exist, to be managed separately with their own styles, headers, and related issues. The parser should therefore be able to act differently depending on the selected language, build the target file accordingly, and put it in the correct place. Also, it is necessary to ensure that only messages from trusted users are processed. Finally, a log is necessary to verify that the system is working, what it did, and when.

The Parser

The parser was the most challenging part, because of the various checks and string processing involved. The problem was eased by the use of regular expressions and simple UNIX commands such as cut, sed, ed, etc. The text manipulation done with ed scripts could have been implemented in awk or Perl, but I chose ed because I am more familiar with that tool and was short of time.

To make message processing with ed scripts feasible, I defined a syntax for incoming messages. Messages have two sections: a parameters section and an HTML body, separated by one or more blank lines. The parameters section is bounded by BEGIN_PARAMS and END_PARAMS tags, each at the beginning of a line. Comments in the usual shell syntax (preceded by #) may also be included. Inside the parameters section, various tags may be included, one per line. I included tags for filling in the <HEADERS> and <TITLE> HTML tag fields (included in HTML file headers), the output filename, and the language used for processing. Adding other tags is only a matter of changing some lines in the code. Tag syntax is quite straightforward: TAG=value<EOL>. The following is an example of a typical parameters section:

BEGIN_PARAMS
T=This is an example of title
H=Heading of my example
F=/en/example/index.htm
L=en
END_PARAMS

The body of the message must already be HTML-formatted, since no processing is done on it.
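The parameter syntax above can be read with a few lines of shell. What follows is a minimal sketch, not the code from Listing 1: the function names extract_params and param_value are hypothetical, and sed is used here in place of ed.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not taken from Listing 1.

# Print the lines between BEGIN_PARAMS and END_PARAMS (read from stdin),
# dropping the two marker lines, comment lines, and blank lines.
extract_params() {
    sed -n '/^BEGIN_PARAMS/,/^END_PARAMS/p' |
        sed -e '/^BEGIN_PARAMS/d' -e '/^END_PARAMS/d' -e '/^#/d' -e '/^$/d'
}

# Print the value of tag $1 from a parameter list on stdin.
# Everything after the first "=" is the value, so values may contain "=".
# Only the first matching line is used.
param_value() {
    sed -n "s/^$1=//p" | sed -n '1p'
}

msg='BEGIN_PARAMS
# a comment line
T=This is an example of title
H=Heading of my example
F=/en/example/index.htm
L=en
END_PARAMS

<P>Body text</P>'

params=$(printf '%s\n' "$msg" | extract_params)
title=$(printf '%s\n' "$params" | param_value T)
file=$(printf '%s\n' "$params" | param_value F)
lang=$(printf '%s\n' "$params" | param_value L)
echo "title=$title file=$file lang=$lang"
```

Supporting a new tag in this sketch only requires one more param_value call.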

Implementation

The code, shown in Listing 1, starts with variable initialization and directory changes. Then, standard input (i.e., the email message coming through a pipe) is captured in a temporary file. The message as received is composed of three sections rather than the two required by the defined syntax, because the intervening mailer has added email headers at the top of the file. Mail headers are useful to identify the sender of the message, which is checked against a trusted user. If the sender does not match, a record is added to the error file, and mail is sent to the trusted user to warn him or her of the potential intrusion.
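The sender check described above might look like the following sketch. It is not the code from Listing 1: the address in TRUSTED is a placeholder, and the function names are hypothetical. Note that a "From:" header is supplied by the sender, so this check is only as strong as the secrecy of the trusted address.

```shell
#!/bin/sh
TRUSTED="webmaster@example.com"     # placeholder for the trusted user

# Print the mail headers: everything up to the first blank line.
mail_headers() {
    sed '/^$/q'
}

# Print the bare address from the first "From:" header on stdin.
# Handles "From: user@host", "From: Name <user@host>" and
# "From: user@host (Name)".
sender_address() {
    sed -n 's/^From:[[:space:]]*//p' | sed -n '1p' |
        sed -e 's/.*<\(.*\)>.*/\1/' -e 's/[[:space:]].*$//'
}

# Succeed only if the address in $1 is the trusted one.
is_trusted() {
    [ "$1" = "$TRUSTED" ]
}

msg='From: Web Master <webmaster@example.com>
Subject: new page

BEGIN_PARAMS
L=en
END_PARAMS

<P>Body text</P>'

sender=$(printf '%s\n' "$msg" | mail_headers | sender_address)
if is_trusted "$sender"; then
    echo "accepted: $sender"
else
    # The script described in the article also logs the attempt and
    # mails a warning to the trusted user at this point.
    echo "rejected: $sender"
fi
```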

No damage can be done to the main HTML tree, since every file is created in an unprivileged user's home directory (defined by the $ME variable) under the public_html directory (the default for http://host/~user URLs). A word about this directory: because the final aim is to publish documents in the main tree, and because I had no time to implement smart algorithms (see the To Do's in the script), a mirror image of the main tree directories must be present under the dummy user's tree. This ensures that no strange manipulation can be applied, because files can be saved only in existing directories. At the end (and after a thorough check performed by the pages' author), a cron job can copy the new tree over the main one. As a security precaution, the dummy user is disabled by an * in the password field of the /etc/passwd file. The mail2html script is owned by the dummy user with mode 750 (-rwxr-x---). This is important because the trusted user is coded in clear text, and nobody outside must be able to read the file (a real hacker could patch the mail job while it's in the queue for sending).
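The rule that files can be saved only in existing directories can be enforced with a small check before the file is moved. The sketch below is an illustration under stated assumptions, not the code from Listing 1: safe_target is a hypothetical function, and the explicit refusal of ".." path components is an extra precaution that the article does not describe.

```shell
#!/bin/sh
# safe_target ROOT RELPATH: print the full destination path if RELPATH
# stays inside ROOT and its directory already exists; otherwise print
# nothing and return non-zero.
safe_target() {
    _root=$1
    _rel=${2#/}                      # drop one leading slash, if any
    case "/$_rel/" in
        */../*) return 1 ;;          # refuse any ".." path component
    esac
    _dir=$(dirname "$_root/$_rel")
    [ -d "$_dir" ] || return 1       # never create directories on demand
    printf '%s\n' "$_root/$_rel"
}

root=$(mktemp -d)                    # stands in for the dummy user's public_html
mkdir -p "$root/en/example"

safe_target "$root" /en/example/index.htm
safe_target "$root" /en/missing/index.htm || echo "refused: no such directory"
safe_target "$root" /en/../../etc/passwd || echo "refused: path leaves the tree"
```

A symbolic link inside the mirror tree could still point elsewhere, so this check complements the file permissions described above rather than replacing them.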

Finally, I suggest adding a .htaccess file, with its .htpasswd password file, to further restrict access to the dummy user's public_html directory (see the httpd documentation for details). Together these precautions should avoid any misuse of the program. After checking the new Web pages with a standard HTML browser (Mosaic, Netscape, or whatever), you can move the documents to the final online location using the issue script included in the package. Another pre-publishing task, changing the date, can be performed by the setdate script (see Listing 2 and Listing 3).

The message is parsed three times to extract the three sections into separate files. The first file contains the mail headers, and the "From:" field is checked against the trusted users file. If the user is OK, the job continues with parsing of the parameters section; otherwise an error message is generated and sent to the trusted user by email, as well as written to the error log file.
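The three passes can be sketched with sed as follows. This is an assumption-laden illustration rather than the code from Listing 1 (which uses ed): split_message and the output file names headers, params, and body are hypothetical.

```shell
#!/bin/sh
# split_message MSGFILE OUTDIR
# Writes OUTDIR/headers, OUTDIR/params and OUTDIR/body.
split_message() {
    # Pass 1: mail headers end at the first blank line.
    sed '/^$/q' "$1" > "$2/headers"
    # Pass 2: the parameters section, including its two marker lines.
    sed -n '/^BEGIN_PARAMS/,/^END_PARAMS/p' "$1" > "$2/params"
    # Pass 3: everything after END_PARAMS, minus leading blank lines.
    sed '1,/^END_PARAMS/d' "$1" | sed '/./,$!d' > "$2/body"
}

work=$(mktemp -d)                    # scratch directory for the demonstration
cat > "$work/msg" <<'EOF'
From: Web Master <webmaster@example.com>
Subject: new page

BEGIN_PARAMS
T=This is an example of title
L=en
END_PARAMS

<P>Hello, world.</P>
EOF

split_message "$work/msg" "$work"
wc -l "$work/headers" "$work/params" "$work/body"
```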

Finally, parsing of the HTML page text starts. This is accomplished through two small shell scripts, build and setdate. Both are quite simple. build accepts a rough HTML file as input and encapsulates it with a standard header and footer in the selected language. It also calls setvar to initialize local variables (see Listing 4 and Listing 5). setdate sets the date in the relevant field of the file, again in the style of the selected language. The work is done by a series of sed statements that are closely tied to the header and footer structure. The behavior of the scripts is determined by the parsed parameters.
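To make the idea concrete, here is a minimal sketch of what build and setdate do, written as two shell functions. It is not the code from the listings: the template file names header.LANG and footer.LANG, the DATE keyword, and the date formats chosen for each language are all assumptions made for the example.

```shell
#!/bin/sh
TEMPLATES=$(mktemp -d)   # stands in for the real template directory

# Sample English templates. TITLE, HEADING and DATE are placeholder keywords.
cat > "$TEMPLATES/header.en" <<'EOF'
<html><head><title>TITLE</title></head>
<body><h1>HEADING</h1>
EOF
cat > "$TEMPLATES/footer.en" <<'EOF'
<hr>Last updated: DATE
</body></html>
EOF

# build LANG BODYFILE: print the header, the raw body and the footer
# for the selected language.
build() {
    cat "$TEMPLATES/header.$1" "$2" "$TEMPLATES/footer.$1"
}

# setdate LANG: filter stdin to stdout, replacing the first DATE on each
# line with today's date. The day-first order for "it" is an assumption.
setdate() {
    case "$1" in
        it) today=$(date '+%d/%m/%Y') ;;
        *)  today=$(date '+%m/%d/%Y') ;;
    esac
    sed "s|DATE|$today|"
}

printf '<p>Hello, world.</p>\n' > "$TEMPLATES/body"
build en "$TEMPLATES/body" | setdate en
```

Adding a language in this sketch means adding one header file, one footer file, and one branch in setdate.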

After the file is built, the values you selected for the parameters must be inserted. Once more, an ed "here" script easily does the job, by replacing the TITLE and HEADING keywords with the values supplied in the parameters section. Finally, the file is moved to its destination, as defined in the parameters, and a success entry is added to the logfile. Scratch files are cleaned up before exiting.
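The final steps can be sketched as follows. This example swaps in sed for the ed "here" script used in the article, and every name in it (sed_escape, fill_keywords, the log file name) is hypothetical. The HTML tags are written in lowercase so that the uppercase TITLE keyword is not confused with a tag name.

```shell
#!/bin/sh
# Escape the characters that are special in a sed replacement:
# backslash, "&", and the "|" delimiter used below.
sed_escape() {
    printf '%s\n' "$1" | sed 's/[\\&|]/\\&/g'
}

# fill_keywords TITLE_VALUE HEADING_VALUE: filter stdin to stdout,
# replacing the TITLE and HEADING keywords with the given values.
fill_keywords() {
    t=$(sed_escape "$1")
    h=$(sed_escape "$2")
    sed -e "s|TITLE|$t|g" -e "s|HEADING|$h|g"
}

work=$(mktemp -d)                     # stands in for public_html
mkdir -p "$work/en/example"
LOG="$work/mail2html.log"             # hypothetical log file name

printf '<title>TITLE</title>\n<h1>HEADING</h1>\n' > "$work/page.tmp"
dest="$work/en/example/index.htm"

# Substitute, move into place, and log only if every step succeeded.
fill_keywords 'This is an example of title' 'Heading of my example' \
    < "$work/page.tmp" > "$work/page.out" &&
    mv "$work/page.out" "$dest" &&
    echo "$(date) created $dest" >> "$LOG"

rm -f "$work/page.tmp"                # clean up the scratch file
cat "$dest"
```

Escaping the values matters because a title such as "Q&A" would otherwise be mangled by sed, which treats "&" in a replacement as the matched text.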

Setup

Setup of mail2html is quite simple and can follow the steps shown below, or be modified based on the site's configuration. Just be careful about security-related tasks.

1) Identify dummy user (for instance, guest). Create it if necessary, and disable it immediately by putting a * in /etc/passwd's password field.

2) Copy mail2html and other support scripts to an appropriate directory, under dummy user's home.

3) Change ownership of the above files to the dummy user, with mode 700 (rwx------).

4) Edit the mail aliases file (usually /usr/lib/aliases) and add the following entry:

mail2html: "| /dummy_user_home/dir/mail2html"

5) Run newaliases (or a corresponding utility on your system) to update the aliases database.

6) Change to the dummy user's home and create the public_html subdirectory.

7) Create, starting at public_html subdir, a mirror of the directory tree ~httpd/htdocs (or whatever is the base html tree).

8) Create, in public_html subdir, a .htaccess file containing access information for the directory and its subtree. The file must be world readable. Here is an example:

AuthName this item
AuthType Basic
AuthUserFile /usr/guest/public_html/info/robonly/.htpasswd

<Limit GET>
order deny,allow
allow from all
deny from ncsa.uiuc.edu
require valid-user
</Limit>

9) Create the .htpasswd file (as indicated in the AuthUserFile directive of the .htaccess file) containing the username and encrypted (crypt) password of each user allowed access. It is best to use the ~httpd/support/htpasswd utility, but an edited /etc/passwd line may be used. Again, the file must be world readable.

10) Edit the mail2html and other support scripts to customize site variables (path, trusted user, etc.).

11) Create the log and error files by touching them.

12) You're done. Try creating an HTML file, put a parameters section on top, and send it by mail to mail2html. You should find a "production" HTML document exactly where you specified (with the F=filename tag), under the dummy user's public_html subtree. You can view it with a standard HTML browser.
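Step 7 above, mirroring the directory tree, is tedious by hand. One way to do it is sketched below; mirror_dirs is a hypothetical helper, and the demonstration runs on scratch directories rather than on a real HTML tree.

```shell
#!/bin/sh
# mirror_dirs SRC DEST: recreate every directory found under SRC
# beneath DEST, without copying any files.
mirror_dirs() {
    ( cd "$1" && find . -type d -print ) | while IFS= read -r d; do
        mkdir -p "$2/$d"
    done
}

# Demonstration on scratch directories. On a real system SRC would be
# the base HTML tree and DEST the dummy user's public_html.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/en/example" "$src/it/esempio"
touch "$src/en/index.htm"

mirror_dirs "$src" "$dest"
( cd "$dest" && find . -type d -print | sort )
```

Running it again after new directories are added to the main tree is harmless, because mkdir -p leaves existing directories alone.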

Troubleshooting

If you follow the above setup strategy, no problems should occur. Localizing the scripts may introduce errors, however, often with the symptom of nothing happening. Specifically, in the event that something goes wrong, no output file will be generated. In troubleshooting, you can simulate the mail pipe by feeding the script directly from a shell account. Do this by sending the original mail message to yourself and saving it in a scratch file. Then process the saved email with the following command and watch the results:

cat mailfile | mail2html

If that technique does not show the source of the problem, try turning on shell debugging with:

cat mailfile | sh -x mail2html

This will show the commands as they are executed and help solve the problem.
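If no real message is at hand, a scratch file can also be written directly. The sketch below creates one that follows the message syntax described earlier; the addresses and the scratch file location are placeholders, and the "From:" line must match the trusted user configured in the script.

```shell
#!/bin/sh
# Write a small test message to a scratch file.
mailfile=${TMPDIR:-/tmp}/mailfile.$$
cat > "$mailfile" <<'EOF'
From: webmaster@example.com
To: mail2html@example.com
Subject: test page

BEGIN_PARAMS
# parameters for a test page
T=This is an example of title
H=Heading of my example
F=/en/example/index.htm
L=en
END_PARAMS

<P>Test page body.</P>
EOF

echo "test message written to $mailfile"
# Then feed it to the script as shown above, for example:
#   cat "$mailfile" | sh -x mail2html
```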

Conclusion

This set of scripts provides a simple solution to the problem I faced due to low-speed connectivity. With a minimal amount of setup time, I can "telecommand" a remote Web site in a reasonable manner. The functionality of the scripts and the appearance of the HTML pages produced can be modified by simply changing the support scripts and the standard headers and footers. The scripts and the configuration procedure described here provide rudimentary security using the concept of "hide and protect." Feel free to embellish the code, but use it at your own risk.

About the Author

Luca Salvadori is head of the Information Systems Dept. at LABEN S.p.A., a leading European Space Electronics supplier. His experience spans from PDP-11 (whose manuals taught him English while at University) to Linux through Windows, PCs, and networking. He manages the company's web site (http://www.laben.it/) as well as (in his free time with a bunch of friends) another Linux box (http://aeroweb.lucia.it/) dedicated to aviation, his real, irresistible love. As a hobby, he flies sport aerobatics. Luca can be reached at: lsalvadori@batman.laben.it.