Web Site Management by Mail
Luca Salvadori
Managing a Web site can be easy, provided you have sufficient
bandwidth
to have a comfortable telnet shell working on your desktop.
This is
great when you work at the office and your Web server
is on the same
LAN, but becomes a nightmare when you're connected through
phone lines
and must cross several unknown network paths. I figured
that email
response time would normally be adequate for this purpose,
and decided
to use email to feed a simple mail message parser that
is able to
transform raw HTML-formatted text into full-fledged
online documents.
This article describes the basic implementation of the
system through a
set of shell scripts that are easily extensible to meet
additional
requirements.
What I Needed to Do
I wanted to keep the functionality simple: creating
html files and
putting them in a selected directory. Although the objective
seems
straightforward, the speed of the connection can present
various
logistical difficulties. Thus, I decided to prepare
a raw HTML document
(i.e., only the body, excluding any header, footer,
style, etc.) and
send it by email to an email alias, named mail2html
(Listing 1),
corresponding to a pipe to a parser coded in shell.
Then, the challenge
was to instruct the parser to do something variable
based on
instructions included at the top of email message.
The Basic Requirement
My Web site is bi-lingual, offering two parallel threads,
one in English
and the other in my national language, Italian. This
means that two
different directory trees exist, to be managed separately
with their own
styles, headers, and related issues. The parser should
therefore be able
to act differently depending on the selected language,
build the target
file accordingly, and put it in the correct place. Also,
it is necessary
to ensure that only messages from trusted users are
processed. Finally,
a log is necessary to verify that the system is working,
what it did,
and when.
The Parser
The parser was the most challenging, due to the various
controls and
string processing. The problem was eased by use of regular
expressions,
and simple UNIX commands such as cut, sed, ed, etc.
The text
manipulation done with ed scripts could have been implemented
in awk or
Perl, but I chose ed because of my greater familiarity
with that tool
and time constraints.
To make message processing with ed scripts feasible,
a syntax for
incoming messages was defined. Messages have two sections:
a parameters
section and an html body, separated by one or more blank
lines. The
parameter section is bounded by BEGIN_PARAMS and END_PARAMS
tags at the
beginning of the line. Comments in the usual shell syntax
(preceded by
#) may also be included. Inside the parameters section,
various tags may
be included, one per line. I included tags for compiling
<HEADERS> and
<TITLE> html tag fields (included in html file
headers), output
filename, and language used for processing. Adding other
tags is only
matter of changing some lines in the code. Tag syntax
is quite
straightforward: TAG=value<EOL>. The following
is an example of a
typical parameter sections:
BEGIN_PARAMS
T=This is an example of title
H=Heading of my example
F=/en/example/index.htm
L=en
END_PARAMS
The body of the message must be html-formatted, since
no processing is
done on it.
Implementation
The code, shown in Listing 1, starts with variable initialization
and
directory changes. Then, standard input (i.e., the
email message coming
through a pipe) is captured in a temporary file. The
email message is
composed of three sections, not two, as required by
the defined syntax.
The intervening mailer has added email headers at the
top of the file.
Mail headers are useful to identify the sender of the
message, which is
checked against a trusted user. If the sender does not
match, a record
is added to error file, and mail is sent to the trusted
user to warn him
or her of the potential intrusion.
No damage could be applied to the main html tree, since
any file is
created in an unprivileged user's home directory (defined
by $ME
variable) under the public_html directory (default for
http://host/~user
URLs). A word about this directory: because the final
aim is to publish
documents in the main tree, and because I had no time
to implement smart
algorithms (see To Do's in script), a mirror image of
the main tree
directories must be present under a dummy user's tree.
This ensures that
no strange manipulation can be applied, because files
can be saved only
in existing directories. At the end (and after a thorough
check
performed by the pages' author), a cron job can copy
the new tree over
the main one. As a security precaution, the dummy user
is disabled by an
* in the password field of the /etc/passwd file. The
mail2html script is
owned by the dummy user with a umask of 750 (-rwxr-x---).
This is
important because the trusted user is coded in clear,
and nobody outside
must be able to read the file (a real hacker could patch
the mail job
while it's in the queue for sending).
Finally, I suggest putting a .htpasswd file to further
restrict access
to dummy user's public_html directory (see httpd documentation
for
details). Together these precautions should avoid any
misuse of the
program. After checking the new Web pages with a standard
html browser
(Mosaic, Netscape or whatever), you can move the documents
to the final
online location using the issue script included in the
package. Another
pre-publishing task, changing the date, can be performed
by the setdate
script (see Listing 2 and Listing 3).
The message is parsed three times to extract the three
sections into
separate files. The first file contains mail headers,
and the "From:"
field is checked against the trusted users file. If
the user is OK, the
job continues with parameter section parsing; otherwise
an error message
is generated and sent to the trusted user by email,
as well as written
in the error log file.
Finally, parsing of the HTML page text starts. This
is accomplished
through two small shell scripts, build and setdate.
Both are quite
simple. build accepts a rough HTML file as input and
encapsulates it
with a standard header and footer in the selected language.
It also
calls setvar to initialize local variables (See Listing
4 and Listing 5.).
setdate sets the date in the relevant field of the file,
again in the
style of the selected language. The work is done by
a series of sed
statements that are strictly related to header and footer
structure. The
behavior of the scripts is determined by the parsed
parameters.
After building the file, you must insert the values
you selected for
parameters. Once more, an ed "here" script
easily does the job, by
changing TITLE and HEADING keywords with values supplied
in the
parameters section. Finally, the file is moved to its
destination, as
defined in the parameters, and a success entry is added
to the logfile.
Scratch files are cleaned up before exiting.
Setup
Setup of mail2html is quite simple and can follow the
steps shown below,
or be modified based on the site's configuration. Just
be careful about
security-related tasks.
1) Identify dummy user (for instance, guest). Create
it if necessary,
and disable it immediately by putting a * in /etc/passwd's
password
field.
2) Copy mail2html and other support scripts to an appropriate
directory, under dummy user's home.
3) Change ownership of the above files to the dummy
user, with mode 700
(rwx------).
4) Edit the mail aliases file (usually /usr/lib/aliases)
and add the
following entry:
mail2html: "| /dummy_user_home/dir/mail2html"
5) Run newaliases (or a corresponding utility on your
system) to update
the aliases database.
6) Change to dummy user's home and create public_html
subdirectory
7) Create, starting at public_html subdir, a mirror
of the directory
tree ~httpd/htdocs (or whatever is the base html tree).
8) Create, in public_html subdir, a .htaccess file containing
access
information for the directory and its subtree. The file
must be world
readable. Here is an example:
AuthName this item
AuthType Basic
AuthUserFile /usr/guest/public_html/info/robonly/.htpasswd
<Limit GET>
order deny,allow
allow from all
deny from ncsa.uiuc.edu
require valid-user
</Limit>
9) Create .htpasswd file (as indicated in AuthUserFile
tag of .htaccess
file) containing username and CRYPTED password of users
allowed to
access it. It is best to use the ~httpd/support/htpasswd
utility, but an
edited /etc/passwd line may be used. Again, the file
must be world
readable.
10) Edit the mail2html and other support scripts to
customize site
variables (path, trusted user, etc.).
11) Create log- and error files by touching them.
12) You're done. Try creating an html file, put a parameter
section on
top, and send it by mail to mail2html. You should find
a "production"
html document exactly where you specified it should
be (with F=filename
tag), under dummy user's public_html subtree. You may
watch it with a
standard html browser.
Troubleshooting
If you follow the above setup strategy, no problems
should occur.
Localizing the scripts may introduce errors, however,
often with the
symptom of nothing happening. Specifically, in the event
that something
goes wrong, no output file will be generated. In troubleshooting,
you
can simulate the mail pipe by feeding the script directly
from a shell
account. Do this by sending the original mail message
to yourself and
saving it in a scratch file. Then process the saved
email with the
following command and watch the results:
cat mailfile | mail2html
If that technique does not show the source of the problem,
try turning
on shell debugging with:
cat mailfile | sh -x mail2html
This will show the commands as they are executed and
help solve the
problem.
Conclusion
This set of scripts provides a simple solution to the
problem I faced
due to low-speed connectivity. With a minimal amount
of setup time, I
can "telecommand" a remote Web site in a reasonable
manner. The
functionality of the scripts and the appearance of the
HTML pages
produced can be modified by simply changing the support
scripts and the
standard headers and footers. The scripts and the configuration
procedure described here provide rudimentary security
using the concept
of "hide and protect." Feel free to embellish
the code, but use it at
your own risk.
About the Author
Luca Salvadori is head of the Information Systems Dept.
at LABEN S.p.A.,
a leading European Space Electronics supplier. His experience
spans from
PDP-11 (whose manuals taught him English while at University)
to Linux
through Windows, PCs, and networking. He manages the
company's web site
(http://www.laben.it/) as well as (in his free time
with a bunch of
friends) another Linux box (http://aeroweb.lucia.it/)
dedicated to
aviation, his real, irresistible love. As a hobby, he
flies sport
aerobatics. Luca can be reached at: lsalvadori@batman.laben.it.
|