I work for MHA Engenharia Ltda, a consulting engineering services
firm based in São Paulo, Brazil. We encountered the problem
of project-related email messages arriving at a single manager's
or director's email account, rather than being shared with
all members of the project team. We wanted a way to make project-related
email available to all team members so that each engineer would
have the latest information. We also wanted a more effective method
for disseminating information to the team.
Our first solution was to create an email account for each project,
ask our clients to send messages to those project accounts, and
teach our engineers to regularly check for new messages at those
accounts. This solution was problematic, however, because we checked
email with the Eudora client program and had to register
each message (to conform to our ISO 9000 procedures) and move
any files attached to the message to a specially created directory
(related to the registered message). Nobody wanted the task of checking
and registering new messages. We also had problems with more than
one user (we have more than 50 engineers) trying to check the same
email account simultaneously. Furthermore, there was the risk of
someone accidentally deleting important messages.
We developed a solution that automatically receives and registers
each project email message, converts the messages to HTML format,
posts the message on a Web server, and then delivers an acknowledgement
to the sender verifying that the message was received. The engineers
only have to visit the Web site to view any new messages. The solution
is based on five open-source software products and one proprietary
product.
The internal server receives the message from the Internet
mail server and then:
1. Registers the message under a unique internal number.
2. Converts the message into an HTML page.
3. Extracts the attached files into a specially created directory.
4. Checks the extracted files for viruses and writes a report of
the scan.
5. Updates the message indexes. There are two indexes: one by date
and the other by thread.
6. Sends a confirmation message to the sender. This message shows
the original message and its internal registered number.
7. Finally, the search program regularly updates its database with
the newer messages.
The engineers read the messages through an Internet Web browser.
The HTML pages generated from the email messages can be reached
through an index ordered by date received or through the thread
index. The engineers can search messages from all projects or from
a specific project. Figure 1 shows a project message received and
converted by this system. The top of the message shows the project
ID and name, the message ID (our internal register number), and the
message subject. Beneath them are the date and time received, a
navigation menu, and again the subject and the message header. The
message body is followed by links to the attached files.
The attached message files are expanded and checked for viruses.
To see the result of this scan, the engineer removes the .html
extension from the URL in the browser's address bar, and the browser
then displays the scan report. Figure 2 shows the scan report for
the message from Figure 1.
At the center of our solution is a program called MHonArc (http://www.mhonarc.org/).
MHonArc is a Perl (http://www.perl.org/) program that converts
messages into HTML pages and maintains an index. After experimenting
with MHonArc-generated pages, my partner Guilherme Augusto de Brito
Neves realized that we could use the MHonArc message numbers as
the internal message register required by MHA's ISO 9000 procedures.
This insight led us to our final solution.
The MHonArc Program
MHonArc installation is simple and well documented (http://www.mhonarc.org/MHonArc/doc/install.html).
If the required Perl version is already installed, MHonArc installs
without problems. (The system described in this article uses
MHonArc 2.4.7 and Perl 5.006.)
MHonArc can convert a single message, mail folders, or UUCP- or
UNIX-style mailboxes to HTML. The simplest way to run it is:
mhonarc path_to_inbox_file
This converts all messages in the inbox file into HTML pages: two
index pages plus one file for each message. For a three-message
inbox file, for example, it generates the files:
maillist.html -- Index page
threads.html -- Thread index page
msg00000.html -- First converted message
msg00001.html -- Second converted message
msg00002.html -- Third converted message
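Because the same number serves as the ISO 9000 register, a script that knows a message's register can compute the name of its page. A minimal sketch, assuming MHonArc's default naming as seen in the listing above (the "msg" prefix and five-digit zero padding); msg_page is a hypothetical helper, not part of MHonArc:

```shell
#!/bin/sh
# msg_page NUMBER
# Print the HTML file name for a message number, assuming MHonArc's
# default "msg" prefix and five-digit zero padding (msg00000.html, ...).
msg_page() {
    printf 'msg%05d.html\n' "$1"
}
```

For example, msg_page 42 prints msg00042.html. Pass the number without leading zeros: shells such as bash read a leading zero in a printf numeric argument as octal.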
To add new messages to an existing MHonArc archive, use the -add
option:
mhonarc -add path_to_inbox_file
If no input file is specified, MHonArc reads from standard input.
Configuration
Almost every aspect of MHonArc can be configured, and there are
many options. MHonArc has a complete set of manuals and good examples
that show what can be done. Its behavior is controlled by resources
(http://www.mhonarc.org/MHonArc/doc/resources.html). I set those
resources in resource files, but some of them can also be set with
command-line options or environment variables. There are also
resource variables, which represent dynamic data within a resource
(http://www.mhonarc.org/MHonArc/doc/rcvars.html).
The command I use to convert the messages transferred from our
Internet mail server is:
/usr/bin/mhonarc3 -add -quiet -rcfile $MRC/files.mrc -rcfile \
$MRC/indice.mrc -rcfile $MRC/date_p.mrc -outdir $D_OUT $F_IN
mhonarc3 is my hacked version of MHonArc; the -add option adds
messages to the existing archive specified by -outdir. Three
resource files, each given with the -rcfile option, control
MHonArc's behavior. The environment variables $MRC, $D_OUT, and
$F_IN are set by my scripts and specify the location of the resource
files, the archive (outdir) location, and the input file. $F_IN is
set to "" so that the messages are read from standard input, and
-quiet suppresses status messages.
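Because $F_IN appears unquoted in that command line, setting it to "" makes it expand to no argument at all, and MHonArc then reads the message from standard input. A minimal sketch of a wrapper in that spirit, not the author's m2h_in.scr: the locations assigned to MRC and WEB_ROOT are assumptions, as is the MHONARC override (there only so the function can be tried without MHonArc installed). The archive path follows the /mha_mail/<account> URLs used for searching.

```shell
#!/bin/sh
# Hypothetical wrapper in the spirit of m2h_in.scr; all paths are
# illustrative assumptions. E_NAME (the project account) comes from
# the caller, as in the author's scripts.
MHONARC=${MHONARC:-/usr/bin/mhonarc3}   # can be overridden for testing
MRC=${MRC:-$HOME/mrc}                   # resource files (assumed location)
WEB_ROOT=${WEB_ROOT:-/var/www}          # Web server document root (assumed)

run_mhonarc() {
    : "${E_NAME:?E_NAME must name the project account}"
    D_OUT="$WEB_ROOT/mha_mail/$E_NAME"
    # Pass F_IN only when it is non-empty; with no file argument,
    # MHonArc reads the message from standard input.
    if [ -n "${F_IN:-}" ]; then
        set -- "$F_IN"
    else
        set --
    fi
    "$MHONARC" -add -quiet \
        -rcfile "$MRC/files.mrc" -rcfile "$MRC/indice.mrc" \
        -rcfile "$MRC/date_p.mrc" -outdir "$D_OUT" "$@"
}
```

A script would export E_NAME and call run_mhonarc < message. Passing the file only when F_IN is non-empty makes the standard-input fallback explicit instead of relying on an unquoted empty variable disappearing.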
I started by modifying an example resource file whose behavior was
close to what I needed: date.rc, which makes MHonArc produce an
index page that orders and groups messages by the date received.
(The location of the examples depends on the installation and can
be chosen at install time.)
Figure 3 shows a date index page of the messages received for a
given project, titled "Índice Cronológico das Mensagens de E-MAIL"
(Chronological Index of E-MAIL Messages). The page title shows the
project ID and name, the word "Recebidas" (Received), and the total
count of received messages, starting with zero. There is a link to
the thread pages, "Índice De Assuntos" (Subject Index), and there
are navigation links because each index page was set to show only
100 messages. The messages are sorted in reverse order so that the
newest messages appear first.
Each entry on the index page shows the time the message was received,
our internal message ID, the subject, and the sender. This line is
controlled by the LiTemplate resource, an HTML fragment used to build
each index entry. Within LiTemplate, I use resource variables such
as $MSGNUM$, $SUBJECT$, and $FROMNAME$. This is the actual code used
for LiTemplate:
<LiTemplate>
<tr valign=top>
<td><LI>$MSGLOCALDATE(CUR;%H:%M)$ [$MSGNUM$]</LI></td>
<td><b>$SUBJECT$</b></td>
<td>$FROMNAME:35$</td>
<td>$NOTE$</td>
</tr>
</LiTemplate>
Within a resource file, a resource that takes a value is set like this:
<ResourceName>value</ResourceName>
Resource variables can have arguments, and I set the $MSGLOCALDATE$
arguments to show only the time (%H:%M) of the current (CUR)
message in the example above:
$MSGLOCALDATE(CUR;%H:%M)$
You can also limit the expanded length of a resource variable.
I limit the sender's name to 35 characters:
$FROMNAME:35$
The $ENV(name)$ resource variable retrieves and expands the value of
any environment variable; I use it to get the project's ID and name,
which my scripts store in the E_USR variable:
$ENV(E_USR)$
The generated pages are assembled brick by brick from HTML fragments
stored in resources. Those fragments can contain dynamic information,
like the LiTemplate above. Each type of generated page (index,
thread index, or message) has a well-defined layout, and based on
that layout you can easily construct custom pages. The Message Page
layout (http://www.mhonarc.org/MHonArc/doc/layout.html#msgpg) is
shown here:
MSGPGBEGIN
MSGHEAD
TOPLINKS
SUBJECTHEADER
Converted message header
HEADBODYSEP
Converted message body
MSGBODYEND
(FOLUPBEGIN
FOLUPLITXT+
FOLUPEND)?
(REFSBEGIN
REFSLITXT+
REFSEND)?
BOTLINKS
MSGFOOT
MSGPGEND
The MSGPGBEGIN (beginning of the message page) resource used
to generate the HTML message shown in Figure 1 is:
<MSGPGBEGIN>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML//EN">
<html><head>
<title>$ENV(E_BOX)$ $MSGNUM$ - $ENV(E_USR)$ - $SUBJECTNA:72$</title>
<LINK REV="made" HREF="mailto:$FROMADDR$">
</head>
<body BGCOLOR="#FFFFFF" TEXT="#000000" background="/images/fundo2.jpg">
<table border="0" width="100%">
<tr>
<td align="center" valign="middle"><p align="center">
<font size="6" face="Times New Roman">$ENV(E_USR)$<BR>
<img src="$ENV(E_ICON_BOX)$" alt="$ENV(E_BOX)$"
align="center" hspace="0">
*** Mensagem $ENV(E_BOX)$ <B>$MSGNUM$</B> ***
<img src="$ENV(E_ICON_BOX)$" alt="$ENV(E_BOX)$"
align="center" hspace="0">
<BR>$SUBJECTNA:72$</font>
</td>
<td align="right"><img src="/images/mha1.gif" alt="MHA Engenharia Ltda"
align="right" hspace="0" WIDTH="221" HEIGHT="117"></td>
</tr>
</table>
<p align="center"><font size="6" face="Times New Roman"></font></p>
<CENTER>
Mensagem $ENV(E_BOX)$ em <B>$MSGLOCALDATE(CUR; %d/%m/%Y as %H:%M hs)$</B>.
</CENTER>
</CENTER>
</MSGPGBEGIN>
My scripts set several environment variables, such as E_BOX, E_USR,
and E_ICON_BOX. The resource also uses resource variables such as
$MSGNUM$ and $SUBJECTNA$, the latter limited to 72 characters. The
TOPLINKS resource is:
<TopLinks>
<HR>
<center>
$BUTTON(PREV)$$BUTTON(NEXT)$$BUTTON(TPREV)$$BUTTON(TNEXT)$[<A
HREF="$IDXFNAME$#$MSGNUM$">$IDXLABEL$</A>][<A
HREF="$TIDXFNAME$#$MSGNUM$">$TIDXLABEL$</A>]
</center>
</TopLinks>
What $BUTTON(PREV)$ expands to is set by the PrevButton resource:
<PrevButton chop>
[<A HREF="$MSG(PREV)$">Prévia - Data</A>]
</PrevButton>
In this way, I built the desired layout. I started by designing
a single HTML page and, once it looked the way we wanted, broke
it down into the corresponding resources.
MIME Support
MHonArc's MIME support is provided by MIME filters: functions whose
arguments let you configure them. You can also write a new filter
and incorporate it into MHonArc. I configured the MIME filters to
create a subdirectory for every message that has attached files,
and to place a link and an icon for each attached file in the
generated HTML message. To do that, I set the MIMEArgs resource to:
<MIMEArgs>
m2h_external::filter; usename subdir useicon
m2h_text_plain::filter; attachcheck htmlcheck usename subdir useicon maxwidth=80
</MIMEArgs>
Also, for plain-text messages, which are expanded into the HTML page
inside a <PRE> tag, I limit the line length to 80 characters; longer
lines are wrapped. For the expanded attachments, I set an image icon
for each file type, plus a default icon for files that match none
of the listed types or lack a correct MIME type in the message.
Again, I use one resource to do that:
<Icons>
audio/basic:/mha_mail/icons/son.gif
image/jpeg:/mha_mail/icons/jpg.gif
application/zip:/mha_mail/icons/zip.gif
application/x-gzip:/mha_mail/icons/zip.gif
application/msword:/mha_mail/icons/doc.gif
. . . (skipped)
</Icons>
Hacking MHonArc
We wanted to notify our clients (by sending back a confirmation
message) that we had received and registered each message they
sent. I also wanted the attachments, which MHonArc expands
automatically, to be scanned by anti-virus software. Because MHonArc
lacks these capabilities, I hacked it to add them, drawing on my
very limited Perl experience.
Before sending a confirmation, the system must verify that the
received message was not itself generated automatically, like error
messages (http://www.dsv.su.se/~jpalme/ietf/jp-ietf-home.html#newfields);
otherwise two automated systems could keep answering each other.
Thus, the system looks for an Auto-Submitted header in the received
message. MHonArc did not do this, so again I needed to hack it.
(Unfortunately, not every system or program uses the Auto-Submitted
header in its auto-generated messages.) Figure 4 shows the
confirmation message sent for the Figure 1 message.
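The header test itself is small. A minimal sketch in shell, not the author's Perl hack: it assumes the message arrives on standard input in RFC 822 form (header lines, an empty line, then the body), is_auto_submitted is a hypothetical name, and folded (multi-line) header fields are not handled. Following RFC 3834, a value of "no" means the message was not sent automatically, so only other values count.

```shell
#!/bin/sh
# is_auto_submitted: read a message on stdin and succeed (exit 0) if
# its header has an Auto-Submitted field whose value is not "no".
# Only the header section (up to the first empty line) is examined.
is_auto_submitted() {
    awk '
        BEGIN { found = 0 }
        /^\r?$/ { exit }                       # end of header section
        { line = tolower($0) }                 # field names are case-insensitive
        line ~ /^auto-submitted:/ {
            sub(/^auto-submitted:[ \t]*/, "", line)
            sub(/[ \t;(\r].*$/, "", line)      # keep only the keyword
            if (line != "no" && line != "") found = 1
            exit
        }
        END { if (found) exit 0; exit 1 }
    '
}
```

A caller would skip the confirmation whenever is_auto_submitted < message succeeds. As the article notes, this only catches senders that actually set the header.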
My hack to MHonArc consisted mainly of two changes. I first changed
the main program (MHonArc) to call a post-processing function. For
each processed message, this function calls the anti-virus program
and, if the message doesn't have the Auto-Submitted
header, it prepares and sends back the confirmation message. Listing
1 contains the hacked MHonArc and my post-processing function. I
then created an internal MHonArc variable to record whether the
Auto-Submitted header is present. Listing 2 contains those changes.
(Search the files for "ROBERTO" to see all of the changes.)
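For comparison, the confirmation itself can also be composed outside MHonArc. A minimal sketch, not the author's Listing 1 code: compose_confirmation is a hypothetical helper that takes the sender address, the subject, and the register number, reads the original message on standard input, and prints a reply suitable for piping to sendmail -t. The reply text is a placeholder. It quotes the original message, as the article's confirmation does, and, as an added assumption, marks itself with Auto-Submitted: auto-replied (RFC 3834) so that other well-behaved systems do not acknowledge it in turn.

```shell
#!/bin/sh
# compose_confirmation SENDER SUBJECT REGISTER
# Print a confirmation message on stdout, quoting the original message
# read from stdin. The wording is a placeholder; the Auto-Submitted
# header marks the reply as automatic.
compose_confirmation() {
    sender=$1 subject=$2 register=$3
    printf 'To: %s\n' "$sender"
    printf 'Subject: Received [%s]: %s\n' "$register" "$subject"
    printf 'Auto-Submitted: auto-replied\n'
    printf '\n'
    printf 'Your message was received and registered as number %s.\n\n' \
        "$register"
    sed 's/^/> /'    # quote the original message line by line
}
```

Usage would look like compose_confirmation "$from" "$subject" 123 < message | sendmail -t, with the sendmail path depending on the installation.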
OpenSSH to Transfer the Messages
To add messages as they arrive, I used the Sendmail (http://www.sendmail.org/)
.forward approach described in the MHonArc FAQ (http://www.mhonarc.org/MHonArc/doc/faq/archives.html#forward).
At our Internet mail server, the .forward file of each project
account is configured to store each newly received message in the
account's local mail folder and to pass a copy of it to the script
that transfers the message to our internal server, where MHonArc
processes it. The .forward file looks like this:
\prj00777, "|./p_m.scr prj00777"
\prj00777 makes Sendmail store the message in the local mail
folder (the backslash stops further forwarding of that address);
./p_m.scr sets the project-specific values and calls another script
to perform the transfer. The p_m.scr script:
#!/bin/ksh
E_USR="Projeto: 00777 - ROCHAVERÁ - TSM";export E_USR
E_NAME=prj00777;export E_NAME
../p_mail.scr
I use the Korn shell. E_USR holds the project's name (see the
MSGPGBEGIN resource above). E_NAME holds the account name, which is
also the name of the archive's directory (see the D_OUT variable in
the mhonarc3 command line shown above). The main command in
p_mail.scr transfers all the data to our internal server:
ssh -l username ???.???.???.??? "E_USR=\"$E_USR\";export E_USR; \
E_NAME=$E_NAME;export E_NAME;./mrc/m2h_in.scr"
It calls OpenSSH and asks to connect to our internal server (address:
???.???.???.???) as a specific user (username that owns
all converted message files and directories) and to execute the command
within the quotation marks. This command, running on the internal
server, sets the same E_USR and E_NAME environment variables that
./p_m.scr set and calls m2h_in.scr, which sets some other environment
variables and calls mhonarc3 with the command line shown earlier.
The received message travels through all the scripts on standard
input (ssh passes its standard input to the remote command) until
mhonarc3 reads it.
Because I was new to OpenSSH (http://www.openssh.com/),
I referred to Matt Lesko's article "Installing and Configuring
OpenSSH" (Sys Admin magazine, October 2000: http://www.samag.com/documents/s=1160/sam0010a/0010a.htm),
which helped me get started. After reading the ssh protocol
specification (http://www.ietf.org/html.charters/secsh-charter.html)
and the OpenSSH manuals, I felt comfortable enough to do the job.
I decided to use only protocol version 2, to enable host IP checking
and strict host key checking, to disable forwarding, and to disable
any fallback to rsh. Of course, this required generating the server
keys correctly and configuring each server with the other's public key.
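These choices can also be passed on the ssh command line with -o instead of living in ssh_config. A sketch, assuming a reasonably recent OpenSSH client and reading "do not forward" as agent and X11 forwarding: CheckHostIP, StrictHostKeyChecking, ForwardAgent, ForwardX11, and BatchMode are real ssh_config keywords, while secure_ssh and the SSH override are hypothetical. Current OpenSSH releases speak only protocol 2 and no longer offer an rsh fallback, so those two settings are left out. BatchMode=yes is an added assumption: a mail-delivery script should fail rather than wait for a password prompt.

```shell
#!/bin/sh
# secure_ssh: run ssh with the hardening options described above.
# SSH can be overridden (e.g. for testing); it defaults to ssh.
secure_ssh() {
    "${SSH:-ssh}" \
        -o CheckHostIP=yes \
        -o StrictHostKeyChecking=yes \
        -o ForwardAgent=no \
        -o ForwardX11=no \
        -o BatchMode=yes \
        "$@"
}
```

It takes the same arguments as ssh, for example secure_ssh -l username ???.???.???.??? "...".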
After the ssh command, the script checks for a returned error. If
there is an error, the script appends the message to a temporary
file to be processed later. If there is no error, the script checks
whether previously unprocessed messages are stored in the temporary
file. If that file exists, the script transfers its contents (just
like the command above) to the internal server. If that transfer
succeeds, the script deletes the temporary file; otherwise, the file
remains. This approach works well: if the internal server cannot be
reached, for example, the message is processed when the next message
is transferred successfully.
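The store-and-forward logic can be sketched as follows. This is a minimal sketch, not the author's p_mail.scr: TRANSFER names a hypothetical function or command that sends its standard input to the internal server (in practice, the ssh command above), and QUEUE is a hypothetical path for the temporary file.

```shell
#!/bin/sh
# deliver_with_retry: send the message on stdin with $TRANSFER. On
# failure, append it to $QUEUE. On success, also try to send any queued
# messages, deleting the queue only if that transfer succeeds too.
QUEUE=${QUEUE:-$HOME/mail_queue.tmp}

deliver_with_retry() {
    msg=$(mktemp) || return 1
    cat > "$msg"
    if "$TRANSFER" < "$msg"; then
        # Current message delivered: flush older ones, if any.
        if [ -s "$QUEUE" ] && "$TRANSFER" < "$QUEUE"; then
            rm -f "$QUEUE"
        fi
    else
        cat "$msg" >> "$QUEUE"   # server unreachable: keep for later
    fi
    rm -f "$msg"
}
```

Because failed messages are appended, the queue can hold several messages in arrival order. The sketch does not lock the queue file, so two deliveries arriving at the same moment could interleave; a production version would need a lock.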
The Search Engine
To allow our engineers to search through the converted messages,
we use ht://Dig (http://www.htdig.org/), an HTML search engine
that allows extensive customization. It uses ispell (http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html)
for its dictionary, so I just had to download and install the Brazilian
Portuguese ispell dictionary and affix files. There is one search
page that searches messages from all the projects, and one that
limits the search to a specific project. To limit the search, I use
ht://Dig's hidden form variable restrict, shown below:
<input type="hidden" name="restrict" value="/mha_mail/prj00777">
This narrows the search to the project 00777 messages. Figure 5
shows a search result, limited to project 00777, for the date
21/09/2001 and the word "reunião" ("meeting"). A cron job updates
ht://Dig's database every hour.
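Since each project needs its own restricted search page, the form can be generated from the project account name. A minimal sketch: words and restrict are htsearch form parameters, while the /cgi-bin/htsearch action path, the button label, and the helper name search_form are assumptions.

```shell
#!/bin/sh
# search_form ACCOUNT
# Print a minimal htsearch form restricted to one project's archive,
# assuming archives live under /mha_mail/<account>.
search_form() {
    account=$1
    cat <<EOF
<form method="get" action="/cgi-bin/htsearch">
<input type="hidden" name="restrict" value="/mha_mail/$account">
<input type="text" name="words" size="40">
<input type="submit" value="Search">
</form>
EOF
}
```

For example, search_form prj00777 > search_prj00777.html would write the page for project 00777.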
The Anti-Virus Program
We used McAfee's anti-virus program VirusScan for UNIX (uvscan,
http://www.mcafee.com/), which is not open source. I could have used
AMaViS (http://www.amavis.org/) to scan messages at the mail server,
but I chose to scan on the internal server so that the scan report
is kept with the original message.
Figure 2 shows the scan report of a message.
Conclusion
Our problem was solved. Our engineers are now used to reading
the project mail messages through an Internet Web browser. Information
is easily disseminated to the whole group. Clients know that their
messages are received and their information goes to everyone on
the project. (See Figure 6.)
The open-source MHonArc, Perl, ht://Dig (and ispell), OpenSSH,
and Sendmail were used together, in addition to the proprietary
uvscan, to build a single solution. All of the programs have good
documentation and allow extensive customization. The beauty of open-source
programs is that if something can't be done by normal customization,
the source code is available to help create a solution. I don't
think we could have solved our project-mail problem without the
open-source options.
References
AMaViS -- http://www.amavis.org/
ht://Dig -- http://www.htdig.org/
ispell -- http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html
McAfee -- http://www.mcafee.com/
MHonArc -- http://www.mhonarc.org/
OpenSSH -- http://www.openssh.com/
Perl -- http://www.perl.org/
Sendmail -- http://www.sendmail.org/
Sys Admin magazine -- http://www.sysadminmag.com/documents/s=1160/sam0010a/0010a.htm
Roberto João Lopes Garcia is a civil engineer, specializing
in software engineering. He has worked with Solaris for more than
ten years and now also works with Linux. Roberto has developed FORTRAN,
C, C++, LISP, and SQL programs for calculations, CAD (Computer Aided
Design) drawings, server daemons, dynamic Web pages, etc. He can
be contacted at: roberto@mha.com.br.