Cover V11, I01

jan2002.tar

Posting Email to the Web with MHonArc

Roberto João Lopes Garcia

I work for MHA Engenharia Ltda, a consulting engineering services firm based in São Paulo, Brazil. Project-related email messages kept arriving at a single manager's or director's email account instead of being shared with all members of the project team. We wanted to make project-related email available to all team members so that each engineer would have the latest information. We also wanted a more effective method for disseminating information to the team.

Our first solution was to create an email account for each project, ask our clients to send messages to those project accounts, and teach our engineers to check those accounts regularly for new messages. This solution was problematic, however. We read email with the Eudora email client, and every message had to be registered (to conform with our ISO9000 procedures), with any attached files moved to a specially created directory associated with the registered message. Nobody wanted the task of checking and registering new messages. We also had problems with more than one user (we have more than 50 engineers) trying to check the same email account simultaneously. Furthermore, there was the risk of someone accidentally deleting important messages.

We developed a solution that automatically receives and registers each project email message, converts the messages to HTML format, posts the message on a Web server, and then delivers an acknowledgement to the sender verifying that the message was received. The engineers only have to visit the Web site to view any new messages. The solution is based on five open-source software products and one proprietary product.

The process for this project email system is as follows:

  • The Internet email server receives a project message, stores a copy as usual so that it can still be read via POP3 or IMAP, and transfers the message to an internal server for processing. If the transfer fails, the system stores the message and retries later.

  • The internal server receives the message from the Internet mail server and then:

    1. Registers the message under a unique internal number.

    2. Transforms the message into an HTML page.

    3. Extracts the attached files into a specially created directory.

    4. Checks the extracted files for viruses and writes a report of the scan.

    5. Updates the message indexes. There are two: one by date and the other by thread.

    6. Sends a confirmation message to the sender. This message quotes the original message and shows its internal registration number.

    7. The search program regularly updates its database with the newer messages.

    The engineers read the messages with a Web browser. The HTML pages generated from the email messages can be reached through an index ordered by date received or through an index grouped by message thread. The engineers can search messages from all projects or from a specific project. Figure 1 shows a project message received and converted by this system. The top of the page shows the project ID and name, the message ID (our internal register number), and the message subject. Beneath these are the date and time received, a navigation menu, the subject again, and the message header. The message body is followed by links to the files attached to the message.

    The files attached to a message are extracted and checked for viruses. To see the result of this scan, the engineer removes the .html extension from the URL in the browser's address bar; the browser then displays the scan report. Figure 2 shows the scan report for the message from Figure 1.

    At the center of our solution is a program called MHonArc (http://www.mhonarc.org/). MHonArc is a Perl (http://www.perl.org/) program that transforms messages into HTML pages and maintains an index. After experimenting with MHonArc-generated pages, my partner Guilherme Augusto de Brito Neves realized that we could use the MHonArc message numbers as the internal message register required by MHA's ISO9000 procedures. This insight led us to our final solution.

    The MHonArc Program

    MHonArc installation is simple and well documented (http://www.mhonarc.org/MHonArc/doc/install.html). If the required Perl version is already installed, installing MHonArc poses no problems. (The system described in this article uses MHonArc 2.4.7 and Perl 5.006.)

    MHonArc can convert a single message, mail folders, or UUCP- or UNIX-style mailboxes to HTML. The simplest way to run it is:

    mhonarc path_to_inbox_file
    
    This will convert all messages in the inbox file into HTML pages. The above command generates two index pages and one file for each message. For example, for a three-message inbox file, the command above generates the files:

    maillist.html -- Index page

    threads.html -- Thread index page

    msg00000.html -- First converted message

    msg00001.html -- Second converted message

    msg00002.html -- Third converted message
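    The page names follow a fixed pattern: the prefix msg, the message number zero-padded to five digits, and .html. Because we use that number as our internal register, it can help to map between the two from the shell. The sketch below assumes MHonArc's default naming as shown in the list above; msg_file and last_msgnum are hypothetical helper names, not part of MHonArc.

```shell
# Hypothetical helpers, assuming MHonArc's default msgNNNNN.html page names.

# msg_file NUM: print the page name for message number NUM.
msg_file() {
  printf 'msg%05d.html\n' "$1"
}

# last_msgnum DIR: print the highest message number archived in DIR,
# without zero padding (prints nothing for an empty archive).
last_msgnum() {
  ls "$1" | sed -n 's/^msg\([0-9][0-9]*\)\.html$/\1/p' | sort -n | tail -n 1 |
    sed 's/^0*\([0-9]\)/\1/'    # drop the padding so "00012" becomes "12"
}
```

    Stripping the padding matters: the shell's printf reads a number with a leading zero as octal, so passing "00008" back to msg_file would fail.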

    To add new messages to an existing MHonArc archive, use the -add option:

    mhonarc -add path_to_inbox_file
    
    If the file to be processed is not specified, MHonArc will expect to read it from the standard input.

    Configuration

    It is possible to configure almost all aspects of the MHonArc program, and there are many options. MHonArc has a complete manual set and good examples that show what can be done. MHonArc works with resources that control its behavior (http://www.mhonarc.org/MHonArc/doc/resources.html). I used resource files to set those resources, but some of them can also be set via command-line options or environment variables. There are also Resource Variables that can be used to represent dynamic data within a resource (http://www.mhonarc.org/MHonArc/doc/rcvars.html).

    The command I use to convert the messages transferred from our Internet mail server is:

    /usr/bin/mhonarc3 -add -quiet -rcfile $MRC/files.mrc -rcfile \
      $MRC/indice.mrc -rcfile $MRC/date_p.mrc -outdir $D_OUT $F_IN
    
    mhonarc3 is my hacked version. The -add option adds messages to the existing archive specified by -outdir. I used three resource files, each given with an -rcfile option, to control MHonArc's behavior. My scripts set the environment variables $MRC, $D_OUT, and $F_IN, which specify the location of the resource files, the location of the archive (outdir), and the input file. $F_IN is set to "" so that the messages are read from standard input, and -quiet suppresses status messages.

    I started by modifying one of the example resource files that produced a behavior similar to what I needed. (The example's location depends on the installation and can be set at that time.) It was date.rc, a resource file that makes MHonArc produce an index page ordering and grouping messages by the date received.

    Figure 3 shows a date index page of the received messages for a given project, "Índice Cronológico das Mensagens de E-MAIL". The page title shows the project ID and name, the word "Recebidas", and the total count of the received messages, starting with zero. There is a link to the thread pages, "Índice De Assuntos", and navigation links because each index page was set to show only 100 messages. The messages are sorted in reverse order to show newest messages first.

    Each entry of the index page displays the time the message was received, our internal message ID, the message subject, and the sender. This line is controlled by the LiTemplate resource, an HTML fragment used to build each entry of the index page. Within LiTemplate, I use resource variables such as $MSGNUM$, $SUBJECT$, and $FROMNAME$. This is the actual code used for LiTemplate:

    <LiTemplate>
    <tr valign=top>
    <td><LI>$MSGLOCALDATE(CUR;%H:%M)$ [$MSGNUM$]</LI></td>
    <td><b>$SUBJECT$</b></td>
    <td>$FROMNAME:35$</td>
    <td>$NOTE$</td>
    </tr>
    </LiTemplate>
    
    Within a resource file, a resource that takes a value is set like this:

    <ResourceName>value</ResourceName>
    
    Resource variables can have arguments, and I set the $MSGLOCALDATE$ arguments to show only the time (%H:%M) of the current (CUR) message in the example above:

    $MSGLOCALDATE(CUR;%H:%M)$
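    The second argument of $MSGLOCALDATE$ is a strftime(3)-style format, so a format string can be tried out at the shell before it goes into a resource file. The sketch below assumes GNU date, whose -d @SECONDS syntax is not available in every date implementation; preview_fmt is a hypothetical helper name. Note that it prints UTC, whereas $MSGLOCALDATE$ renders local time.

```shell
# preview_fmt EPOCH FORMAT: show how a strftime-style FORMAT renders the
# given Unix time, in UTC and in the C locale (requires GNU date).
preview_fmt() {
  LC_ALL=C date -u -d "@$1" "+$2"
}
```

    For example, preview_fmt 1001073600 '%d/%m/%Y %H:%M' prints 21/09/2001 12:00.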
    
    You can also control the maximum expanded size of the resource variable. I limit the sender's name size to 35 characters:

    $FROMNAME:35$
    
    The $ENV$ resource variable retrieves and expands the value of any environment variable; I used it to get the project's ID and name, which my scripts store in the E_USR variable:

    $ENV(E_USR)$
    
    The generated pages can be assembled piece by piece from HTML fragments stored in resources. Those fragments can contain dynamic information, like the LiTemplate above. Each type of generated page (index, thread index, or message) has a well-defined layout, and based on that layout, you can easily construct custom pages. The Message Page layout (http://www.mhonarc.org/MHonArc/doc/layout.html#msgpg) is shown here:

    MSGPGBEGIN
        MSGHEAD
        TOPLINKS
        SUBJECTHEADER
        Converted message header
        HEADBODYSEP
        Converted message body
        MSGBODYEND
        (FOLUPBEGIN
            FOLUPLITXT+
         FOLUPEND)?
        (REFSBEGIN
            REFSLITXT+
         REFSEND)?
        BOTLINKS
        MSGFOOT
    MSGPGEND
    
    The MSGPGBEGIN resource (the beginning of the message page) used to generate the HTML message shown in Figure 1 is:

    <MSGPGBEGIN>
    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML//EN">
    <html><head>
    <title>$ENV(E_BOX)$ $MSGNUM$ - $ENV(E_USR)$ - $SUBJECTNA:72$</title>
    <LINK REV="made" HREF="mailto:$FROMADDR$">
    </head>
    <body BGCOLOR="#FFFFFF" TEXT="#000000" background="/images/fundo2.jpg">
    
    <table border="0" width="100%">
      <tr>
        <td align="center" valign="middle"><p align="center">
           <font size="6" face="Times New Roman">$ENV(E_USR)$<BR>
           <img src="$ENV(E_ICON_BOX)$" alt="$ENV(E_BOX)$"
              align="center" hspace="0">
           *** Mensagem $ENV(E_BOX)$ <B>$MSGNUM$</B> ***
           <img src="$ENV(E_ICON_BOX)$" alt="$ENV(E_BOX)$"
             align="center" hspace="0">
           <BR>$SUBJECTNA:72$</font>
        </td>
        <td align="right"><img src="/images/mha1.gif" alt="MHA Engenharia Ltda"
          align="right" hspace="0" WIDTH="221" HEIGHT="117"></td>
      </tr>
    </table>
    
    <p align="center"><font   size="6" face="Times New Roman"></font></p>
    <CENTER>
    Mensagem $ENV(E_BOX)$ em <B>$MSGLOCALDATE(CUR; %d/%m/%Y as %H:%M hs)$</B>.
    </CENTER>
    
    </MSGPGBEGIN>
    
    My scripts set several environment variables, such as E_BOX, E_USR, and E_ICON_BOX. The fragment also uses resource variables such as $MSGNUM$ and $SUBJECTNA$, the latter limited to 72 characters. The TOPLINKS resource is:

    <TopLinks>
    <HR>
    <center>
    $BUTTON(PREV)$$BUTTON(NEXT)$$BUTTON(TPREV)$$BUTTON(TNEXT)$[<A
    HREF="$IDXFNAME$#$MSGNUM$">$IDXLABEL$</A>][<A
    HREF="$TIDXFNAME$#$MSGNUM$">$TIDXLABEL$</A>]
    </center>
    </TopLinks>
    
    The resource that specifies what will be expanded by $BUTTON(PREV)$ is set by PrevButton as below:

    <PrevButton chop>
    [<A HREF="$MSG(PREV)$">Prévia - Data</A>]
    </PrevButton>
    
    This way, I was able to build the desired layout. I started by designing a single HTML page, and when this page was what we wanted, I broke it down into the respective resources.

    MIME Support

    MHonArc's MIME support is provided by MIME filters. These filters are Perl functions that accept arguments to configure their behavior, and it is possible to write a new filter and incorporate it into MHonArc. I configured the MIME filters to create a subdirectory for every message that has attached files, and to place a link and an icon for each file in the generated HTML message. To do that, I set the MIMEArgs resource to:

    <MIMEArgs>
       m2h_external::filter;   usename   subdir   useicon
       m2h_text_plain::filter; attachcheck htmlcheck usename subdir useicon maxwidth=80
    </MIMEArgs>
    
    For plain-text messages, which are rendered in the HTML page inside a <PRE> element, I limit the line length to 80 characters; longer lines are wrapped. For the extracted attachments, I set an icon image for each file type, plus a default icon for files that match none of the listed types or whose MIME type is not set correctly in the message. Again, a single resource does this:

    <Icons>
    audio/basic:/mha_mail/icons/son.gif
    image/jpeg:/mha_mail/icons/jpg.gif
    application/zip:/mha_mail/icons/zip.gif
    application/x-gzip:/mha_mail/icons/zip.gif
    application/msword:/mha_mail/icons/doc.gif
       . . . (skipped)
    </Icons>
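    The Icons resource is essentially a lookup table from MIME type to image, with a fallback. The case statement below only illustrates that behavior; it is not how MHonArc implements it. It reuses the entries listed above, and the fallback name unknown.gif is hypothetical, since the author's default entry is among the skipped lines.

```shell
# icon_for TYPE: print the icon URL for a MIME content type, mirroring the
# Icons entries above; /mha_mail/icons/unknown.gif is a hypothetical default.
icon_for() (
  # MIME types are case-insensitive, so normalize before matching.
  t=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $t in
    audio/basic)                        echo /mha_mail/icons/son.gif ;;
    image/jpeg)                         echo /mha_mail/icons/jpg.gif ;;
    application/zip|application/x-gzip) echo /mha_mail/icons/zip.gif ;;
    application/msword)                 echo /mha_mail/icons/doc.gif ;;
    *)                                  echo /mha_mail/icons/unknown.gif ;;
  esac
)
```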
    
    Hacking MHonArc

    We wanted to notify our clients, by sending back a confirmation message, that we had received and registered each message they sent. I also wanted the attached files, which MHonArc extracts automatically, to be scanned by anti-virus software. Because MHonArc does not provide these capabilities, I hacked it to add them, despite my very limited Perl experience.

    Before sending back a confirmation message, the system must verify that the received message was not generated automatically, as error messages are (http://www.dsv.su.se/~jpalme/ietf/jp-ietf-home.html#newfields); otherwise, two automatic responders could keep answering each other indefinitely. To do so, the system looks for an Auto-Submitted header in the received message. MHonArc did not do this either, so again I had to hack it. (Unfortunately, not every system or program adds the Auto-Submitted header to the messages it generates automatically.) Figure 4 shows the confirmation message sent for the Figure 1 message.

    My hack to MHonArc consisted of two main changes. First, I changed the main program (MHonArc) to call a post-processing function. For each processed message, this function calls the anti-virus program and, if the message doesn't have the Auto-Submitted header, prepares and sends back the confirmation message. Listing 1 contains the hacked MHonArc and my post-processing function. Second, I created an internal MHonArc variable that records whether the message has an Auto-Submitted header. Listing 2 contains those changes. (Search the files for "ROBERTO" to see all of the changes.)
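    The real check lives in the hacked Perl code of Listings 1 and 2. To see by hand how a given message would be classified, the same test can be sketched as a shell function. This is an illustration, not the author's code: is_auto_submitted is a hypothetical name, it examines only the header section (up to the first blank line), it does not handle folded (multi-line) header values, and, following the later RFC 3834, it treats the value "no" as marking a message written by a person.

```shell
# is_auto_submitted: read an RFC 822 message on stdin and succeed (exit 0)
# if its header has an Auto-Submitted field whose value is not "no".
is_auto_submitted() {
  awk '
    /^\r?$/ { exit }                  # a blank line ends the header section
    tolower($0) ~ /^auto-submitted:/ {
      v = tolower($0)
      sub(/^auto-submitted:[ \t]*/, "", v)
      # RFC 3834: the value "no" marks a message written by a person.
      if (v !~ /^no([ \t;(\r]|$)/) found = 1
    }
    END { exit !found }
  '
}
```

    A script would then send the confirmation only when the test fails, for example: if ! is_auto_submitted < "$message"; then send the confirmation; fi.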

    OpenSSH to Transfer the Messages

    To add messages as they arrive, I used the Sendmail (http://www.sendmail.org/) .forward approach described in the MHonArc FAQ (http://www.mhonarc.org/MHonArc/doc/faq/archives.html#forward). On our Internet mail server, each project account's .forward file is configured to store each newly received message in the account's local mail folder and to pass a copy of it to a script that transfers the message to our internal server, where MHonArc processes it. The .forward file looks like this:

    \prj00777, "|./p_m.scr prj00777"
    
    \prj00777 is used to make Sendmail store the message in the local mail folder; ./p_m.scr sets the project-specific values and calls another script to perform the transfer. The p_m.scr script:

    #!/bin/ksh
    E_USR="Projeto: 00777 - ROCHAVERÁ - TSM";export E_USR
    E_NAME=prj00777;export E_NAME
    ../p_mail.scr
    
    I use the Korn shell. E_USR holds the project's number and name (it is used in the MSGPGBEGIN resource above). E_NAME holds the account name, which is also the name of the project's archive directory (see the D_OUT variable in the mhonarc3 command line shown above). The main command in p_mail.scr is the one that transfers all the data to our internal server:

    ssh -l username ???.???.???.??? "E_USR=\"$E_USR\";export E_USR; \
    E_NAME=$E_NAME;export E_NAME;./mrc/m2h_in.scr"
    
    It calls OpenSSH to connect to our internal server (address: ???.???.???.???) as a specific user (username, which owns all converted message files and directories) and to execute the command within the quotation marks. This command, running on the internal server, sets the same E_USR and E_NAME environment variables set by the ./p_m.scr script and calls m2h_in.scr, which sets some other environment variables and calls mhonarc3 with the command shown earlier. The received message is never explicitly saved along the way: each script inherits its standard input from the one before it, and ssh forwards its standard input to the remote command, so mhonarc3 ends up reading the message from its standard input.
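    The m2h_in.scr script itself is not reproduced in the article. A hypothetical sketch of such a wrapper, written as a shell function, might look like the following. The archive root and the resource directory are assumptions, and MHONARC is an override variable introduced only so the function can be exercised without mhonarc3 installed. The detail worth copying is that $F_IN stays unquoted: an empty value then expands to no argument at all, so mhonarc3 reads standard input, whereas "$F_IN" would pass an empty string as a file name.

```shell
# Hypothetical sketch of m2h_in.scr. E_USR and E_NAME come from the caller
# (see p_m.scr); every path below is an assumption.
m2h_in() (
  MRC=${MRC:-$HOME/mrc}                              # resource files
  D_OUT=${ARCHIVE_ROOT:-/var/www/mha_mail}/$E_NAME   # per-project archive
  F_IN=""                                            # empty: read stdin
  export E_USR E_NAME MRC D_OUT   # $ENV(...)$ in the resources reads these

  # $F_IN is unquoted on purpose so that an empty value disappears.
  ${MHONARC:-/usr/bin/mhonarc3} -add -quiet \
    -rcfile "$MRC/files.mrc" -rcfile "$MRC/indice.mrc" \
    -rcfile "$MRC/date_p.mrc" -outdir "$D_OUT" $F_IN
)
```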

    Because I was new to OpenSSH (http://www.openssh.com/), I referred to Matt Lesko's article "Installing and Configuring OpenSSH" (Sys Admin magazine, October 2000: http://www.samag.com/documents/s=1160/sam0010a/0010a.htm), which helped me get started. After reading the SSH protocol specification (http://www.ietf.org/html.charters/secsh-charter.html) and the OpenSSH manuals, I felt comfortable enough to do the job.

    I decided to use only protocol version 2, to enable host IP checking and strict host key checking, to disable forwarding, and to disable falling back to rsh. Of course, I also had to generate the server keys correctly and give each server the other's public key.
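    These choices can also be pinned on the ssh command line with -o, which keeps the script's behavior independent of the ssh_config files on either machine. The sketch below is one possible way to do that, not the author's configuration. ssh_internal and the SSH override variable are hypothetical, and BatchMode=yes is an addition in this sketch: it makes ssh fail instead of prompting for a password, which matters when Sendmail runs the script with no terminal. FallBackToRsh is left out because newer OpenSSH releases have dropped rsh support; on those releases protocol 2 is the only protocol anyway.

```shell
# ssh_internal HOST COMMAND: run COMMAND on HOST with the hardening choices
# described above. "username" is the article's placeholder account name;
# SSH can be overridden, e.g. for testing.
ssh_internal() {
  ${SSH:-ssh} -o Protocol=2 \
    -o CheckHostIP=yes \
    -o StrictHostKeyChecking=yes \
    -o ForwardAgent=no -o ForwardX11=no \
    -o BatchMode=yes \
    -l username "$1" "$2"
}
```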

    After the ssh command, the script checks for errors. If the transfer failed, the script saves the message in a temporary file to be processed later. If there was no error, the script checks whether the temporary file holds previously unprocessed messages; if so, it transfers the file's contents to the internal server just as it did the new message, and deletes the temporary file only if that second transfer also succeeds. This approach works well: if the internal server cannot be reached, for example, the message is processed when the next message is transferred successfully.
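    The retry logic can be sketched in shell as follows. This is a reconstruction from the description above, not the author's p_mail.scr: transfer wraps the ssh command shown earlier, with the hypothetical REMOTE_HOST standing in for the elided address, and the spool path is an assumption. It also assumes each message arrives with its mailbox "From " separator line, so that several parked messages stay separable when mhonarc3 reads the spool file, and it does no locking, so two deliveries running at the same moment could race. The message is copied to a temporary file first because standard input can be read only once.

```shell
# Hypothetical reconstruction of the p_mail.scr retry logic.
SPOOL=${SPOOL:-$HOME/p_mail.pending}   # parked messages (assumed path)

# transfer: pipe stdin to the internal server (the ssh command shown above).
transfer() {
  ssh -l username "$REMOTE_HOST" \
    "E_USR=\"$E_USR\";export E_USR;E_NAME=$E_NAME;export E_NAME;./mrc/m2h_in.scr"
}

# deliver: read one message on stdin and transfer it; on failure park it in
# the spool, on success also flush anything parked earlier.
deliver() (
  msg=$(mktemp) || exit 75            # 75 = EX_TEMPFAIL: Sendmail retries
  cat > "$msg"                        # keep a copy; stdin is read only once
  if transfer < "$msg"; then
    if [ -s "$SPOOL" ] && transfer < "$SPOOL"; then
      rm -f "$SPOOL"                  # flushed; otherwise keep it for later
    fi
  else
    # Park it until the next success; if even that fails, let Sendmail retry.
    cat "$msg" >> "$SPOOL" || { rm -f "$msg"; exit 75; }
  fi
  rm -f "$msg"
)
```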

    The Search Engine

    To let our engineers search through the converted messages, we use ht://Dig (http://www.htdig.org/), an HTML search engine that allows extensive customization. It uses ispell (http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html) for its dictionary, so I just had to download and install the Brazilian Portuguese ispell dictionary and affix files. One search page searches the messages from all projects, and a per-project search page limits the search to that project. To limit the search, I use the restrict hidden variable in the ht://Dig search form:

    <input type="hidden" name="restrict" value="/mha_mail/prj00777">
    
    This narrows the search to the 00777 project messages. (See Figure 5, a search result for date 21/09/2001 and the word "reunião", limited to the project 00777.) Cron is used to update ht://Dig's database every hour.

    The Anti-Virus Program

    We used the anti-virus program VirusSCAN for UNIX (uvscan) from McAfee (http://www.mcafee.com/), which is not open source. I could also have used AMaViS (http://www.amavis.org/) to scan messages on the mail server, but I decided to scan on the internal server to keep the scan report with the original message. Figure 2 shows the scan report of a message.

    Conclusion

    Our problem was solved. Our engineers are now used to reading the project mail messages through an Internet Web browser. Information is easily disseminated to the whole group. Clients know that their messages are received and their information goes to everyone on the project. (See Figure 6.)

    The open-source MHonArc, Perl, ht://Dig (and ispell), OpenSSH, and Sendmail were used together, along with the proprietary uvscan, to build a single solution. All of these programs have good documentation and allow extensive customization. The beauty of open-source programs is that if something can't be done through normal customization, the source code is available to help create a solution. I don't think we could have solved our email distribution problem without the open-source options.

    References

    AMaViS -- http://www.amavis.org/

    ht://Dig -- http://www.htdig.org/

    ispell -- http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html

    McAfee -- http://www.mcafee.com/

    MHonArc -- http://www.mhonarc.org/

    OpenSSH -- http://www.openssh.com/

    Perl -- http://www.perl.org/

    Sendmail -- http://www.sendmail.org/

    Sys Admin magazine (Matt Lesko, "Installing and Configuring OpenSSH", October 2000) -- http://www.sysadminmag.com/documents/s=1160/sam0010a/0010a.htm

    Roberto João Lopes Garcia is a civil engineer, specializing in software engineering. He has worked with Solaris for more than ten years and now also works with Linux. Roberto has developed FORTRAN, C, C++, LISP, and SQL programs for calculations, CAD (Computer Aided Design) drawings, server daemons, dynamic Web pages, etc. He can be contacted at: roberto@mha.com.br.
