Using Sendmail::Milter to Tinker with Your Mail

Derek Balling
Summer, 2002

Have you ever been asked to append a footer containing a legal disclaimer to every outbound message?" Have you wished you could TAG messages when using spam-detection, or use a scoring system instead of just rejecting them? Have you ever wished you could easily filter attachments; or, essentially, do a myriad of things before sendmail actually accepted a message? Sendmail::Milter is the answer and, in this article, I will explain how to use it.

Background

Sendmail, which originated in the early 1980s, is by far the most predominant SMTP server on the Internet. Sendmail is installed by default on nearly every UNIX-like platform in existence, including even such flavors of UNIX as Apple's Mac OS X. You should take care, however, as there are still a few operating-system vendors whose "default installs" of sendmail include versions that are either horribly out of date or horribly insecure (or both). Make sure you have a current version (most current as of this writing is version 8.12.3).

Milter is a recent addition to sendmail (versions 8.10 and up have it, but it wasn't until 8.11 that it was a "supported" feature). Milter is short for "Mail Filter". The basic premise is that at every stage of the SMTP session -- the connection, the HELO command, the MAIL command, the RCPT command, etc. -- there is a callback where arbitrary C code can perform checks on the connection as it is going, and returns one of the following: "continue", "reject", "accept", and "discard".

Continue means to move on to the next step.
Reject tells sendmail to summarily reject the message (or the specific recipient, depending on the context).
Discard tells sendmail to act as though it is accepting the message, but to actually silently discard it.
Accept tells sendmail that "as far as this milter program is concerned, this message/recipient is fine", and not to ask it any more questions. This distinction is important, because some other sendmail rule, or some other milter program, may have a contrary opinion about the message, and that other program or rule may reject the message. There is no "unconditional acceptance", such that it would override other rules and milters, and it's helpful to keep that in mind.

Milter programs (in native C, or in the Sendmail::Milter world) run as daemons to which the sendmail daemon will talk at runtime. This can either be via UNIX sockets or via a TCP connection. Note also that, for our purposes here, your Perl installation MUST have threads enabled, and programs written for mail filtering should be thread-safe, using locks where appropriate, etc.

Setup

So, how do we build milter into sendmail? Well, if you are using the most current version of sendmail (8.12.3), put the following line into the site.config.m4 file that you use during the ./Build process:

APPENDDEF('confENVDEF', '-DMILTER')

Of course, you may have other stuff already defined in there, such as -DSASL, or -DDNSMAP, etc., in which case you simply add this -D flag to the ones you already have.

Run Build on sendmail, using that site.config.m4 file (using the -f flag). Then cd to 'libmilter', and do a ./Build there, as well as a ./Build install. These procedures will copy the C libraries for libmilter where they belong.

Installation of Sendmail::Milter is rather straightforward. Simply retrieve it from CPAN like any other Perl module. Any version later than 0.18 should work well for our purposes here. Then you need to tell sendmail to use your milter program and where to find it, for example, one listing from my sendmail.mc file is:

INPUT_MAIL_FILTER('timedelay', 'S=local:/var/run/timedelay.sock,
T=C:1m;S:1m;R:1m;E:1m')

The final step is to execute the milter, telling it where to listen for sendmail. The first argument is the milter name, as defined in the sendmail.cf, and the second is the path to said sendmail config.

timedelay.milter.pl timedelay /etc/mail/sendmail.cf

So, enough background, let's dig in with a simple example, and examine how it works:

use Sendmail::Milter;

sub envfrom_callback
{
    my $ctx = shift;
    my @args = @_;
    my $message = "";
    $ctx->setpriv(\$message);
    return SMFIS_CONTINUE;
}

sub body_callback
{
    my $ctx = shift;
    my $body_chunk = shift;
    my $message_ref = $ctx->getpriv();
    ${$message_ref} .= $body_chunk;
    $ctx->setpriv($message_ref);
    return SMFIS_CONTINUE;
}
sub eom_callback
{
    my $ctx = shift;
    my $message_ref = $ctx->getpriv();
    print "   + adding line to message body...\n";
    # Pig-Latin, Babelfish, Double dutch, soo many possibilities!
    # But we're boring...
    ${$message_ref} .= "---> Append me to this message body!\r\n";
    if (not $ctx->replacebody(${$message_ref}))
    {
        print "   - write error!\n";
    }
    $ctx->setpriv(undef);
    return SMFIS_CONTINUE;
}

sub abort_callback
{
    my $ctx = shift;
    $ctx->setpriv(undef);
    return SMFIS_CONTINUE;
}

sub close_callback
{
    my $ctx = shift;
    $ctx->setpriv(undef);
    return SMFIS_CONTINUE;
}

my %my_callbacks =
(
 'envfrom' => \&envfrom_callback,
 'body' =>    \&body_callback,
 'eom' =>     \&eom_callback,
 'abort' =>   \&abort_callback,
 'close' =>   \&close_callback,
 );

BEGIN:
{
    if (scalar(@ARGV) < 2)
    {
        print "Usage: perl $0  \n";
        exit;
    }
    my $conn = Sendmail::Milter::auto_getconn($ARGV[0], $ARGV[1]);
    print "Found connection info for '$ARGV[0]': $conn\n";
    if ($conn =~ /^local:(.+)$/)
    {
        my $unix_socket = $1;
        if (-e $unix_socket)
        {
            print "Attempting to unlink UNIX socket '$conn' ... ";
            if (unlink($unix_socket) == 0)
            {
                print "failed.\n";
                exit;
            }
            print "successful.\n";
        }
    }
    if (not Sendmail::Milter::auto_setconn($ARGV[0], $ARGV[1]))
    {
        print "Failed to detect connection information.\n";
        exit;
    }
    if (not Sendmail::Milter::register($ARGV[0], \%my_callbacks,
                                    SMFI_CURR_ACTS))
    {
        print "Failed to register callbacks for $ARGV[0].\n";
        exit;
    }
    print "Starting Sendmail::Milter $Sendmail::Milter::VERSION
           engine.\n";
    if (Sendmail::Milter::main())
    {
        print "Successful exit from the Sendmail::Milter engine.\n";
    }
    else
    {
        print "Unsuccessful exit from the Sendmail::Milter engine.\n";
    }
}

Everything after the "BEGIN" is common to every Sendmail::Milter program (and we won't show it in future program listings). It's essentially the "glue" that gets the callbacks listed in %my_callbacks associated with each relevant stage of the SMTP session. So, by looking at %my_callbacks, we see that we care about the "MAIL FROM" callback, the body callback, the eom (end-of-message) callback, and the abort and close callbacks.

So the first thing we get to, in the session, is the "MAIL FROM" (the envelope-from). Here we get $ctx, which some might call $self, so that we've got our own object identifier. We create $message as an empty string, and then we call "setpriv". Each connection has the ability to save data as necessary. setpriv takes as its argument a reference to the data structure you wish to save for future use. In more complicated examples, this might be a hash-reference or something similar. In our example though, it is simply a reference to our empty scalar $message. We then tell sendmail "ok to continue" by returning SMFIS_CONTINUE.

The next thing we care about is the body segment. body_callback may be called multiple times, as "chunks" are received from the client system. As each body segment is received, we add that chunk to a copy we're keeping in the private data space.

At the end-of-message stage is where we can make changes to the message in any way, and so that's what we're going to do. The copy of the body we've been stashing away all along is now recalled via getpriv(). We add some mumbo-jumbo to the end of that string. Then, we work the milter magic, and call the replacebody method, replacing whatever body there was with what we created, which might include our nonsense message, or a serious disclaimer for a law firm or the like.

But maybe you don't want to add that message to every message, but only the ones that come from your firm and are headed outward. That makes sense, because messages coming in don't need your copyright notice, or your confidentiality notice, etc. To do that, you might replace envfrom_callback with:

sub envfrom_callback
{
    my $ctx = shift;
    my $sender = shift;
    my $my_domain = 'example.com';
    $sender =~ s/\>$//;
    $sender =~ s/^\<//;
    if ($sender !~ /\@$my_domain$/)
    {
        return SMFIS_ACCEPT;
    }
    else
    {
        my $message = "";
        $ctx->setpriv(\$message);
        return SMFIS_CONTINUE;
    }
}

So what are we doing differently? Well, we're saying "if this message isn't from our domain, then just go ahead and accept it, don't bother doing any more callbacks to this milter program". If we never get another callback for this message, then we'll never build up a copy of the message, and we'll never actually alter the message. However, if we do match the local domain, then we create the private dataspace, return CONTINUE instead, and continue as we did before.

Maybe you only want to have just a body_callback that looks like:

sub body_callback
{
    my $ctx = shift;
    my $body_chunk = shift;
    if ($body_chunk =~ /I send you this file in order to have/)
    {
        $ctx->setreply("550","5.7.1","You appear to be infected with
                      SIRCAM");
        return SMFIS_REJECT;
    }
    else
    {
        return SMFIS_ACCEPT;
    }
}

So this time we look at the body, and if the body contains a string we don't like, we reject the message. We also use the setreply method to set the response code (550), the extended response code (5.7.1), and the specific message. If "SMFIS_REJECT" is called, without using setreply, a very generic "Access denied" message will be sent, which isn't very useful to the sending party.

Maybe you don't want to reject the message, but don't want to let it through, either. In that case, you could replace SMFIS_REJECT with SMFIS_DISCARD, in the above callback. By doing that, sendmail would pretend to accept the message from the sending system, but would then drop the message on the floor and forget it ever saw it.

Let's look at a little more complicated example, that of a way to create "timedelay" addresses, or addresses that expire after a certain amount of time, great for throwaway on web forms. An example address might be something like djb2001091645@td.megacity.org, an address that would go to whoever 'djb' is, and is good for 45 days from 09/16/2001.

#!/usr/local/bin/perl
# $Id: timedelay.milter.pl,v 1.4 2001/08/31 04:30:12 dredd Exp $
# Usage:
#  1  create an entry "td.example.com IN MX 0 mail.example.com"
#  2  create, in virtusertable  "@td.example.com dev-null"
#  3  create, in aliases, a user "dev-null" pointing to the /dev/null
#     device
#  4  change the %prefixes hash below to contain the prefix->username
#     mappings you want. The users MUST be local (at least, I think
#     they must be local)
#  5  Create your milter entries in sendmail.mc, recompile your .cf
#     and go for it.

use Sendmail::Milter;
use Time::Local;
$0 = 'Sendmail Milter - Time Delay';
my $regex_lock;
my %prefixes = ('djb' => 'dredd@megacity.org',
               'foo' => 'foo@example.com',
               'bar' => 'bar@example.com');
sub envrcpt_callback
{
    my $ctx = shift;
    my @args = @_;
    my $recip = $args[0];
    my $lhs;
    {
        lock $regex_lock;
        ($lhs) = $recip =~ /\<?(.*)\@td\.example\.com/;
    }
    return SMFIS_ACCEPT if ! defined $lhs;
    my %hash = ('rewrite' => 0);
    if ($lhs =~ /[A-Za-z]{2,3}\d{10}/)
    {
        my ($final_recipient,$date,$life);
        my ($myear,$mmon,$mday);
        {
            lock $regex_lock;
            ($final_recipient,$date,$life) = $lhs =~
                       /([A-Za-z]{2,3})(\d{8})(\d{2})/;
            ($myear,$mmon,$mday) = $date =~ /(\d{4})(\d{2})(\d{2})/;
        }
        $mmon--; $myear -= 1900;
        my $start_date = timelocal(0,0,0,$mday,$mmon,$myear);
        my $finish_date = $start_date + ($life * 86400);
        if ( ($finish_date > time) and
             ($start_date < time) )
        {
            # change recipient to $prefixes{$final_recipient}
            if (! defined $prefixes{$final_recipient})
            {
                $ctx->setreply("551","5.7.1","What the heck?");
                return SMFIS_REJECT;
            }
            else
            {
                my $end_rcpt = "<" . $prefixes{$final_recipient} . ">";
                my $href = $ctx->getpriv();
                if ($$href{'rewrite'} == 0)
                {
                    %hash = ('rewrite' => 1,
                            'old' => [$recip],
                            'new' => [$end_rcpt]);
                }
                else
                {
                    my @old_old = @{$$href{'old'}};
                    my @old_new = @{$$href{'new'}};
                    %hash = ('rewrite' => 1,
                            'old' => [@old_old, $recip],
                            'new' => [@old_new, $end_rcpt]);

                }
                $ctx->setpriv(\%hash);
                return SMFIS_CONTINUE;
            }
        }
        else
        {
            $ctx->setreply("551","5.7.1","Expiration Reached.");
            return SMFIS_REJECT;
        }
    }
    else
    {
        $ctx->setreply("551","5.7.1","Address not understood.");
        return SMFIS_REJECT;
    }
}
sub abort_callback
{
        my $ctx = shift;
        $ctx->setpriv(undef);
        return SMFIS_CONTINUE;
}
sub close_callback
{
        my $ctx = shift;
        $ctx->setpriv(undef);
        return SMFIS_CONTINUE;
}
sub eom_callback
{
    my $ctx = shift;
    my $chunk;
    my $href = $ctx->getpriv();
    if ($$href{'rewrite'})
    {
        foreach my $del_old (@{$$href{'old'}})
        {
            if (not $ctx->delrcpt($del_old))
            {
                print STDERR "Unable to delrcpt $del_old\n";
            }
        }
        foreach my $add_new (@{$$href{'new'}})
        {
            if (not $ctx->addrcpt($add_new))
            {
                print STDERR "Unable to addrcpt $add_new\n";
            }
        }
    }
    return SMFIS_CONTINUE;
}

my %my_callbacks =
(
        'envrcpt' =>  \&envrcpt_callback,
        'eom' =>      \&eom_callback,
        'abort' =>    \&abort_callback,
        'close' =>    \&close_callback,
);

Wow. That's a little bigger than before. So let's look at what we're doing. In envrcpt_callback is where the tricks are played. First, we see whether the recipient is something we care about, namely something that is in the td.example.com domain, which is the domain we've assigned our timedelay addresses to. If it isn't, then it is outside the scope of this milter, so we accept it out of hand and wait for the next message.

So, then we break down the left-hand side into its component parts -- a 2- or 3-letter identifier that determines the final recipient, the year, month, date of "creation" and a 2-digit number signifying the "lifetime" in days of the address. We perform some calculations to see whether "we're still within the lifetime of the address" as defined by those characteristics.

If the address we've received is expired, then we reject it. If the address doesn't make sense (e.g., it doesn't match the name format), we reject it; or if the identification prefix is one we don't use, we reject it. Otherwise, depending on whether this is the "first rewritten address" or the second one we've gotten in the same message, we build up a hash that contains all the "old" addresses, and all the "new" addresses, that we're going to come back to later. Remember that only in the end-of-message phase can we actually make changes, so we have to make notes in the private dataspace and refer to them later to actually get this to the right recipient.

In the eom_callback, we then loop over all the addresses we need to remove and all the addresses we need to add, calling the delrcpt, and addrcpt methods on each of them, respectively. This deletes and adds the recipients, completely transparent to the sender.

One thing to note is that, by default, milter programs are unaffected by things like host-based whitelisting, the "spam-friend" feature, etc. Sendmail::AccessDB (which is available from CPAN and maintained by me) is a nice interface for making sure that sender- and host-based whitelisting is followed, as well as recipients that have chosen to always receive mail using the spam-friend feature, etc.

With Sendmail::Milter, there are no limits on what you can do to the mail experience. There are folks who have already written applications which "de-mime" the body segment, stripping out attachments, or you could run a virus scanner on them in real-time, you could look for "spam symptoms", and reject mail that might easily slip by sendmail's filtering (or that would be extremely hard to filter using sendmail's configuration syntax). One example of the multitude of things that can be done is at:

http://www.megacity.org/software_downloads/spamcheck.milter.txt

It is literally too long to print here (some 2100 lines), and changes constantly as new features are added. I hope the examples shown here can give you the basics you need to start experimenting with Sendmail::Milter programs on your own. You can also visit http://www.milter.org/ for more examples.

Derek Balling (tpj@megacity.org) is the Sr. System Administrator at Byram Healthcare in White Plains, NY, and lives in Trumbull, CT. When not beating up on organizations for disobeying various RFCs, he tries to avoid being squashed by his 25-pound cat, William F. Bigglesworth (a.k.a. Mr. Bill).