Cover V08, I03

Proactive Spam Prevention

Michael Schwager

Last month, I discussed the basics of blocking mail from spammers. With the check_mail and check_rcpt rulesets, we can block mail that breaks some of the rules (e.g., mail from a domain not given in the DNS) or that attempts to relay. The check_relay ruleset uses the Realtime Blackhole List to block known spammers. If spammers try to look legitimate and create a proper RFC-822 envelope and header, we can still block them based on their return address by using the access database. This month, I conclude my three-part series on blocking spam email with a discussion of more proactive approaches. I'll begin with a more in-depth discussion of the Realtime Blackhole List (RBL).

How the RBL Works

Offending machines or domains are added to the RBL by people who have been spammed. This step is not to be taken lightly, since mail from an entire domain may end up being blocked by thousands of hosts. Basically, when a spam recipient (which could be you) receives spam, they need to contact the ISP or host of the spammer to get it stopped. If a host refuses or is unable to contain the spammer, they can be added to the list via an email message to the maintainers of the RBL. See http://maps.vix.com/rbl/reporting.html for more information.

Some people might say that this is vigilantism or big-brotherism; however, subscription to the RBL is entirely voluntary. If you want to use it, you may. If you do not, don't. It's as simple as that. Nothing can be more empowering to an individual or a corporation than to be able to say, "I choose not to receive mail from known spammers. I trust the RBL to list those spammers for me." There is no coercion, no violation of rights.

Many ISPs, once they find that they are on the RBL, waste little time in working to get themselves off it. However, you may find a site that is reluctant to do so; I have known ISPs to remain on the RBL for weeks. If people within your domain cannot wait that long, there are ways to fix the situation. Simply add the domain to the access database, like this:

clumsy.isp.com        OK

Mail from that domain will now be allowed into your domain.
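
For readers who want to see what a blackhole-list check asks of the DNS, here is a minimal sketch in Python. It assumes the usual convention of reversing the octets of the connecting IP address and appending the list's zone; the zone name rbl.maps.vix.com and the function name are assumptions made for illustration, not something taken from the configuration files discussed here.

```python
import ipaddress

# Assumed zone name for the MAPS RBL; substitute whatever zone your site queries.
RBL_ZONE = "rbl.maps.vix.com"


def rbl_query_name(ip, zone=RBL_ZONE):
    """Return the DNS name a blackhole-list check would look up for an IPv4 address."""
    # IPv4Address validates the input and raises ValueError on garbage.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # The octets are reversed, then the list's zone is appended.
    return ".".join(reversed(octets)) + "." + zone


if __name__ == "__main__":
    print(rbl_query_name("195.76.98.3"))
```

If a name built this way resolves, the address is listed; if the lookup fails, it is not. The sketch only builds the name and performs no lookup itself.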

How do you know if the RBL, or any of the rules, are working? I like to have plenty of logging turned on. The default log level is 9, which should be adequate for most purposes. Check your /etc/syslog.conf file and direct your mail.info logging to a file of your choice, like this:

mail.info             /expand/log/maillog

Don't forget to touch that maillog file, and kill -HUP syslogd so that it sees your changes. Now, when spam comes in, you will see messages like this:

Sep 29 01:04:30 antispam.com sendmail[24509]: Ruleset check_mail
(<scvzzxxv@yahoo.com>) rejection: 553 <scvzzxxv@yahoo.com>...  Mail from
195.76.98.3 refused, see http://maps.vix.com/rbl/
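
Once logging is in place, rejections like this one can be tallied with a few lines of script. The following Python sketch is one way to do it, under two assumptions: each log entry sits on a single line in the real file (it is wrapped above only for print), and the wording matches the sample. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches the rejection wording shown in the sample entry; other Sendmail
# versions may phrase it differently, so treat this pattern as an assumption.
REJECTION = re.compile(r"Ruleset (check_\w+) \((.*?)\) rejection: (\d{3})")


def tally_rejections(lines):
    """Count rejections per (ruleset, SMTP reply code) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = REJECTION.search(line)
        if match:
            counts[(match.group(1), match.group(3))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Sep 29 01:04:30 antispam.com sendmail[24509]: Ruleset check_mail "
        "(<scvzzxxv@yahoo.com>) rejection: 553 <scvzzxxv@yahoo.com>...  Mail from "
        "195.76.98.3 refused, see http://maps.vix.com/rbl/",
    ]
    for (ruleset, code), count in tally_rejections(sample).items():
        print(ruleset, code, count)
```

To run it against a real log, pass an open file object instead of the sample list.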

Hmm, is that really a spammer? Let's look:

telnet 195.76.98.3 25

We see:

Connected to 195.76.98.3.
Escape character is '^]'.
220 futurnet.es Sendmail SMI-8.6/SMI-SVR4 ready at Wed, 30 Sep 1998 02:47:52
-0100

Sure enough, the From: address was scvzzxxv@yahoo.com but the machine is really at futurnet.es - surely a spammer. The host is running Sun's Sendmail, which does not have the anti-spam functionality of Sendmail 8. Likely they were being used as a relay, and were perhaps either slow to upgrade or ignored pleas to do so. Update: As of 12/7/98, their welcome banner looks like this:

220 futurnet.es ESMTP Sendmail 8.8.8+Sun ready at Mon, Dec 7 1998 21:42:36
-0100

They seem to have been inspired to upgrade :-).
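
The comparison made by eye above (the domain in the From: address versus the host name in the greeting banner) can be sketched in Python as follows. The function names are hypothetical, and a mismatch is only a hint: plenty of legitimate mail is sent through hosts outside the sender's domain.

```python
def banner_host(banner):
    """Return the host name announced in a 220 greeting line, or None."""
    parts = banner.split()
    if len(parts) >= 2 and parts[0] == "220":
        return parts[1].lower()
    return None


def sender_domain(address):
    """Return the domain part of an address such as user@example.com, or ''."""
    address = address.strip("<>")
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def looks_forged(from_address, banner):
    """True when the announced host is neither the sender's domain nor a host inside it."""
    host = banner_host(banner)
    domain = sender_domain(from_address)
    if host is None or not domain:
        return False  # not enough information to judge
    return not (host == domain or host.endswith("." + domain))


if __name__ == "__main__":
    banner = "220 futurnet.es Sendmail SMI-8.6/SMI-SVR4 ready at Wed, 30 Sep 1998 02:47:52 -0100"
    print(looks_forged("scvzzxxv@yahoo.com", banner))
```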

Regular Expressions

We can match mail addresses against regular expressions. For example, spammers often use a From: address with all numbers in the username. Although it is risky (some paging-by-mail systems use a format like that), you may want to apply this rule. You can add it to the check_mail ruleset by changing the Local_check_mail ruleset in the antispam.mc file. The regular expression is set up like a database lookup. After the following lines:

LOCAL_CONFIG
# Change "hostname" to a real hostname in the following.
# localhost is used verbatim.
Cwantispam hostname.antispam.com antispam.com localhost antispam.UUCP antispam.com.UUCP

Add the following line:

Kallnumbers regex -a@MATCH ^[0-9]+$

Then change:

SLocal_check_mail
# Local_check_mail input:  the address following the colon in the MAIL command
#
     
R$*                     $@ OK

To look like this:

SLocal_check_mail
# Local_check_mail input:  the address following the colon in the MAIL command
# check address against various regex checks
R$*                             $: $>Parse0 $>3 $1
R$+ < @ $+ > $*                 $: $(allnumbers $1 $: $1<@$2> $)
R@MATCH                         $#error $: 553 Header Error
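
Before committing to a pattern like this, it can help to try it against a few sample addresses outside of Sendmail. The Python sketch below applies the same expression, ^[0-9]+$, to the user part of an address; the function name is hypothetical, and Python's regular expression engine is standing in for whatever library your Sendmail build uses.

```python
import re

# Same expression as the allnumbers map: one or more digits and nothing else.
ALL_NUMBERS = re.compile(r"^[0-9]+$")


def user_is_all_numbers(address):
    """True when the part before the @ consists only of digits."""
    user = address.strip("<>").split("@", 1)[0]
    return bool(ALL_NUMBERS.match(user))


if __name__ == "__main__":
    for candidate in ("123456789@antispam.com", "scvzzxxv@yahoo.com", "user123@example.com"):
        print(candidate, user_is_all_numbers(candidate))
```

Note that an address such as user123@example.com passes, since the pattern is anchored at both ends and only rejects user names made up entirely of digits.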

Mail Headers

We have seen what can be done in the mail envelope, but what about the headers (the addresses people see in their mail programs)? Spammers will often use legitimate-looking addresses in the envelope, while the headers contain such things as "Friend@public.com". It may be easier for you as a systems administrator to block mail based on header addresses. Beginning with version 8.9, Sendmail can reject mail based on the contents of the mail headers. See antispam.mc.PROACTIVE in the schwager code package on www.samag.com from last month. The headers are tokenized and processed just as an envelope address would be.

We take a clue from the Basic_check_mail and Basic_check_rcpt rulesets and block mail based on the contents of the From: and To: headers. We also include the regular expression lookups for addresses with all numbers, as described above in Local_check_mail. I create a rule called LookupHeaderUser and have both the CheckFrom and CheckTo rules call it. All the logic for looking up the header mail addresses is located there. Make your CheckFrom ruleset look like:

SCheckFrom
R$*                             $>LookupHeaderUser $1

The CheckTo rule will look like this:

SCheckTo
R$+ , $+                $@ $>LookupHeaderUser $1
R$*                     $@ $>LookupHeaderUser $1

I set this up so that it only looks at the first address in a To: line with multiple addresses. Working on multiple addresses is left as an exercise for the reader. Be careful though, because on a busy gateway, long mail headers may make your Sendmail daemon very busy.

Notice that LookupHeaderUser defines the user part of the mail address as "user@" to the access database. I like that because it's unambiguous in the access database. You can easily change the rule; in the mc file when you see "access $2@" or "access $1@" just remove the "@".
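
To make the two points above concrete (only the first address of a To: line is examined, and the user part is looked up as "user@"), here is a rough Python equivalent. It is a sketch of the idea, not a translation of the LookupHeaderUser ruleset, and the function name is hypothetical.

```python
from email.utils import getaddresses


def first_header_user_key(header_value):
    """Return 'user@' for the first address in a header value, or None."""
    addresses = getaddresses([header_value])
    if not addresses:
        return None
    _display_name, address = addresses[0]  # later addresses are ignored
    user, at_sign, _domain = address.partition("@")
    if not user or not at_sign:
        return None
    # The trailing @ marks the key as a user name rather than a domain.
    return user + "@"


if __name__ == "__main__":
    print(first_header_user_key("123456789@antispam.com, bob@example.com"))
```

The trailing "@" is what keeps a user entry from being confused with a domain entry of the same spelling in the access database.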

You can telnet in as we did earlier to check whether this works, although this time you will need to actually enter a mail message after the RCPT To: part of the SMTP conversation. For example, using a Sendmail config created with the antispam.mc.03 file, you could type:

data
From:  address@domain
To: 123456789@antispam.com

testing
.

Sendmail will reply with:

553 header error

The one disadvantage of this is that Sendmail will receive the entire message before rejecting it.

What if all these measures fail, and you want to be even more proactive? Checking the subject line and the Message-ID headers will give you more control. Since spammers often create messages with invalid Message-IDs, you can include the following rules in the CheckMessageId ruleset:

SCheckMessageId
R< $+ @ $+ >            $@ OK
R$*                     $#error $: 553 Header Error
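
The first rule accepts anything of the form <something@something>, and the second rejects everything else. A rough Python counterpart is sketched below for testing sample values by hand. It is slightly stricter than the Sendmail pattern (it refuses embedded spaces, extra angle brackets, and extra @ signs), the function name is hypothetical, and the sample Message-ID is made up.

```python
import re

# Rough counterpart of "< $+ @ $+ >": text, an @, more text, all inside angle brackets.
MESSAGE_ID = re.compile(r"^<[^<>@\s]+@[^<>@\s]+>$")


def message_id_ok(value):
    """True when the value looks like <local-part@host>."""
    return bool(MESSAGE_ID.match(value.strip()))


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    for sample in ("<199809300247.CAA24509@futurnet.es>", "no-brackets@host", "<missing-at-sign>"):
        print(sample, message_id_ok(sample))
```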

To check the subject line, you might do something like this:

SCheckSubject
R$* make money fast $*  $#error $@ 5.7.1 $: "553 Header Error"
R$* hot girls $*        $#error $@ 5.7.1 $: "553 Header Error"

The left-hand side (LHS) in the CheckSubject rule is essentially a case-insensitive substring search, not a regular expression. Go ahead and change the rules in the antispam.mc file; when you run build, your changes will be included. As before, you can check the effect of these changes by telnetting to port 25 on your mail server and attempting to send mail that violates these rules.

Unfortunately, every time we want to update our subject lines we would be editing the sendmail.cf file. You may want to keep such "data" out of the "code" of your config file. Here's how to do that.

Define a file class, such as:

FS/usr/local/sendmail-r8.9/databases/reject-subjects

and change the CheckSubject rule to look like this:

R$* $=S $*        $#error $@ 5.7.1 $: "553 Header Error"

Now, put your strings in the reject-subjects file. The FS line would be placed in the antispam.mc file in the LOCAL_CONFIG section.
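
The idea of keeping the phrases in a separate file and matching them without regard to case can be tried out in Python before touching the mail server. This is a sketch under assumptions: it uses a plain substring test, whereas Sendmail matches on tokens, so results may differ at the edges. The function names are hypothetical.

```python
def load_phrases(lines):
    """Read one phrase per line, skipping blank lines, folded to lower case."""
    phrases = []
    for line in lines:
        text = line.strip().lower()
        if text:
            phrases.append(text)
    return phrases


def subject_rejected(subject, phrases):
    """True when any phrase occurs in the subject, ignoring case and extra spaces."""
    folded = " ".join(subject.lower().split())
    return any(phrase in folded for phrase in phrases)


if __name__ == "__main__":
    phrases = load_phrases(["make money fast\n", "\n", "hot girls\n"])
    print(subject_rejected("MAKE   Money Fast today!", phrases))
    print(subject_rejected("Minutes of the budget meeting", phrases))
```

To test against the real file, open it and pass the file object to load_phrases in place of the sample list.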

Where to Go from Here

Congratulations! By implementing the measures I've described, you are well on your way to preventing at least some spam from reaching your users. At the very least, many of the more obvious violators will be stopped. In the future, spammers will get more sophisticated, and you may find much of what's been discussed here to be nearly useless. Is there more you can do?

In terms of filtering mail, the procmail utility is also quite useful. It allows for all sorts of scanning capabilities based on the contents of both the headers and the body of the mail message. This can be set up on an individual basis (as I've done at my ISP), or procmail can be called from Sendmail. Set it as the default mailer for local mail on your hosts, and protect all of your users on a POP3/IMAP4 server that's running Sendmail. Instructions for including procmail support in Sendmail are in the cf/README file in the Sendmail hierarchy.

If you want to filter before getting to the mail delivery part (in other words, you want to reject the mail but still be able to check body content), you can use the checkcompat() routine and write some C code to rummage through the message, or have it exec a Perl script to do so. One of the arguments to checkcompat() is the message envelope; in there, you will find the name of the file that contains the body of the message.

The check_compat ruleset is given both the recipient and sender mail address as input. You may find a use for it, beyond the check_rcpt and check_mail rulesets.

Another proactive measure that should not be overlooked is forgoing all of this trouble and purchasing an anti-spam service. There are currently two that I know of, by BSDI, Inc. and Bright Light Technologies. They both work by extending the capabilities of Sendmail through hooks into Sendmail and an associated database. A team of dedicated "spambusters" scours the Internet looking for spam. When spam is found, your database is updated on the fly so that as little spam as possible enters your domain. See below for contact information. As of this writing, the BSDI solution has been withdrawn from the market, but expect to see it again in the first quarter of 1999.

Conclusion

Armed with Sendmail 8.9, and the check_* rulesets, you now have an inexpensive tool for cutting down on spam coming into your domain(s). I encourage you to join the fight! May the good guys win (for you spammers who may be reading, that would be us hard-working sys admins)!

Further Resources

Reading

Sendmail, 2nd Edition, by Eric Allman and Bryan Costales. O'Reilly and Associates. ISBN 1-56592-222-0. It's the Bible for Sendmail, written by the man himself, Eric Allman. If you are configuring Sendmail and you don't have this book, you are a far braver human than I. This is also known as "The Bat Book".

Usenet newsgroups: comp.mail.sendmail and news.admin.net-abuse.email

Web Sites

www.spam.com/ci.htm - The definitive SPAM® site. (SPAM® is a registered trademark of Hormel Foods.)

www.sendmail.org - Resources for the freeware version of Sendmail.

www.sendmail.com - Eric Allman's company, if you want to call in the professionals.

members.aol.com/emailfaq/emailfaq.html - The email abuse FAQ.

maps.vix.com/rbl/ - Paul Vixie's MAPS RBL site.

www.brightlight.com - See the "Spam Calculator" to give some indication of the cost of spam.

www.BSDI.COM/products/BMF/ - The MailFilter product.

www.bsdi.com/white-papers/war-on-spam - Rob Kolstad's white paper on the war on spam.

www.informatik.uni-kiel.de/~ca/email/english.html - A very good site for technical information concerning Sendmail version 8.8 and above. Claus Assmann wrote a very complete set of mail header check rulesets; see also:
www.informatik.uni-kiel.de/~ca/email/chk-89.html#HEADER

www.ora.com/catalog/sendmail2/ - A description of The Bat Book, the Sendmail book, the Book I live (and die) by. It's big; it's thorough; it's essential.

www.cauce.org - The Coalition Against Unsolicited Commercial Email. Write your congressperson, and ask them to put an end to all this. Join the CAUCE and join The Good Fight. A good resource for keeping tabs on where we are on the political front.

http://www.porcupine.org/postfix/ - Wietse Venema's Postfix project. This looks like a very well-written program. At this point, I think Sendmail is very useful but also very system-hungry; perhaps this will prove a viable alternative.

Commercial Products

Bright Mail, by Bright Light Technologies (www.brightlight.com). Uses an API and software in concert with the Sendmail discussed here. Allows you to use your own hardware. Database is updated on the fly, 7x24.

MailFilter, by BSDI (www.bsdi.com). Composed of both a PC running the BSDI OS and mail filtering software, a "plug-and-play" solution. Database is updated on the fly, 7x24.

About the Author

Mike Schwager is a contractor specializing in UNIX and the Internet. He has spent the past 15 years writing C and Perl code, shell scripts, and maintaining systems in the corporate and educational environments. Email him at Michael@Schwager.com or visit http://come.to/lanicservices.