Cover V08, I13

feb99.tar


The Sendmail Rulesets - A Detailed Look

Michael Schwager

To help you understand the nitty-gritty of Sendmail's rules and rulesets, this article will take you on a tour of a couple of rulesets. We'll look closely at two of the major anti-spam rulesets, check_mail and check_rcpt. These rulesets were extracted from antispam.cf; other rulesets mentioned in this article can be found there. (All related code for this article can be downloaded from the Sys Admin Web site.)

Check_mail

Refer to the check_mail listing (Listing 1). We'll start at line 6, which says, "This is a ruleset." Note that the input to this ruleset is the mail address following the "MAIL From:" command (for example, <schwager@enteract.com>). We will use that as the example input for check_mail. Here are selected annotated lines from the rest of the ruleset that illustrate its operation:

Line  Rule
7     R$*          $: $1 $| $>"Local_check_mail" $1
The $* matches anything, so basically our input is copied to $1. The right-hand side (RHS) rewrites the input as:

<schwager@enteract.com> $| (the results of <schwager@enteract.com>
as returned by the Local_check_mail ruleset)
So we enter Local_check_mail. Let's make it simple and assume it looks like:

SLocal_check_mail
R$*                     $@ OK
The input to this ruleset is <schwager@enteract.com>. Since the $* matches it, we return from the ruleset ($@) after rewriting the input with the "OK". Thus, we jump back to line 7, and the result of the rewrite on the RHS is:

<schwager@enteract.com> $| OK
The $: says that we only apply the RHS once (otherwise we'd have an infinite loop). So, we jump down to line 8:

8  R$* $| $#$*                $#$2
The $* will match <schwager@enteract.com>, and the $| will match the $| in our input. Here "input" means the input of the rule, not of the ruleset. (At this point, we don't care what the input of the ruleset was; each rule gets its input from the result of the rule above.) Note that the $| is actually the "else" operator of an if-else-endif macro conditional, but it makes a handy token to use as a separator, because you will not see it in an email address. Now we take the $#$* of the left-hand side (LHS). This will not match the OK, so this rule does not match. We move to the next rule:

9  R$* $| $*          $@ $>"Basic_check_mail" $1
Here the first $* will match <schwager@enteract.com>, the $| will match the $| of our input, and the second $* will match OK. So, on the RHS, $1 is replaced by the first matched piece of the input (<schwager@enteract.com>), and we call Basic_check_mail with <schwager@enteract.com>. Before we jump to line 11, I'd like to call your attention to a little Sendmail trick. Note how lines 7, 8, and 9 use the $| separator. The purpose of line 7 is both to run the input through the Local_check_mail ruleset and to preserve the input for use by Basic_check_mail in line 9. This is an example of how the simplicity of Sendmail's operators means we sometimes have to be devious to get work done.
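
The save-and-dispatch trick in lines 7 through 9 can be modeled outside Sendmail. The sketch below is Python, not Sendmail configuration. The function names and the string verdicts are illustrative assumptions chosen to mirror the three rules, and basic_check_mail is only a placeholder.

```python
SEP = " $| "  # stands in for Sendmail's $| separator token


def local_check_mail(addr):
    """Stand-in for the one-rule Local_check_mail shown above: always OK."""
    return "OK"


def basic_check_mail(addr):
    """Placeholder for Basic_check_mail; the real ruleset does much more."""
    return "<OK>"


def check_mail(addr, local_check=local_check_mail):
    # Line 7: keep the original input on the left of the separator and
    # put the local ruleset's result on the right.
    workspace = addr + SEP + local_check(addr)
    saved, _, result = workspace.partition(SEP)
    # Line 8: a result that starts with $# is a final verdict; return it.
    if result.startswith("$#"):
        return result
    # Line 9: otherwise drop the result and pass the saved input along.
    return basic_check_mail(saved)
```

Passing a different local_check function (a hypothetical hook here) shows the line 8 path: if it returns a string beginning with $#, that string comes back unchanged and Basic_check_mail is never consulted.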

On to Basic_check_mail. Lines 13-15 check to see whether Sendmail is deferring delivery; if so, we don't do any more processing in this ruleset. Note that line 15's purpose, again, is to restore the input to what it was when we entered this ruleset. Line 17 returns OK because MAIL From: <> means that we're receiving mail from the Mailer-Daemon.

Line 18 is interesting, because it takes the input and runs it through two rulesets, first Ruleset 3 and then Parse0. The results of Ruleset 3 become the input to Parse0. Ruleset 3 ensures that our hostname is a fully qualified domain name, and it focuses attention on the hostname by moving the brackets. Thus, <schwager@enteract.com> becomes schwager<@enteract.com.>. Note the addition of the trailing dot on the domain name. Parse0 then does some syntax and error checking. Finally, the <?> is prepended to the result of the two calls, so the output of this rule is:

<?> schwager<@enteract.com.>
Line 19 strips the dot; we now have:
<?> schwager<@enteract.com>
after going through it once. We don't need to worry about an infinite loop here, because this does not match the LHS after being rewritten by the RHS the first time.
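
As a rough Python model of what lines 18 and 19 do to the workspace, consider the sketch below. The focus helper is a hypothetical, heavily simplified stand-in for Ruleset 3 and Parse0: it assumes a plain user@host address and always appends the trailing dot, whereas the real rulesets handle many more address forms. The loop shows why the dot-stripping rule terminates.

```python
import re


def focus(addr):
    """Hypothetical stand-in for Ruleset 3: <user@host> -> user<@host.>"""
    user, _, host = addr.strip("<>").partition("@")
    return f"{user}<@{host}.>"


def strip_trailing_dots(workspace):
    """Model of line 19: reapply the rewrite until the LHS stops matching."""
    pattern = re.compile(r"^<\?> (.*)<@(.+)\.>(.*)$")
    while True:
        m = pattern.match(workspace)
        if m is None:
            return workspace  # LHS no longer matches, so the rule stops
        workspace = f"<?> {m.group(1)}<@{m.group(2)}>{m.group(3)}"


workspace = "<?> " + focus("<schwager@enteract.com>")
print(workspace)                       # <?> schwager<@enteract.com.>
print(strip_trailing_dots(workspace))  # <?> schwager<@enteract.com>
```

Each pass removes one dot, so the rewritten text eventually fails the pattern and the loop ends, just as the rule does.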

Line 21 is for handling non-DNS hostnames, although we aren't doing anything special with them (you need to be connected to the Internet to talk to our host). Line 22 matches, though: $1 becomes schwager, $2 becomes enteract.com, and $3 is empty. After substituting these values, the RHS looks like this:

$: <? $(resolve enteract.com $: enteract.com <PERM> $) > schwager < @ enteract.com >
It will try to resolve enteract.com using the host map (discussed at length in the Sendmail book), which is named "resolve". If successful, it will return <OK> and append that to the domain name. If not, it will return:

enteract.com <PERM>
In our case, it's successful, so the final result is:

<? enteract.com<OK> > schwager < @ enteract.com >
The $: at the beginning of the RHS means rewrite only once. On to line 23. We will match the LHS. The $- is a pattern we have not seen before; it matches exactly one token, which here is the OK. So, $1 becomes enteract.com; $2 becomes OK; $3 becomes schwager; $4 becomes enteract.com; and $5 is empty.

The RHS then rewrites it once as:

<OK> schwager < @ enteract.com >
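
The lookup-with-default in line 22 and the tag extraction in line 23 can be sketched in Python as follows. This is a model under stated assumptions, not Sendmail code: known_hosts is a hypothetical set standing in for DNS, and the function names are made up for illustration.

```python
def resolve(host, known_hosts):
    """Model of $(resolve host $: host <PERM> $).

    known_hosts is a set standing in for DNS; the real host map
    queries the resolver instead.
    """
    if host in known_hosts:
        return f"{host}<OK>"
    return f"{host} <PERM>"  # the $: default, returned on failure


def tag_sender(user, host, known_hosts):
    """Lines 22-23: look the host up, then keep only the status tag."""
    looked_up = resolve(host, known_hosts)
    status = looked_up[looked_up.rindex("<") + 1:-1]  # OK or PERM
    return f"<{status}> {user} < @ {host} >"
```

With enteract.com in known_hosts, tag_sender("schwager", "enteract.com", ...) yields the same text as the rewrite shown above.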
Lines 26-34 will not match. Line 33 is worth looking at, though. What if a spammer says in the SMTP conversation, "MAIL From: <mr_friendly@localhost>", but connects from a host without a DNS entry? The input of line 27 would be:

<OK> mr_friendly < @ localhost >
but then we rewrite it, while checking the client_name. Since it will not resolve, $&{client_name} looks like [ 192.168.0.55 ] or some such. So line 27 will return:

<? [ 192.168.0.55 ] > <OK> mr_friendly < @ localhost >
If this were from the localhost, line 32 would catch it and rewrite, so the input would look like:

<?> <mr_friendly@localhost>
However, here it will not. So line 33 will match, and the nasty spammer will be denied.
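
The logic of lines 27 through 33 boils down to one comparison, which the following Python sketch tries to capture. The local_names set is an assumption made for illustration; the real rules test against classes defined in the configuration file, and the function name is hypothetical.

```python
def is_forged_local(sender_host, client_name, local_names):
    """True when the sender claims a local host but the client is not local."""
    claims_local = sender_host in local_names
    client_is_local = client_name in local_names
    return claims_local and not client_is_local


local_names = {"localhost"}  # hypothetical set of names that count as local
print(is_forged_local("localhost", "[192.168.0.55]", local_names))  # True
```

A client that does not resolve shows up as a bracketed address such as [192.168.0.55], which can never be in the set of local names, so a claimed local sender from such a client is flagged.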

Back to our schwager mail address. Line 37 will match: $1 becomes OK, $2 becomes schwager, $3 becomes enteract.com, and $4 is empty. So our RHS will essentially be:

<USER $(access schwager@ $: ? $) > <OK> schwager < @ enteract.com >
The part here to notice is:

$(access schwager@ $: ? $)
This looks up schwager@ in our access database. Note that we're looking up a username, and the @ is appended to it. This is because we're receiving mail from an external host. If the lookup finds no match, it returns a simple ?. Assuming it finds none, the result of line 37 is:

<USER ?> <OK> schwager < @ enteract.com >
Line 39 will match, and we will look up schwager@enteract.com in the access database. Line 39 includes any strings to the right of the domain. Quite honestly, I'm not sure when you would use this. Note that the rule at line 39 spans two lines (39 and 40); the carriage return in between is simply whitespace. After the database lookup (the closing "$)"), the rest of line 40 is the same as line 37. Assuming a negative result for us, line 40 returns the same result as line 37.
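
The two sender lookups described so far (user@, then user@domain) can be modeled in Python with a plain dictionary standing in for the access database. The entries used in the test values are hypothetical, and stopping at the first hit is an assumption about how the later rules are guarded.

```python
def access_lookup(access, key, default="?"):
    """Model of $(access key $: ? $): the stored value, or ? if absent."""
    return access.get(key, default)


def sender_lookups(access, user, host):
    """Try user@ first, then user@host, stopping at the first hit."""
    for key in (f"{user}@", f"{user}@{host}"):
        value = access_lookup(access, key)
        if value != "?":
            return key, value
    return None, "?"
```

For our example, an empty access table gives (None, "?") for schwager at enteract.com, which corresponds to the <USER ?> result carried through these lines.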

Lines 42/43 are nearly identical to 39/40, except they do not attempt to match past the domain name. Again, our result looks like that of line 37. Lines 45/46 match, but this time we get an RHS that looks like:

$>LookUpDomain <enteract.com> <OK> <>
The LookUpDomain ruleset, found elsewhere in the config file, takes three parameters as input:

<$1> - Key (domain name)
<$2> - Default (what to return if not found in db)
<$3> - Passthru (additional data passed through unchanged)

LookUpDomain will loop and take a domain, for example internal.enteract.com, and check the name against the access database. First, it checks internal.enteract.com. Then, it strips the first word and the dot, and checks enteract.com. Finally, it checks com.

If none of those are found, it will return $2. If any are found, it will return the value found in the access database: OK, TEMP, a string, etc. (see lines 59-65).
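
The stripping loop in LookUpDomain is easy to model. The following Python sketch is an approximation, not the ruleset itself: the access database is a plain dictionary, and the function names are invented for illustration. The three arguments after the table correspond to the key, default, and passthru parameters listed above.

```python
def candidate_keys(domain):
    """Yield the domain, then each parent domain, down to the last label."""
    labels = domain.split(".")
    while labels:
        yield ".".join(labels)
        labels = labels[1:]  # strip the first word and its dot


def look_up_domain(access, domain, default, passthru=""):
    """Return <value> <passthru>, using default if no key is found."""
    for key in candidate_keys(domain):
        if key in access:
            return f"<{access[key]}> <{passthru}>"
    return f"<{default}> <{passthru}>"


print(list(candidate_keys("internal.enteract.com")))
# ['internal.enteract.com', 'enteract.com', 'com']
```

Because the most specific name is tried first, an entry for internal.enteract.com wins over one for enteract.com in this model.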

We're nearing the end of this ruleset; LookUpDomain will look for enteract.com and not find it, so the result of line 46 is:

<OK> <>
Moving along rapidly now, none of lines 47-58 match. Line 59 matches, and the output of the RHS is:

<OK>
The $@ means to exit this ruleset after rewriting, so we return <OK>. All the check_ rulesets work in such a way that as long as we do not return a $#error or $#discard, the mail will pass. Since we return <OK> (it could have been any text at all except the aforementioned), this is successful and the mail will be accepted (at least according to check_mail).
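
Put another way, the caller only cares whether the result names the error or discard mailer. A minimal Python sketch of that test, treating the ruleset result as a plain string (an assumption for illustration; the function name is hypothetical):

```python
def mail_accepted(result):
    """True unless the ruleset resolved to the error or discard mailer."""
    return not result.lstrip().startswith(("$#error", "$#discard"))


print(mail_accepted("<OK>"))               # True
print(mail_accepted("any text at all"))    # True
print(mail_accepted("$#discard"))          # False
```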

Now, we have taken a detailed tour through the check_mail ruleset. From an anti-spam perspective, things to note are:

Line    Purpose
18      Check for errors in the mail address
22      Sender's domain name must resolve in DNS
33      If you say you're local, you better be local
37-48   Check the access database for:
          user@
          user@domain and stuff
          user@domain
          domain and all subdomains
53-55   Reject if mail is from a non-local host and
        has no domain name on the From: address
58-65   Exit the ruleset with the results of the
        searches from 37-48
As a quick exercise, let's look at what happens if we receive a connection from cyberspammer.com, and they give their return address in the mail envelope as "friend" (i.e., during the "MAIL From: " SMTP conversation). Assume friend@ is not found in the access database. I'll gloss over some details and zip through quickly. The trip through the ruleset looks like this:

Line    Action
13      Input: friend. Ignore lines 14-17
18      Returns <?> friend
19-47   No matches anywhere
48      Look up friend@, not found; the RHS returns
        <USER ?> <?> friend
50      Matches and returns <?> friend
53      Matches and returns <? cyberspammer.com> friend
55      Matches and returns
        $#error $@ 5.5.4 $: "553 Domain name required"
Thus, this mail will be rejected.

The check_rcpt Ruleset

Recall the SMTP conversation. What we just covered happens between the client's command:

client:  MAIL FROM: some_address@somewhere.com
and the host's reply:

host:    250 some_address@somewhere.com... Sender ok
Check_mail goes into action as soon as the client sends the address. Next, the client sends the recipient address(es). This is where the check_rcpt ruleset comes in. I will now cover this ruleset in detail. Check the ruleset listing in the file check_rcpt (Listing 2). Again, we begin at line 6. Notice how this ruleset looks almost identical to the check_mail ruleset. Since we are in the antispam.com domain, we will look at the recipient address <schwager@antispam.com>. This mail, too, will end in success but will give us the opportunity to exercise as many rules as possible.

We can see that we jump right into Basic_check_rcpt, and that lines 13-15 don't apply.

In line 17, we call the ParseRecipient ruleset. The purpose is to look at the hostname and strip it off the mail address if it is designated RELAY in the access database or is in the relay-domains file. For example, if a spammer tries to send through us:

schwager@final.domain.com@antispam.com
we will first take off the antispam.com and look at final.domain.com for the remainder of this ruleset. ParseRecipient is smart enough to handle wily tricks, like:

schwager@final.domain.com@antispam.com@antispam.com
It will change that to schwager@final.domain.com, so we can check that domain in this ruleset.
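
The stripping behavior can be approximated in a few lines of Python. This is a sketch under simplifying assumptions: relay_domains is a hypothetical set holding our own domain plus any domains we relay for, only the user@host@host form is handled, and the real ruleset deals with more address forms than this.

```python
def strip_relay_hops(addr, relay_domains):
    """Drop trailing @domain parts that name us or a domain we relay for."""
    local, _, rest = addr.partition("@")
    hops = rest.split("@") if rest else []
    # Keep at least one domain so a plain user@ourdomain survives intact.
    while len(hops) > 1 and hops[-1] in relay_domains:
        hops.pop()
    return "@".join([local] + hops)


relay = {"antispam.com"}
print(strip_relay_hops("schwager@final.domain.com@antispam.com@antispam.com", relay))
# schwager@final.domain.com
```

The loop, rather than a single strip, is what defeats the repeated @antispam.com trick.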

ParseRecipient also calls Parse0 and ruleset 3. So, since we are looking at <schwager@antispam.com>, we will return from ParseRecipient with schwager<@antispam.com>. Line 20 changes our input to <?> schwager<@antispam.com>.

Line 21 matches, since Class w includes our antispam.com domain. Thus, we end up with:

<> <USER schwager> <FULL schwager@antispam.com> <HOST antispam.com>
<schwager<@antispam.com>>
By doing it this way, we are able to look up each field in the access database in lines 24-28: the user, the hostname, and the full email address. Note that the username is the local username, so to deny mail to a user in your domain, list the bare username in the access database, without an @ sign after it.
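
To make the three lookup keys concrete, here is a small Python model. The dictionary stands in for the access database, the entries are hypothetical, and the function names are invented; the order in which the real rules consult the keys is not modeled.

```python
def recipient_keys(user, host):
    """Build the three keys tagged USER, FULL, and HOST in the workspace."""
    return {"USER": user, "FULL": f"{user}@{host}", "HOST": host}


def recipient_hits(access, user, host):
    """Return the tagged keys that have an entry in the access table."""
    keys = recipient_keys(user, host)
    return {tag: access[key] for tag, key in keys.items() if key in access}
```

In this model, an entry keyed schwager is found under the USER tag, while an entry keyed schwager@ is not, which illustrates why the recipient side wants the bare username.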

Lines 32 and 33 act on the results of the access database lookup. If the data for the key is "REJECT" or an error string, the mail will be denied. Otherwise, we continue on. Note that we will get to line 36, and our input now looks like:

schwager<@antispam.com>
Line 36 allows for mail destined to our domain; so we will return OK from this ruleset. Had we been looking for an address not in our domain, for example to user@otherdomain.com, we would continue on. Here's what would happen: Line 38 does a more complete lookup of the domain and all subdomains in our email address in the access database.

Lines 44-48 will check whether the mail is for a local user (i.e., simply "username"). If so, okay. If not, the input remains untouched. Note that at this point, we are guaranteed to have an Internet-bound email because we've checked for a local user; we've checked to see if it is destined for a host in our domain; and we've checked to see if it is destined for a host that we relay for.

In lines 50-62, we check to see whether the client that connected to us is in our domain or is one of the hosts we relay for. If so, okay. If not, we look at the IP address of the client in line 66. If the address is not from one of our machines or a machine we relay for, then we fall through to line 78.

From an anti-spam perspective, the key lines in this ruleset are:

Line    Purpose
17      Check for errors in the mail address; make the
        address look like user<@host.com>
20-33   Check for user, host, or user@host in
        the access database
35-39   Mail to our domain or hosts we relay for: OK
43-47   Mail to a local user: OK
50-62   The connection was from a local machine or a
        host we relay for: OK
65-74   The connection was from a local machine or a
        host we relay for and they are not in the
        DNS: OK
78      Reject mail that is both not to one of our
        machines, and not from one of our machines
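
Read from line 35 down, the table above amounts to a short decision procedure. The Python sketch below is one way to express it; it is a model, not the ruleset. The access database checks of lines 20-33 are left out, our_hosts and our_addrs are hypothetical sets of names and IP addresses for local machines and hosts we relay for, and a rcpt_host of None stands for a bare local username.

```python
def check_rcpt_decision(rcpt_host, client_name, client_addr,
                        our_hosts, our_addrs):
    """Return OK or REJECT following the order of the table above."""
    if rcpt_host in our_hosts:      # lines 35-39: mail to us or a relay host
        return "OK"
    if rcpt_host is None:           # lines 43-47: bare local username
        return "OK"
    if client_name in our_hosts:    # lines 50-62: client known by name
        return "OK"
    if client_addr in our_addrs:    # lines 65-74: client known by address
        return "OK"
    return "REJECT"                 # line 78: neither to us nor from us
```

The final REJECT is the anti-relay rule: mail from an outside client to an outside recipient has no business passing through our host.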

About the Author

Mike Schwager is a contractor specializing in UNIX and the Internet. He has spent the past 15 years writing C and Perl code, shell scripts, and maintaining systems in the corporate and educational environments. Email him at Michael@Schwager.com or visit http://come.to/lanicservices.