Cover V03, I04

Smarter Mail Addresses

Larry Reznick

Not every UNIX system can get a direct connection to the Internet for its e-mail. Some have to go through at least one other system to get mail in or out. SVR4's mail, rmail, and mailsurr(4) use the mailcnfg(4) file. Among other things, mailcnfg can be used to route mail automatically to another host. Furthermore, users often set up aliases to route mail through the other system and then out to the Internet. Each mail program has some way to support these individual aliases. System administrators can set up system-wide aliases using the SVR4 mailalias(1) program.

A full understanding of the details of mailsurr processing would require cross-referencing the ckbinarsys(1M) program, the binarsys(4) file that ckbinarsys uses, and the smtpqer(1M) program. ckbinarsys and its binarsys file deal with whether another system will accept mail containing binary data. smtpqer queues mail for transmission using SMTP instead of UUCP. You may need to know about these additional configuration options if you plan to use the mailsurr file extensively. This article deals only with routing mail through system-wide aliases and routing mail to another host machine. Ideally, the other host machine knows how to resolve Internet domain addresses that your system has no direct access to.

Mail Surrogate Commands

The /etc/mail/mailsurr file contains lines of regular expressions and commands that translate rmail sender and receiver addresses into forms that other programs, usually rmail or smtpqer, can process. Usually, expressions in the sender field are the same throughout the file; the receiver field usually contains the expression of interest to mailsurr. The fields contain regular expressions exactly as used by egrep(1).

mailsurr expressions are handled by a regular expression translator that resolves the special characters. That translator forces the mail program to match the whole sender or receiver string, not just a part. Because of the extra translation, simple parentheses act like the typical regular expression escaped parentheses to identify substring matches. Backslashes must be escaped to prevent them from being translated too soon. Thus, '(.*)' is a substring matching any set of characters. Matching subexpression characters get used in a later part of the same expression via the notation '\\1'. If the backslash were not doubled, the '\1' would be resolved too soon. The translator turns the double backslash into a single backslash so that '\1' gets resolved by the program interpreting the surrogate command later. (See my article, "Using Regular Expressions," Sys Admin, Sept/Oct 1992, for additional discussion of regular expressions.)

In the surrogate commands, certain characters are denied further transmission. These include shell metacharacters such as the back-apostrophe and the semicolon, or the pipe and redirection symbols. Most lines translate into common mail addressing formats.

For example, the interpretation of the user@host address scheme as used on the Internet is an inversion of the UUCP host!user format. mailsurr acts as if UUCP were the transport mechanism. Any @ addresses must be translated into ! addresses by inverting the address parts around those characters. Further, where ! addresses are combined with @ addresses, the @ typically has higher precedence. Therefore, localhost!user@remotehost must be translated into remotehost!localhost!user. The mailsurr entry representing this translation is:

'.+'      '([^!].*)@(.+)'     'Translate R=!\\2!\\1'

The first field stands for the sender field and matches any character, one or more. All of the entries in my mailsurr file contain this same sender field, so the only variation in the entries is in the receiver field. This makes sense because the receiver field contains the outgoing address -- the one requiring translation.

The second field is the receiver field. This example matches:

1. a first subexpression beginning with any character other than a leading bang, followed by any other characters, including embedded bangs,

2. the @ symbol,

3. a second subexpression composed of any other characters, one or more.

The third field contains the Translate command. The "R=" precedes the regular expression that changes the receiver string. This example outputs a leading bang, emits the second subexpression, generates another bang, and then emits the first subexpression. Thus, the user@host detector swaps the user with the host around the @ symbol and changes the @ into a bang. Putting the leading bang in the result prevents reevaluation of this receiver string, because the "[^!]" part of the first subexpression excludes any string containing a leading bang.
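The effect of this entry can be approximated outside of mail with a short Python sketch. It is only an illustration, not the mailsurr translator itself: the function name translate_at is hypothetical, re.fullmatch stands in for the whole-string matching described earlier, and the replacement is written with single backslashes because Python needs no doubling.

```python
import re

# Receiver pattern from the mailsurr entry. re.fullmatch mimics the
# translator's rule that the whole receiver string must match.
RECEIVER = re.compile(r"([^!].*)@(.+)")

# The Translate replacement after the doubled backslashes are reduced.
REPLACEMENT = r"!\2!\1"


def translate_at(receiver):
    """Return the translated receiver, or the input unchanged if no match."""
    match = RECEIVER.fullmatch(receiver)
    return match.expand(REPLACEMENT) if match else receiver


print(translate_at("user@host"))                  # !host!user
print(translate_at("localhost!user@remotehost"))  # !remotehost!localhost!user
print(translate_at("!gateway!user@host"))         # unchanged: leading bang
```

The last call shows the guard at work: a receiver that already starts with a bang fails the "[^!]" test and passes through untouched.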

Consider what happens if the mailsurr file doesn't emit that leading bang. The SVR4 mailsurr man page shows examples without that leading bang inserted in the translated receiver output. Just examining what becomes of the @ translation without that leading bang should reveal the trouble.

First, there's a detail about the asterisk (*) regular expression metacharacter that many people overlook. The asterisk matches the longest possible string it can. In other words, not only would "(.*)@" match the "user" in user@host, but it would match the "user@host" in user@host@otherhost because the asterisk takes every character it can -- even including @ symbols -- until it finds the longest string followed by an @ symbol.

When any receiver translation is done, the mailsurr file is read and applied again from the top until no translations remain. The asterisk expression's action, combined with multiple translations of the same receiver, creates an unusual side-effect: instead of the right-to-left evaluation of a simple Internet @ address, translated into the left-to-right format of a UUCP ! address (user@host becomes host!user), each new reevaluation creates a surprising, almost inside-out address:

Before Evaluation:  user@host@otherhost
First Evaluation:   otherhost!user@host
Last Evaluation:    host!otherhost!user

You should therefore never use multiple @ symbols in a single address: some system's mailsurr file may not be able to handle it. This inside-out translation can prevent your mail from following the correct route.
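The repeated evaluation can be reproduced with a small Python sketch. It assumes a single @ rule that is reapplied until it no longer matches; the helper name rewrite_until_stable and the limit parameter are hypothetical. The first call omits the leading bang from the replacement, the second includes it.

```python
import re

AT_RULE = re.compile(r"([^!].*)@(.+)")


def rewrite_until_stable(receiver, replacement, limit=10):
    """Reapply the @ rule until it stops matching; return every step."""
    steps = [receiver]
    for _ in range(limit):
        match = AT_RULE.fullmatch(steps[-1])
        if match is None:
            break
        steps.append(match.expand(replacement))
    return steps


# Without the leading bang, the greedy (.*) produces the inside-out address.
print(rewrite_until_stable("user@host@otherhost", r"\2!\1"))
# ['user@host@otherhost', 'otherhost!user@host', 'host!otherhost!user']

# With the leading bang, the rule fires once and then leaves the result alone.
print(rewrite_until_stable("user@host@otherhost", r"!\2!\1"))
# ['user@host@otherhost', '!otherhost!user@host']
```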

Fortunately, direct Internet addresses using only one @ are resolved quickly with one query to the net. Indirect addresses with an @ and a bang combined are either routed internally by the receiving system or routed externally through the bang-address included in the at-address. For example, my system was one hop from the system connected to the Internet, so mail coming to me was addressed as rezbook!reznick@csusac.ecs.csus.edu, which only has one @ symbol. The csusac system knew how to find rezbook, and rezbook knew about me.

The user%host format is translated similarly to the user@host format, except that the regular expression must use two percent symbols (%%). Surrogate regular expressions need two percent symbols because surrogate commands use the % for special variables. For example, %S is the "Subject:" header's text, and %U is the local system's name from uname(1). So, just as the backslash must be doubled to get a single backslash character from the evaluation, so must the percent be doubled to get a single percent character.
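The percent doubling can be pictured with a rough Python sketch. This is an assumption about the general shape of the expansion, not the actual mailsurr code: the function expand_percent and its variables dictionary are hypothetical, and unknown letters are simply left alone.

```python
import re


def expand_percent(text, variables):
    """Turn %% into a literal % and replace %<letter> from the dictionary."""
    def substitute(match):
        letter = match.group(1)
        if letter == "%":
            return "%"
        # Unknown variables are left as written (an assumption).
        return variables.get(letter, match.group(0))

    return re.sub(r"%(.)", substitute, text)


print(expand_percent("(.*)%%(.+)", {}))             # (.*)%(.+)
print(expand_percent("!%X!user", {"X": "csusac"}))  # !csusac!user
```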

If an address has both @ symbol notation and % symbol notation, the @ symbol has precedence over the % symbol. This is a side-effect of the entire mailsurr file being reapplied from the top after any modification. One notation must come first. If the @ entry comes before the % entry, as it does in my mailsurr file, but a leading bang is left out of the translations, user@host1%host2 would work like this:

Before evaluation:  user@host1%host2
Evaluate @ symbol:  host1%host2!user
Evaluate % symbol:  host2!user!host1

Putting user in the middle is almost certainly not the intent. If the address is user%host@gateway (an example taken from The Whole Internet User's Guide & Catalog, by Ed Krol, O'Reilly & Assoc, 1992, page 98), the evaluation is:

Before evaluation:  user%host@gateway
Evaluate @ symbol:  gateway!user%host
Evaluate % symbol:  host!gateway!user

which is probably also not correct.
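Both traces can be reproduced with a Python sketch that restarts from the top of a rule list after every change. The % pattern is assumed to be the direct analogue of the @ pattern, the replacements deliberately omit the leading bang, and the names RULES and apply_rules are hypothetical.

```python
import re

# Rules in file order: the @ entry first, then the % entry.
# Replacements omit the leading bang to show the problem.
RULES = [
    (re.compile(r"([^!].*)@(.+)"), r"\2!\1"),
    (re.compile(r"([^!].*)%(.+)"), r"\2!\1"),
]


def apply_rules(receiver, rules=RULES, limit=10):
    """Apply the first matching rule, then restart from the top."""
    trace = [receiver]
    for _ in range(limit):
        for pattern, replacement in rules:
            match = pattern.fullmatch(trace[-1])
            if match:
                trace.append(match.expand(replacement))
                break  # receiver changed: go back to the first rule
        else:
            break  # no rule matched: translation is finished
    return trace


print(apply_rules("user@host1%host2"))
# ['user@host1%host2', 'host1%host2!user', 'host2!user!host1']
print(apply_rules("user%host@gateway"))
# ['user%host@gateway', 'gateway!user%host', 'host!gateway!user']
```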

Putting a bang in front of mailsurr's translated output prevents these repeated evaluations. The translated receiver string is given a leading bang, which prevents a second interpretation by the expression. When the mailsurr file is restarted from the top, the receiver string won't be reevaluated by the same rule twice. In fact, it won't be evaluated further by the other rules that exempt leading bangs. That means if only one @ or only one % is in the receiver string, but not both, only one translation takes effect. Thus, the string user@remotehost@localhost would become !localhost!user@remotehost without further evaluation by the local mailer, and user%host@gateway would become !gateway!user%host. The remotehost or gateway would receive the mail, but only with the address part remaining after the second bang. A later entry in mailsurr uses the expression

'!([^!]+)!(.+)'

for the receiver parsing. Notice the leading bang. By this time, all other parsing and translating is finished and it is time to use UUCP or SMTP to send the mail. The UUCP method invokes uux(1) to transmit the following command to the remote system:

\\1!rmail (\\2)

Notice that the receiver expression had the leading bang outside the first subexpression. That excludes it from the \\1 reference. The rmail argument takes everything past the second bang -- the important one. So !localhost!user@remotehost becomes the uux argument

localhost!rmail (user@remotehost)

Similarly, when using SMTP, the smtpqer program receives

\\1 \\2

so, again, the gateway system receives an address to user%host. Each remote system will presumably figure out the address notation when it forwards the mail to the next host.
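The way the delivery entry splits the receiver can be checked with a short Python sketch. It only builds the argument string described above and does not invoke uux or smtpqer; the function name uux_argument is hypothetical.

```python
import re

# Receiver pattern of the delivery entry: leading bang, first host,
# second bang, then everything else.
DELIVERY = re.compile(r"!([^!]+)!(.+)")


def uux_argument(receiver):
    """Return the 'host!rmail (rest)' string, or None if there is no match."""
    match = DELIVERY.fullmatch(receiver)
    if match is None:
        return None
    return match.expand(r"\1!rmail (\2)")


print(uux_argument("!localhost!user@remotehost"))
# localhost!rmail (user@remotehost)
print(uux_argument("!gateway!user%host"))
# gateway!rmail (user%host)
```

Because "[^!]+" cannot cross a bang, only the first hop is peeled off; the rest of the path, whatever notation it uses, is handed to the next system intact.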

System-wide Mail Aliasing

Before the surrogate commands get to the UUCP or SMTP transmission command, the mailsurr entries translate aliases. A surrogate command uses the mailalias program to do the alias translating. mailalias recognizes two alias categories: individual user aliases and system-wide aliases. Every name not starting with a bang is run through mailalias. mailalias uses the following hierarchy when resolving a receiver's name:

1. If the name is a file in the /var/mail directory, it is the exact login name of some user or pseudo-user. User login names take highest precedence.

2. If the name is found in the sender's $HOME/lib/names file, use the address list following the name. Individual user aliases take precedence over system-wide aliases.

3. Examine the file /etc/mail/namefiles. That file contains a list of full pathnames of other files. Those other files contain system-wide aliases. The two default files listed in /etc/mail/namefiles are /etc/mail/lists, which holds mailing lists where multiple user names are associated with a single alias name, and /etc/mail/names, where a single name is associated with a single alias. If the name is found in any of the files listed, use the address list following the name. System-wide aliases take lowest precedence.

4. If none of the other tests bear fruit, echo the name. It isn't an alias as far as the local system is concerned, nor is it a login name. Presumably the next host will know the name.
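The hierarchy above can be sketched in Python. This is an illustration of the lookup order only, not the mailalias source: the functions parse_names and resolve are hypothetical, in-memory tables stand in for /var/mail, $HOME/lib/names, and the files listed in /etc/mail/namefiles, and the sample aliases are invented.

```python
def parse_names(text):
    """Parse alias lines: alias name, whitespace, then one or more names."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1:]
    return table


def resolve(name, mailboxes, user_names, system_tables):
    """Resolve a receiver name in the order described above."""
    if name in mailboxes:            # 1. exact login name
        return [name]
    if name in user_names:           # 2. the sender's own aliases
        return user_names[name]
    for table in system_tables:      # 3. system-wide alias files, in order
        if name in table:
            return table[name]
    return [name]                    # 4. not an alias: echo the name


mailboxes = {"reznick", "root"}
user_names = parse_names("boss\treznick\n")
system_tables = [parse_names("staff reznick root\n"),
                 parse_names("boss root\npostmaster root\n")]

print(resolve("reznick", mailboxes, user_names, system_tables))     # ['reznick']
print(resolve("boss", mailboxes, user_names, system_tables))        # ['reznick']
print(resolve("postmaster", mailboxes, user_names, system_tables))  # ['root']
print(resolve("stranger", mailboxes, user_names, system_tables))    # ['stranger']
```

The second call shows a user alias hiding a system alias of the same name.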

To set up system-wide aliases, edit the file /etc/mail/names and add one line for each alias name. The alias name comes first, then at least one space or tab, and finally the real login name. If you want to create mailing lists -- each alias name having many login names associated with it -- put them in /etc/mail/lists instead. When any alias name is found, all names in the list following the alias name are presumed to be mailable receiver names. You can use this feature to create common system aliases, and your users can use this in their own $HOME/lib/names files to create special aliases that only they know.

Because individual users' alias names have precedence over system-wide alias names, users' aliases can hide access to system aliases. Anyone can run the mailalias command to see exactly what the system produces, given a receiver name. If the name produced for one user is not the expected name, that user's alias is overruling the system's alias.

Smarter Host

One remaining feature in the mailsurr file -- a very handy feature for systems that don't have a direct connection to the Internet -- is the SMARTERHOST routing. In mailsurr, %X refers to the address of a remote host system that has direct access to the Internet or knows more systems than the local system knows. The only prerequisite is that the local system must know the remote system. This %X mailsurr translation is typically entered last in the mailsurr file. It is commented out in anticipation of systems that have direct Internet connections. Such systems are smarter hosts. To enable this translation, delete the comment (#) character at the beginning of the line.

If the mailsurr translations already applied don't resolve the address, the entry using %X prepends the smarter host's address but without the leading bang. Again, because any change in the receiver string causes the surrogate commands to execute again, further processing is applied to the new address routing through the smarter host.

To assign the smarter host's address, create or edit the file /etc/mail/mailcnfg. mailcnfg contains mail and rmail values assigned to variables either used within mail or rmail, or used by mailsurr. Give the key assignment,

SMARTERHOST=systemname

to route mail to a host that is directly connected to the Internet or simply knows more names than your system does. Once the assignment is made, any mail address that the other surrogate commands can't resolve is automatically routed to the smarter host. The smarter host will either resolve the address or send it back to your system as unresolvable. Presumably, the other system is the final arbiter or you wouldn't have selected it as the smarter host. Given the smarter host setting and a UUCP account on a cooperative Internet site, systems without direct Internet access act as if they were on the Internet for mail purposes. All of the real work happens behind the scenes.
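As a rough picture of the fallback, the following Python sketch reads a SMARTERHOST assignment and prepends that host to an address nothing else resolved. It is a sketch under stated assumptions, not the real %X entry: the function names are hypothetical, the host names are examples, and stripping an existing leading bang before prepending is an assumption the article does not confirm.

```python
def read_smarterhost(mailcnfg_text):
    """Return the SMARTERHOST value from mailcnfg-style text, or None."""
    for line in mailcnfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "SMARTERHOST":
            return value.strip() or None
    return None


def route_via_smarterhost(receiver, smarterhost):
    """Prepend the smarter host, without a leading bang, if one is set."""
    if not smarterhost:
        return receiver
    # Dropping any leading bang from the receiver is an assumption.
    return f"{smarterhost}!{receiver.lstrip('!')}"


host = read_smarterhost("SMARTERHOST=csusac\n")
print(host)                                            # csusac
print(route_via_smarterhost("!unknownhost!user", host))
# csusac!unknownhost!user
```

Because the result has no leading bang, it is eligible for another pass through the surrogate commands, which matches the reprocessing described earlier.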

About the Author

Larry Reznick has been programming professionally since 1978. He is currently working on systems programming in UNIX, MS-DOS, and OS/2. He teaches C language courses at American River College and at National University in Sacramento. He can be reached via email at: rezbook!reznick@csusac.ecs.csus.edu.