Smarter Mail Addresses
Larry Reznick
Not every UNIX system can get a direct connection to the Internet for
its e-mail. Some have to go through at least one other system to get mail
in or out. SVR4's mail, rmail, and mailsurr(4)
use the mailcnfg(4) file. Among other things, mailcnfg
can be used to route mail automatically to another host.
Furthermore, users often set up aliases to route mail through the
other system and then out to the Internet. Each mail program has some
way to support these individual aliases. System administrators can
set up system-wide aliases using the SVR4 mailalias(1) program.
A full understanding of the details of mailsurr processing
would require cross-referencing the ckbinarsys(1M) program,
the binarsys(4) file that ckbinarsys uses, and the
smtpqer(1M) program. ckbinarsys and its binarsys
file deal with whether another system will accept mail
containing
binary data. smtpqer queues mail for transmission using
SMTP
instead of UUCP. You may need to know about these additional
configuration
options if you plan to use the mailsurr file extensively.
This article deals only with routing mail through system-wide
aliases
and routing mail to another host machine. Ideally, the
other host
machine knows how to resolve Internet domain addresses
that your system
has no direct access to.
Mail Surrogate Commands
In /etc/mail/mailsurr are lines containing regular expressions
and commands for translating rmail sender and receiver
addresses
into forms that other programs, usually rmail or smtpqer,
can process. Usually, expressions in the sender field
are the same
throughout the file; the receiver field usually contains
the expression
of interest to mailsurr. The fields contain regular
expressions
exactly as used by egrep(1).
mailsurr expressions are handled by a regular expression
translator
that resolves the special characters. That translator
forces the mail
program to match the whole sender or receiver string,
not just a part.
Because of the extra translation, simple parentheses
act like the
typical regular expression escaped parentheses to identify
substring
matches. Backslashes must be escaped to prevent them
from being translated
too soon. Thus, '(.*)' is a substring matching any set
of characters.
Matching subexpression characters get used in a later
part of the
same expression via the notation '\\1'. If the backslash
were not
doubled, the '\1' would be resolved too soon. The translator
turns
the double backslash into a single backslash so that
'\1' gets resolved
by the program interpreting the surrogate command later.
(See my article,
"Using Regular Expressions," Sys Admin, Sept/Oct
1992,
for additional discussion of regular expressions.)
In the surrogate commands, certain characters are denied
further transmission.
These include shell metacharacters such as the back-apostrophe
and
the semicolon, or the pipe and redirection symbols.
Most lines translate
into common mail addressing formats.
For example, the user@host address scheme used on the Internet is an
inversion of the UUCP host!user format. mailsurr acts as if UUCP were
the transport mechanism.
Any @ addresses must be translated into ! addresses
by inverting the address parts around those characters.
Further, where
! addresses are combined with @ addresses, the @
typically has higher precedence. Therefore, localhost!user@remotehost
must be translated into remotehost!localhost!user. The
mailsurr
entry representing this translation is:
'.+' '([^!].*)@(.+)' 'Translate R=!\\2!\\1'
The first field is the sender field; it matches one or more of any
character. All of the entries in my mailsurr file
contain
this same sender field, so the only variation in the
entries is in
the receiver field. This makes sense because the receiver
field contains
the outgoing address -- the one requiring translation.
The second field is the receiver field. This example
matches:
1. a first subexpression beginning with any character
other than a leading bang, followed by any other characters,
including
embedded bangs,
2. the @ symbol,
3. a second subexpression composed of any other characters,
one or more.
The third field contains the Translate command. The
"R="
precedes the regular expression that changes the receiver
string.
This example outputs a leading bang, emits the second
subexpression,
generates another bang, and then emits the first subexpression.
Thus,
the user@host detector swaps the user with the host
around
the @ symbol and changes the @ into a bang. Putting
the leading bang in the result prevents reevaluation
of this receiver
string, because the "[^!]" part of the first
subexpression
excludes any string containing a leading bang.
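The behavior of this entry can be tried outside the mailer with a minimal sketch in Python. This is not how mail or rmail is implemented: Python's re module only stands in for the surrogate translator, fullmatch imitates the whole-string matching described earlier, and the names AT_RULE and translate_at are invented here for illustration.

```python
import re

# Python stand-in for the receiver field '([^!].*)@(.+)'.
# Python needs no doubled backslashes; mailsurr does.
AT_RULE = re.compile(r"([^!].*)@(.+)")

def translate_at(receiver):
    """Return the translated receiver, or None if the rule does not apply."""
    m = AT_RULE.fullmatch(receiver)   # the whole string must match
    if m is None:
        return None
    # Equivalent of 'Translate R=!\\2!\\1'
    return "!" + m.group(2) + "!" + m.group(1)

print(translate_at("user@host"))                  # !host!user
print(translate_at("localhost!user@remotehost"))  # !remotehost!localhost!user
print(translate_at("!host!user"))                 # None
```

The last call returns None because the string starts with a bang, which is exactly how the leading bang shields a translated address from this rule.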
Consider what happens if the mailsurr file doesn't emit
that
leading bang. The SVR4 mailsurr man page shows examples
without
that leading bang inserted in the translated receiver
output. Just
examining what becomes of the @ translation without
that leading
bang should reveal the trouble.
First, there's a detail about the asterisk (*) regular
expression
metacharacter that many people overlook. The asterisk
matches the
longest possible string it can. In other words, not
only would "(.*)@"
match the "user" in user@host, but it would
match
the "user@host" in user@host@otherhost because
the
asterisk takes every character it can -- even including
@
symbols -- until it finds the longest string followed
by an @
symbol.
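The greedy match is easy to confirm with a short Python experiment. Python's re engine is only a stand-in for the surrogate translator here, and split_greedy is a hypothetical helper name, but the longest-match rule for the asterisk is the same.

```python
import re

def split_greedy(address):
    """Split an address with the pattern (.*)@(.+), as a greedy matcher would."""
    m = re.fullmatch(r"(.*)@(.+)", address)
    return (m.group(1), m.group(2)) if m else None

print(split_greedy("user@host"))            # ('user', 'host')
print(split_greedy("user@host@otherhost"))  # ('user@host', 'otherhost')
```

In the second call the first subexpression swallows the first @ symbol and stops only at the last one.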
When any receiver translation is done, the mailsurr
file is
read and applied again from the top until no translations
remain.
The asterisk expression's action, combined with multiple
translations
of the same receiver, creates an unusual side-effect:
instead of the
right-to-left evaluation of a simple Internet @ address,
translated
into the left-to-right format of a UUCP ! address (user@host
becomes host!user), each new reevaluation creates a
surprising,
almost inside-out address:
Before Evaluation: user@host@otherhost
First Evaluation: otherhost!user@host
Last Evaluation: host!otherhost!user
You should therefore never use multiple @ symbols in
a single
address: some system's mailsurr file may not be able
to handle
it. This inside-out translation can prevent your mail
from following
the correct route.
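The inside-out result can be reproduced with a small simulation. This sketch assumes a single @ rule written without the leading bang, in the style of the man page examples, and reapplies it until nothing changes; the names NO_BANG_AT_RULE and evaluate are hypothetical, and Python's re module stands in for the surrogate translator.

```python
import re

# The @ rule without the protective leading bang.
NO_BANG_AT_RULE = re.compile(r"(.*)@(.+)")

def evaluate(receiver):
    """Reapply the rule from the top until it no longer matches.

    Returns every intermediate form of the receiver string.
    """
    steps = [receiver]
    while True:
        m = NO_BANG_AT_RULE.fullmatch(receiver)
        if m is None:
            return steps
        # Equivalent of 'Translate R=\\2!\\1' (no leading bang)
        receiver = m.group(2) + "!" + m.group(1)
        steps.append(receiver)

for step in evaluate("user@host@otherhost"):
    print(step)
# user@host@otherhost
# otherhost!user@host
# host!otherhost!user
```

Each pass consumes one @ symbol, so the loop always ends, but the order of the hosts is not the order the sender wrote.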
Fortunately, direct Internet addresses using only one
@ are
resolved quickly with one query to the net. Indirect
addresses with
an @ and a bang combined are either routed internally
by the
receiving system or routed externally through the bang-address
included
in the at-address. For example, my system was one hop
from the system
connected to the Internet, so mail coming to me was
addressed as rezbook!reznick@csusac.ecs.csus.edu,
which only has one @ symbol. The csusac system knew
how to find rezbook, and rezbook knew about me.
The user%host format is translated similarly to the
user@host
format, except that the regular expression must use
two percent symbols
(%%). Surrogate regular expressions need two percent
symbols
because surrogate commands use the % for special variables.
For example, %S is the "Subject:" header's
text, and
%U is the local system's name from uname(1). So, just
as the backslash must be doubled to get a single backslash
character
from the evaluation, so must the percent be doubled
to get a single
percent character.
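As a rough illustration of the doubling rule, the following Python helper converts an ordinary pattern or replacement string into the doubled form described above. The function name to_surrogate_field is hypothetical, and the sketch covers only the two characters discussed here; it should be applied only to text where a literal backslash or percent is wanted, not to variables such as %S or %U.

```python
def to_surrogate_field(text):
    """Double each backslash and percent so one of each survives translation."""
    # Backslashes first, then percents; the two replacements do not interact.
    return text.replace("\\", "\\\\").replace("%", "%%")

print(to_surrogate_field("([^!].*)%(.+)"))  # ([^!].*)%%(.+)
print(to_surrogate_field(r"!\2!\1"))        # !\\2!\\1
```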
If an address has both @ symbol notation and % symbol
notation, the @ symbol has precedence over the % symbol.
This is a side-effect of the entire mailsurr file being
reapplied
from the top after any modification. One notation must
come first.
If the @ entry comes before the % entry, as it does
in my mailsurr file, but a leading bang is left out
of the
translations, user@host1%host2 would work like this:
Before evaluation: user@host1%host2
Evaluate @ symbol: host1%host2!user
Evaluate % symbol: host2!user!host1
Putting user in the middle is almost certainly not the
intent.
If the address is user%host@gateway (an example taken
from
The Whole Internet User's Guide & Catalog, by Ed
Krol, O'Reilly
& Assoc, 1992, page 98), the evaluation is:
Before evaluation: user%host@gateway
Evaluate @ symbol: gateway!user%host
Evaluate % symbol: host!gateway!user
which is probably also not correct.
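Both traces can be reproduced with a short simulation of the restart-from-the-top behavior. The sketch assumes just two rules, @ before %, both written without the leading bang; RULES and evaluate are hypothetical names, and Python's re module stands in for the surrogate translator (so the percent is not doubled).

```python
import re

# (pattern, replacement) pairs in file order: the @ rule, then the % rule.
RULES = [
    (re.compile(r"(.*)@(.+)"), r"\2!\1"),
    (re.compile(r"(.*)%(.+)"), r"\2!\1"),
]

def evaluate(receiver):
    """Apply the first matching rule, then restart from the top."""
    steps = [receiver]
    changed = True
    while changed:
        changed = False
        for pattern, template in RULES:
            m = pattern.fullmatch(receiver)
            if m:
                receiver = m.expand(template)
                steps.append(receiver)
                changed = True
                break          # any translation restarts the scan
    return steps

print(evaluate("user@host1%host2"))
# ['user@host1%host2', 'host1%host2!user', 'host2!user!host1']
print(evaluate("user%host@gateway"))
# ['user%host@gateway', 'gateway!user%host', 'host!gateway!user']
```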
Putting a bang in front of mailsurr's translated output
prevents
these repeated evaluations. The translated receiver
string is given
a leading bang, which prevents a second interpretation
by the expression.
When the mailsurr file is restarted from the top, the
receiver
string won't be reevaluated by the same rule twice.
In fact, it won't
be evaluated further by the other rules that exempt
leading bangs.
That means if only one @ or only one % is in the
receiver string, but not both, only one translation
takes effect.
Thus, the string user@remotehost@localhost would become
!localhost!user@remotehost without further evaluation
by the local mailer, and user%host@gateway would become
!gateway!user%host.
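The protective effect of the leading bang can be checked the same way. This sketch assumes the @ and % rules both start with [^!] and both emit a leading bang; BANG_RULES and translate_once are hypothetical names, with Python's re module again standing in for the surrogate translator.

```python
import re

# The @ rule, then the % rule; both refuse a receiver that starts with a bang.
BANG_RULES = [
    re.compile(r"([^!].*)@(.+)"),
    re.compile(r"([^!].*)%(.+)"),
]

def translate_once(receiver):
    """Apply the first matching rule; return the receiver unchanged if none match."""
    for pattern in BANG_RULES:
        m = pattern.fullmatch(receiver)
        if m:
            return "!" + m.group(2) + "!" + m.group(1)
    return receiver

first = translate_once("user%host@gateway")
print(first)                  # !gateway!user%host
print(translate_once(first))  # !gateway!user%host (no second translation)
print(translate_once("user@remotehost@localhost"))
# !localhost!user@remotehost
```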
The remotehost or gateway would receive the mail, but
only with the
address part remaining after the second bang. A later
entry in mailsurr
uses the expression
'!([^!]+)!(.+)'
for the receiver parsing. Notice the leading bang. By
this time, all other parsing and translating is finished
and it is
time to use UUCP or SMTP to send the mail. The UUCP
method invokes
uux(1) to transmit the following command to the remote
system:
\\1!rmail (\\2)
Notice that the receiver expression had the leading bang outside the
first subexpression. That excludes it from the \\1 reference.
The rmail argument takes everything past the second
bang --
the important one. So !localhost!user@remotehost becomes
the
uux argument
localhost!rmail (user@remotehost)
Similarly, when using SMTP, the smtpqer program receives
\\1 \\2
so, again, the gateway system receives an address to
user%host. Each remote system will presumably figure
out the
address notation when it forwards the mail to the next
host.
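The final hand-off can be sketched in Python as well. The function below only builds the command text that would be passed along; it does not run uux or smtpqer, and the names DELIVERY_RULE and uux_command are hypothetical.

```python
import re

# Python stand-in for the receiver field '!([^!]+)!(.+)'.
DELIVERY_RULE = re.compile(r"!([^!]+)!(.+)")

def uux_command(receiver):
    """Build the remote command text '\\1!rmail (\\2)' for a translated receiver."""
    m = DELIVERY_RULE.fullmatch(receiver)
    if m is None:
        return None            # not in !host!rest form
    next_host, remainder = m.group(1), m.group(2)
    return next_host + "!rmail (" + remainder + ")"

print(uux_command("!localhost!user@remotehost"))  # localhost!rmail (user@remotehost)
print(uux_command("!gateway!user%host"))          # gateway!rmail (user%host)
```

Because the first subexpression is [^!]+, it stops at the second bang, so only the next hop is peeled off and everything after it travels on untouched.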
System-wide Mail Aliasing
Before the surrogate commands get to the UUCP or SMTP
transmission
command, the mailsurr entries translate aliases. A surrogate
command uses the mailalias program to do the alias translating.
mailalias recognizes two alias categories: individual
user
aliases and system-wide aliases. Every name not starting
with a bang
is run through mailalias. mailalias uses the following
hierarchy when resolving a receiver's name:
1. If the name is a file in the /var/mail directory,
it is the exact login name of some user or pseudo-user.
User login
names take highest precedence.
2. If the name is found in the sender's $HOME/lib/names
file, use the address list following the name. Individual
user aliases
take precedence over system-wide aliases.
3. Examine the file /etc/mail/namefiles.
That file contains a list of full pathnames of other
files. Those
other files contain system-wide aliases. The two default
files listed
in /etc/mail/namefiles are /etc/mail/lists, which
holds mailing lists where multiple user names are associated
with
a single alias name, and /etc/mail/names, where a single
name
is associated with a single alias. If the name is found
in any
of the files listed, use the address list following
the name. System-wide
aliases take lowest precedence.
4. If none of the other tests bear fruit, echo the name.
It isn't an alias as far as the local system is concerned,
nor is
it a login name. Presumably the next host will know
the name.
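The four-step hierarchy above can be summarized in a few lines of Python. This is only a model of the lookup order, not the mailalias program: the data is passed in as ordinary sets and dictionaries rather than read from /var/mail or the alias files, it expands one level only (whether mailalias expands further is not covered here), and the function and parameter names are hypothetical.

```python
def resolve(name, mailboxes, user_aliases, system_alias_files):
    """Resolve a receiver name using the precedence order described above.

    mailboxes          -- set of names that have a file in /var/mail
    user_aliases       -- dict built from the sender's $HOME/lib/names
    system_alias_files -- list of dicts, one per file listed in /etc/mail/namefiles
    """
    if name in mailboxes:                 # 1. exact login name wins
        return [name]
    if name in user_aliases:              # 2. the sender's own aliases
        return list(user_aliases[name])
    for aliases in system_alias_files:    # 3. system-wide aliases, in file order
        if name in aliases:
            return list(aliases[name])
    return [name]                         # 4. unknown here: echo the name

# Made-up sample data for illustration.
mailboxes = {"reznick", "root"}
mine = {"boss": ["root"]}
system = [{"staff": ["reznick", "root"]}, {"boss": ["reznick"]}]

print(resolve("reznick", mailboxes, mine, system))  # ['reznick']
print(resolve("boss", mailboxes, mine, system))     # ['root']
print(resolve("staff", mailboxes, mine, system))    # ['reznick', 'root']
print(resolve("stranger", mailboxes, mine, system)) # ['stranger']
```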
To set up system-wide aliases, edit the file /etc/mail/names
and add one line for each alias name. The alias name
comes first,
then at least one space or tab, and finally the real
login name. If
you want to create mailing lists -- each alias name
having many
login names associated with it -- put them in /etc/mail/lists
instead. When any alias name is found, all names in
the list following
the alias name are presumed to be mailable receiver
names. You can
use this feature to create common system aliases, and
your users can
use this in their own $HOME/lib/names files to create
special
aliases that only they know.
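The file layout just described (alias name, white space, then one or more receiver names) is simple enough to parse in a few lines. The sketch below is a Python illustration, not part of mailalias; the sample aliases are made up, and skipping blank or one-word lines is an assumption of the sketch rather than documented behavior.

```python
def parse_alias_text(text):
    """Turn alias-file text into a dict mapping alias name -> list of receivers."""
    aliases = {}
    for line in text.splitlines():
        fields = line.split()          # any run of spaces or tabs separates fields
        if len(fields) < 2:
            continue                   # assumption: ignore blank or incomplete lines
        aliases[fields[0]] = fields[1:]
    return aliases

# Hypothetical contents in the style of /etc/mail/names and /etc/mail/lists.
sample = "postmaster\troot\nstaff   reznick root\n"
print(parse_alias_text(sample))
# {'postmaster': ['root'], 'staff': ['reznick', 'root']}
```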
Because individual users' alias names have precedence
over system-wide
alias names, users' aliases can hide access to system
aliases. Anyone
can run the mailalias command to see exactly what the
system
produces, given a receiver name. If the name produced
by one user
is not the expected name, the user's alias overrules
the system's
alias.
Smarter Host
One remaining feature in the mailsurr file -- a very
handy
feature for systems that don't have a direct connection
to the Internet
-- is the SMARTERHOST routing. In mailsurr, %X
refers to the address of a remote host system that has
direct access
to the Internet or knows more systems than the local
system knows.
The only prerequisite is that the local system must
know the remote
system. This %X mailsurr translation is typically entered
last in the mailsurr file. It is commented out in anticipation
of systems that have direct Internet connections. Such
systems are
smarter hosts. To enable this translation, delete the
comment (#)
character at the beginning of the line.
If the mailsurr translations already applied don't resolve
the address, the entry using %X prepends the smarter
host's
address but without the leading bang. Again, because
any change in
the receiver string causes the surrogate commands to
execute again,
further processing is applied to the new address routing
through the
smarter host.
To assign the smarter host's address, create or edit
the file /etc/mail/mailcnfg.
mailcnfg contains mail and rmail values assigned
to variables used either within mail or rmail or
by mailsurr. Give the key assignment,
SMARTERHOST=systemname
to route mail to a host that is directly connected to
the Internet or simply knows more names than your system
does. Once
the assignment is made, any mail address that the other
surrogate
commands can't resolve is automatically routed to the
smarter host.
The smarter host will either resolve the address or
send it back to
your system as unresolvable. Presumably, the other system
is the final
arbiter or you wouldn't have selected it as the smarter
host. Given
the smarter host setting and a UUCP account on a cooperative
Internet
site, systems without direct Internet access act as
if they were on
Internet for mail purposes. All of the real work happens
behind the
scenes.
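The fallback can be pictured with one more Python sketch. It reads a SMARTERHOST= line from mailcnfg-style text and prepends that host to an unresolved receiver without a leading bang. The function names are hypothetical, the use of a bang as the joining character is an assumption based on the UUCP-style addresses used throughout, and the real work is done by the %X entry in mailsurr, not by code like this.

```python
def read_smarterhost(mailcnfg_text):
    """Return the SMARTERHOST value from mailcnfg-style text, or None if absent."""
    for line in mailcnfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "SMARTERHOST":
            return value.strip()
    return None

def route_via_smarter_host(receiver, smarter_host):
    """Prepend the smarter host (no leading bang) to an unresolved receiver."""
    if smarter_host is None:
        return receiver            # nothing configured: leave the address alone
    return smarter_host + "!" + receiver   # assumption: bang joins host and address

host = read_smarterhost("SMARTERHOST=csusac\n")
print(host)                                          # csusac
print(route_via_smarter_host("farhost!user", host))  # csusac!farhost!user
```

Because the new receiver string differs from the old one, the surrogate commands would then run again on the rerouted address, as described above.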
About the Author
Larry Reznick has been programming professionally since
1978. He is currently
working on systems programming in UNIX, MS-DOS, and
OS/2.
He teaches C language courses
at American River College and at National University
in Sacramento.
He can be reached via email at:
rezbook!reznick@csusac.ecs.csus.edu.