
How Spammers Do It

Michael Schwager

Spammers need addresses, and they will get them any way they can. One way is to send robots across the Web to scan pages for strings that look like email addresses. Until recently, spammers trolled the InterNIC's list of domain name registrants, but the InterNIC clamped down on that practice. (Within a week of registering my domain schwager.com I had received spam, and I had barely had time to email friends announcing my new mail address!) Another easy technique is to scan all the files from the day's Usenet postings. Any part of a posting is fair game: headers and contents can both contain valid email addresses, so both are scanned. Once an address is obtained, it is likely to be sold to another list, and thus spread throughout the Internet.

Getting your address is but a small part of the procedure. Because of the way SMTP works, it's easy for spammers to send lots and lots of email very quickly. Let's pretend for a moment that we're spammers. We've created a site containing hard-core pornography, and we want to tell the world about our service. We'll call our company Mike's House o' Porn, and we want to encourage people to visit www.mhop.com.

First, we create a form letter. We can put all of this text in a single file. Note that there is nothing special about the first part of this file, known as the message header section. A single blank line separates it from the body:

   To: 123456789@public.com
   From: mike@www.mhop.com
   Subject: Lotsa Sex!

   Wow, get a load of this XXX action: 
   come to www.houseoporn.com!
   Our ladies want to please YOU

   -Mike's House o' Porn
   P.S. If this email offends you in 
   any way, simply send us email with
   REMOVE in the subject line.  You 
   will be removed from our mailing list.
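The header/body split described above can be checked with a short Python sketch. It uses the standard library's email package; the function name split_message and the shortened sample text are illustrative assumptions, not part of any mail software discussed here.

```python
from email import message_from_string

# Shortened copy of the form letter above; the blank line ends the headers.
RAW_MESSAGE = """\
To: 123456789@public.com
From: mike@www.mhop.com
Subject: Lotsa Sex!

come to www.houseoporn.com!
"""


def split_message(raw):
    """Return (headers, body); headers is a list of (name, value) pairs."""
    msg = message_from_string(raw)
    return msg.items(), msg.get_payload()


headers, body = split_message(RAW_MESSAGE)
for name, value in headers:
    print(f"{name}: {value}")
print("--- body ---")
print(body)
```

Nothing in the parser checks that the To: or From: values are real addresses; they are just text that happens to sit above the first blank line.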

Before we send this, we realize that almost all of our recipients will not care to read our mail. Some of them will be angry, and some of the mail will bounce. That's not our concern; we want to broadcast, not receive. And since some sites will likely try to stop us, we will need to be a little tricky to get by their filters. A knowledgeable spammer would see a number of problems with this spam. First, the To: address contains recognizable strings that many people will try to block. For example, the all-number sequence is used mostly by spammers. The "public.com" site is used by at least one popular bulk mail package that is sold and used today, and will likely be blocked. We change it to:

   To: mybuddy@recipienthost.com

Next, the From: address is real! Are we crazy? If the Internet community finds out that this address or hostname works, we'll get flooded with angry mail! Best to fake that one, too:

   From: hotchicks@waitingforu.com

Also, the Subject: line is far too descriptive. People who aren't interested in porn will be able to delete the message right away. That will never do, so we change it to something that'll make them want to read it:

   Subject: Did I get the right email address?

Hmmm - more problems. Many people have procmail filters or filters on their email client (such as Pegasus or Netscape mail) that flag strings like "XXX" or "sex". Some even have filters that look for "remove in the subject line" (all string searches are case-insensitive). Best to remove the P.S. (we were lying anyway). That way we'll be better able to put our message before those people who obviously don't want it.
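To see why such string filters are brittle, here is a minimal sketch of one in Python. The list BLOCKED_STRINGS and the function matched_strings are hypothetical; real procmail recipes or mail client rules look different, but the case-insensitive substring test is the same idea.

```python
from email import message_from_string

# Hypothetical block list, taken from the strings mentioned in the text.
BLOCKED_STRINGS = ["xxx", "sex", "remove in the subject line"]


def matched_strings(raw):
    """Return the blocked strings found in the subject or body, ignoring case."""
    msg = message_from_string(raw)
    text = (msg.get("Subject", "") + "\n" + msg.get_payload()).lower()
    return [s for s in BLOCKED_STRINGS if s in text]


FLAGGED = """\
Subject: Lotsa Sex!

Wow, get a load of this XXX action.
Simply send us email with REMOVE in the subject line.
"""

PLAIN = """\
Subject: Did I get the right email address?

Wow, get a load of this action.
"""

print(matched_strings(FLAGGED))
print(matched_strings(PLAIN))
```

The first message trips all three strings and the second trips none, even though both advertise the same thing. A filter of this kind only catches senders who do not bother to reword.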

So, now our spam is better. It's more likely to get past the filters and devices used to try to stop us. Here's what we have:

   To: mybuddy@recipienthost.com
   From: hotchicks@waitingforu.com
   Subject: Did I get the right email address?

   Wow, get a load of this action:  come to www.houseoporn.com!
   Our ladies want to please YOU

   -Mike's House o' Porn

Sending the Message

Now, how to send the message? First, we buy a list of thousands or even millions of email addresses from another spammer. Next, we generate the mail. Recall that a typical SMTP transaction goes like this (see RFC 821):

The client connects to the host on port 25. The host sends a greeting, and the client responds with EHLO (if it uses extended SMTP commands) or HELO. The mail is then sent like this:

client:  MAIL FROM: some_address@somewhere.com
host:    250 some_address@somewhere.com... Sender ok
client:  RCPT TO: valid_recipient@valid_host.com
host:    250 valid_recipient@valid_host.com... Recipient ok
client:  DATA
host:    354 Enter mail, end with "." on a line by itself
client:  sends its message, including headers and body.  The last line
         consists of a single '.'.
host:    250 Message accepted for delivery

There's something very important to note here: The only place a valid email address is required for delivery of this message, from the perspective of the mail delivery software (better known as an MTA or mail transfer agent), is in the "RCPT TO:" part of the SMTP conversation. None of the other addresses is necessarily real: not the MAIL FROM:, not the From: line in the headers, not the To: line, not the Reply-to:, none of it. This is key to understanding how the spammer is able to blast out so much email at virtually no cost to himself. He simply creates a form letter with whatever header information he wants. The recipient's "envelope" address determines where the message will go.

The envelope addresses are those addresses that appear in the SMTP conversation above, and while they are often the same as those that appear in the message headers, they don't have to be. As a matter of fact, they technically don't have anything to do with each other! It's important to think about the envelope and the message separately. The envelope recipient address (your email address) need not appear anywhere in the message your mail program displays!
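The separation between envelope and headers can be made concrete with a small Python sketch. It reads the client's side of an SMTP conversation and the message text, then compares the two. The helper names envelope_from_transcript and header_addresses are assumptions made for illustration; they are not part of any MTA.

```python
from email import message_from_string
from email.utils import getaddresses


def envelope_from_transcript(client_lines):
    """Pull the envelope sender and recipients out of the client's commands."""
    sender, recipients = None, []
    for line in client_lines:
        upper = line.upper()
        if upper.startswith("MAIL FROM:"):
            sender = line[len("MAIL FROM:"):].strip().strip("<>")
        elif upper.startswith("RCPT TO:"):
            recipients.append(line[len("RCPT TO:"):].strip().strip("<>"))
    return sender, recipients


def header_addresses(raw, field):
    """Return the bare addresses found in one header field of the message."""
    msg = message_from_string(raw)
    return [addr for _, addr in getaddresses(msg.get_all(field, []))]


client_lines = [
    "MAIL FROM: some_address@somewhere.com",
    "RCPT TO: valid_recipient@valid_host.com",
]
message = (
    "To: mybuddy@recipienthost.com\n"
    "From: hotchicks@waitingforu.com\n"
    "Subject: Did I get the right email address?\n"
    "\n"
    "message body\n"
)

sender, recipients = envelope_from_transcript(client_lines)
to_header = header_addresses(message, "To")
print("envelope recipients:", recipients)
print("To: header:", to_header)
print("recipient named in To: header?", recipients[0] in to_header)
```

A check like the last line is one heuristic a recipient-side filter might apply, though legitimate mail such as mailing-list traffic and Bcc copies also fails it.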

Suppose we're spammers and we need to send spam. A popular method, given that we have our list of addresses and we have a form letter, is to "hijack" an unsuspecting mail host on the Internet. Spammers do this by connecting to an unsuspecting host and using it to deliver mail. That is, we (as spammers) use our list of email addresses to fill in multiple "rcpt to:" lines during the SMTP conversation. Thus, some poor Internet Service Provider (ISP) does our mail delivering for us. They also have to deal with bounced emails, undeliverables, and other problems - not to mention the load on their server. Here is how the SMTP conversation will look:

us:      MAIL FROM: hotchicks@waitingforu.com
ISP:     250 hotchicks@waitingforu.com... Sender ok

loop:

us:      RCPT TO: valid_recipient@valid_host.com
ISP:     250 valid_recipient@valid_host.com... Recipient ok

... loop, thousands more times ...

us:      DATA
ISP:     354 Enter mail, end with "." on a line by itself
us:      send spam form letter
ISP:     250 Message accepted for delivery

Voila! Spam-o-matic mail delivery! Scratch one ISP for a day, as the hapless administrators deal with complaints, loss of service, mail refusals from angry downstream hosts, and so on.
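From the ISP's side, one simple defense against the loop shown above is to cap the number of recipients accepted in a single transaction. The Python sketch below is a hypothetical illustration, not the behavior of any particular MTA: the class name RecipientLimiter and the default limit of 100 are assumptions. The 452 reply code for "too many recipients" is the one given in RFC 5321.

```python
class RecipientLimiter:
    """Counts RCPT TO commands within one mail transaction (hypothetical)."""

    def __init__(self, max_recipients=100):
        self.max_recipients = max_recipients
        self.accepted = 0

    def on_mail_from(self):
        # A new MAIL FROM starts a new transaction, so reset the count.
        self.accepted = 0

    def on_rcpt_to(self, address):
        # Refuse further recipients once the per-transaction cap is reached.
        if self.accepted >= self.max_recipients:
            return "452 Too many recipients"
        self.accepted += 1
        return f"250 {address}... Recipient ok"


limiter = RecipientLimiter(max_recipients=3)
limiter.on_mail_from()
replies = [limiter.on_rcpt_to(f"user{i}@valid_host.com") for i in range(5)]
for reply in replies:
    print(reply)
```

A cap only slows a determined sender, who can open more transactions, so it would normally sit alongside a relay policy rather than replace one.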

The technique described is known as "relaying". As more sites get smart about relaying, it will become less popular. Instead, spammers will need to resort to spam-friendly domains, or they will need to sign up with an ISP to send their spam. The ISP will disable the account once it receives complaints from the 'Net about this spammer. The spammer will simply move on to another ISP. A $20 dialup account is cheap for sending thousands upon thousands of spam messages.
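What "getting smart about relaying" usually means is that a host accepts a recipient only if the mail is addressed to one of its own domains or comes from one of its own customers. Here is a minimal Python sketch of that rule. The domain example.net, the network 192.0.2.0/24, and the function relay_allowed are all placeholders chosen for illustration.

```python
import ipaddress

# Hypothetical site configuration: domains we deliver for, networks we trust.
LOCAL_DOMAINS = {"example.net"}
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def relay_allowed(client_ip, rcpt_address):
    """Accept a recipient only for local delivery or for trusted clients."""
    domain = rcpt_address.rsplit("@", 1)[-1].lower()
    if domain in LOCAL_DOMAINS:
        return True  # mail for our own users is always accepted
    ip = ipaddress.ip_address(client_ip)
    return any(ip in network for network in TRUSTED_NETWORKS)


# An outside client asking us to deliver to an outside domain: refused.
print(relay_allowed("203.0.113.9", "valid_recipient@valid_host.com"))
# The same outside client sending to one of our own users: accepted.
print(relay_allowed("203.0.113.9", "user@example.net"))
# One of our own customers sending to the outside world: accepted.
print(relay_allowed("192.0.2.10", "valid_recipient@valid_host.com"))
```

With a rule like this in place, the hijacking conversation shown earlier fails at the first RCPT TO, because neither the client nor the recipient belongs to the site.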

About the Author

Mike Schwager is a contractor specializing in UNIX and the Internet. He has spent the past 15 years writing C and Perl code and shell scripts, and maintaining systems in corporate and educational environments. Email him at Michael@Schwager.com or visit http://come.to/lanicservices.