Tuesday, 19 March, 2002

Formatting Email Messages

Otto von Bismarck once said:  "There are two things you should never watch being made:  laws and sausages."  Or perhaps it was: "Laws are like sausages, it is better not to see them being made."  Or maybe: "If you like laws and sausages, you should never watch either one being made."  A quick scan of the Web gave me all three quotes.  Fortunately they were all credited to Otto von Bismarck.  Regardless of the exact wording, the sentiment is the same:  close examination of a thing's creation may change your perceptions of it for the worse.

If you want to continue feeling good about your electronic mail, don't look under the hood.  If you're comfortable in your ignorance, just skip the rest of this entry.  You do not want to know what happens to your message between the time it leaves your computer and the time it arrives at its destination.

I'm not going to bore you with a detailed description of the Simple Mail Transfer Protocol (SMTP) and the format of Internet mail messages.  If you're interested in the details, take a look at RFC 2821, RFC 2822, and the documents that they reference.  Be sure to have a pot of coffee ready, though.  These RFCs make for some dry reading.  I'll just mention that when you send an email message to somebody else, it's quite possible that what is received will not be exactly what you sent.  Oh, the content will be there, but the formatting might not be what you expected.

You see, SMTP requires that Mail Transfer Agents (MTAs, the server-based programs that actually deliver the mail) support the standards defined by the original documents in 1982 when the system was designed.  One of the restrictions is that lines in plain text messages should be no longer than 78 characters (the hard limit is 998, but 78 is what software is told to aim for).  So when you send a message that does contain longer lines, either your email client, or sometimes the MTA, will reformat that message before passing it on.  The message is encoded with special characters so that the program at the other end (your recipient's email client) can decode it and, in theory, display it exactly as you had intended.  Surprisingly enough, this works.  Most of the time.  You'll know when it doesn't work:  you'll end up with text that reads similar to this:

This is some sample text that I had formatted for 80-character lines. I 
it via email, where it was "encoded" to meet the 78-characters-per-
restriction, but got mangled along the way. I'm sure you've seen
similar before.
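
For the curious, here is a rough sketch of the kind of encoding described above.  I haven't said which scheme is involved, so treat this as an assumption:  quoted-printable (defined in RFC 2045) is one common encoding that fits the description.  It breaks long lines with a trailing "=" that the receiving program is supposed to remove when it glues the lines back together.  The function names below are made up for illustration; only Python's standard quopri module is real.

```python
import quopri


def encode_for_transport(text: str) -> bytes:
    """Encode text as quoted-printable (hypothetical helper name).

    Long lines are split with "soft" line breaks: an "=" at the end
    of a line means "the line continues, this break is not real".
    RFC 2045 limits encoded lines to 76 characters.
    """
    return quopri.encodestring(text.encode("utf-8"))


def decode_from_transport(encoded: bytes) -> str:
    """Undo the encoding, removing soft line breaks (hypothetical helper name)."""
    return quopri.decodestring(encoded).decode("utf-8")


if __name__ == "__main__":
    # One long line, well past the 78-character guideline.
    long_line = (
        "This is some sample text that I had formatted as one long line, "
        "sent via email, and expected to arrive looking exactly the same."
    )

    encoded = encode_for_transport(long_line)

    # What travels over the wire: short lines ending in "=".
    print(encoded.decode("ascii"))

    # What a well-behaved email client shows after decoding.
    print(decode_from_transport(encoded))
```

When both ends agree on the scheme, the decoded text matches what was sent.  The mangling shows up when something along the way re-wraps or trims the encoded lines, or when the receiving program displays them without decoding.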

It's easy to blame the MTAs or the email clients for not getting things right, but the real blame lies squarely with the standards.  There's simply no reason to continue supporting standards that were created to carry low volumes of mail across networks controlled by computers that weren't even as powerful as my telephone is today.  The 78-character line limit is especially ridiculous.  That's an artifact of teletypes, for crying out loud!  We need a newer, more robust standard that is designed for today's computers and takes into account the expected growth in volume and processing power.  Ditch those 20-year-old MTAs, stop supporting the myriad special networks and routing rules, and build an electronic mail system that will deliver messages exactly as they are intended.  While we're at it, perhaps we can build in some spam protection as well.