7 bit or 8 bit content transfer encoding

Question

When sending email content, you have to set the "Content-Transfer-Encoding" header. I looked at the headers of many emails I have received: some use 7bit and some use 8bit.

What is the difference between the two? What is recommended? Is there a special encoding needed for the email body to set these headers?

email encoding header transfer

mahi Sep 07 '14 at 13:17

1 answer

Craig Walker · Answer 1 · 2015-02-15 22:03

It can be a little dense to read, but the "Content-Transfer-Encoding" section of RFC 1341 has all the details:

http://www.w3.org/Protocols/rfc1341/5_Content-Transfer-Encoding.html

From there, the situation quickly goes from bad to worse. Here is my summary:

Background

SMTP, by definition (RFC 821), limits mail to lines of at most 1000 characters of 7 bits each. This means that none of the bytes you send down the pipe can have the most significant ("highest") bit set to "1".

The content we want to send often does not obey this restriction inherently. Think of an image file or a text file that contains Unicode characters: the bytes of these files will often have their 8th bit set to “1”. SMTP does not allow this, so you need to use a “transfer encoding” to describe how you worked around the mismatch.

The Content-Transfer-Encoding header values describe the rule that you selected to solve this problem.
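To make the restriction concrete, here is a small Python sketch that lists which bytes of a payload have their most significant bit set. The function name `high_bit_bytes` is made up for illustration; it is not part of any mail library.

```python
def high_bit_bytes(data: bytes) -> list[int]:
    """Return the bytes whose most significant (8th) bit is set to 1."""
    return [b for b in data if b & 0x80]


# Plain English text stays within the lower 7 bits.
print(high_bit_bytes(b"Hello, world"))  # []

# A single accented letter becomes two UTF-8 bytes, both with the high bit set.
print([f"{b:08b}" for b in high_bit_bytes("na\u00efve".encode("utf-8"))])
# ['11000011', '10101111']
```

Any payload for which this returns a non-empty list cannot be sent as-is over a strictly 7-bit channel.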

7Bit Encoding

7bit simply means: "My data consists only of US-ASCII characters that use only the lower 7 bits for each character." You basically guarantee that all bytes of your content already comply with SMTP restrictions and therefore do not require special treatment. You can just read it as is.

Please note that when you choose 7bit, you are also guaranteeing that all lines in your content are less than 1000 characters long.

As long as your content adheres to these rules, 7bit is the best transfer encoding, since no additional work is needed: you just read or write the bytes as they come off the pipe. It is also easy to eyeball 7bit content and make sense of it. The idea is that if you are just writing in “plain English text,” everything will be fine. But that was not true in 2005, and it is not true today.
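A rough way to check whether a body can honestly be labelled 7bit is sketched below. The helper name `is_7bit_safe` is hypothetical, and the default of 998 characters per line (1000 minus the two-byte CRLF terminator) is my reading of the limit described above. The sketch only covers the two rules discussed here, not every detail of the RFCs.

```python
def is_7bit_safe(body: bytes, max_line: int = 998) -> bool:
    """Return True if body has no high-bit bytes and no over-long lines.

    max_line excludes the line terminator (assumption: 1000 minus CRLF).
    """
    # Rule 1: every byte must fit in the lower 7 bits.
    if any(b > 0x7F for b in body):
        return False
    # Rule 2: every line must stay within the length limit.
    return all(len(line) <= max_line for line in body.splitlines())


print(is_7bit_safe(b"Hello\r\nWorld\r\n"))         # True
print(is_7bit_safe("caf\u00e9".encode("utf-8")))   # False: high-bit bytes
print(is_7bit_safe(b"a" * 2000))                   # False: line too long
```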

8Bit Encoding

8bit means: "My data may contain extended ASCII characters; they may use the 8th (most significant) bit to indicate special characters outside of the standard 7-bit US-ASCII characters." As with 7bit, there is still a line length limit of 1000 characters.

8bit, just like 7bit, does not actually transform the bytes as they are written to or read from the wire. It simply means that you are not guaranteeing that none of the bytes will have the most significant bit set to "1".

This seems like a step up from 7bit, since it gives you more freedom in your content. However, RFC 1341 contains this tidbit:

As of the publication of this document, there are no standardized Internet transports for which it is legitimate to include unencoded 8-bit or binary data in mail bodies. Thus there are no circumstances in which the "8bit" or "binary" Content-Transfer-Encoding is actually legal on the Internet.

RFC 1341 came out more than 20 years ago. Since then we have gotten the 8bit MIME extension (8BITMIME) in RFC 6152, but even then line length limits can still apply:

Note that this extension does NOT eliminate the possibility of an SMTP server limiting line length; servers are free to implement this extension but nevertheless set a line length limit no lower than 1000 octets.
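On the question of whether the body needs special handling to match the header: in practice a mail library usually sets both together. As an illustration, here is a sketch using the `email.message.EmailMessage` class from Python's standard library, whose `set_content` method accepts a `cte` argument for text. The wrapper `declared_cte` is a made-up name for this example.

```python
from email.message import EmailMessage
from typing import Optional


def declared_cte(text: str, cte: Optional[str] = None) -> str:
    """Build a text/plain message and return its Content-Transfer-Encoding."""
    msg = EmailMessage()
    # With cte=None the library picks an encoding itself; otherwise it
    # encodes the body to match the value we ask for.
    msg.set_content(text, cte=cte)
    return str(msg["Content-Transfer-Encoding"])


print(declared_cte("plain English text"))             # 7bit
print(declared_cte("caf\u00e9", cte="8bit"))          # 8bit
print(declared_cte("caf\u00e9", cte="base64"))        # base64
```

Keep in mind that this only sets the header and encodes the body. Whether an 8bit message actually survives delivery still depends on the servers along the way supporting 8BITMIME.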

Binary Encoding

binary is the same as 8bit, except that there is no line length limit. You can still include any characters you want, and no extra encoding is applied. As with 8bit, RFC 1341 states that it is not really a legitimate transfer encoding. RFC 3030 extended this with BINARYMIME.

Quoted Printable

Before the 8BITMIME extension, there had to be a way to send content that could not be 7bit over SMTP. Good examples are HTML files (which can have lines longer than 1000 characters) and files with international characters. The quoted-printable encoding (defined in section 5.1 of RFC 1341) is designed to handle this. It does two things:

Defines how to escape non-US-ASCII characters so that they can be represented using only 7-bit characters. (Short version: each one is written as an equals sign followed by two 7-bit characters.)
Declares that lines will be no longer than 76 characters, and that line breaks will be represented using special characters (which are then escaped).

Quoted Printable, because of the escaping and short lines, is much harder for humans to read than 7bit or 8bit, but it supports a much wider range of possible content.
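Both behaviours described above can be seen with Python's standard `quopri` module. This is only a sketch: the helper name `qp_encode` is made up, and UTF-8 is assumed as the character set.

```python
import quopri


def qp_encode(text: str) -> bytes:
    """Quoted-printable encode text, assuming UTF-8 as the charset."""
    return quopri.encodestring(text.encode("utf-8"))


# The two UTF-8 bytes of the accented letter are escaped as =C3 and =A9.
print(qp_encode("caf\u00e9"))  # b'caf=C3=A9'

# A long line is broken up with soft line breaks (a trailing "=").
wrapped = qp_encode("x" * 200)
print(max(len(line) for line in wrapped.splitlines()))

# Decoding removes the escapes and soft breaks again.
print(quopri.decodestring(wrapped) == b"x" * 200)  # True
```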

Base64 Encoding

If your data is mostly non-text (for example, an image file), you do not have many options. 7bit is off the table. 8bit and binary were not supported before the MIME extension RFCs. quoted-printable would work, but it is really inefficient (each escaped byte takes 3 characters).

base64 is a good solution for this type of data. It encodes 3 raw bytes as 4 US-ASCII characters, which is relatively efficient. RFC 1341 further limits the line length of base64-encoded data to 76 characters so that it fits in an SMTP message, but that is relatively easy to manage when you are just splitting or joining arbitrary characters at fixed lengths.

The big downside is that base64-encoded data is almost completely unreadable by humans, even if it is just “plain” text underneath.
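The size difference between the two encodings is easy to measure with Python's standard `base64` and `quopri` modules. The helper `encoded_sizes` below is a made-up name for this sketch, and the sample data is just an arbitrary run of high-bit bytes standing in for an image.

```python
import base64
import quopri


def encoded_sizes(data: bytes) -> tuple[int, int]:
    """Return (quoted-printable size, base64 size) for the same raw bytes."""
    qp = quopri.encodestring(data)
    b64 = base64.encodebytes(data)  # wraps its output into short lines
    return len(qp), len(b64)


# 3 raw bytes become exactly 4 US-ASCII characters.
print(base64.b64encode(b"abc"))  # b'YWJj'

# Stand-in for binary content: 512 bytes that all have the high bit set.
binary_like = bytes(range(128, 256)) * 4
qp_size, b64_size = encoded_sizes(binary_like)
print(qp_size > b64_size)  # True

# Every base64 line stays within the 76-character limit.
print(max(len(line) for line in base64.encodebytes(binary_like).splitlines()))
```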
