MIME encoded headers with extra '=' (==?utf-8?b?base64string?=)

It might be a stupid question, but ... here it is!

I wrote my own MIME parser in native C++. Encodings are a nightmare! It has been stable for the last 3 months or so, but I recently noticed this Subject: header:

 Subject: =?UTF-8?B?T2ZpY2luYSBkZSBJbmZvcm1hY2nDs24sIEluaWNpYXRpdmFzIHkgUmVjbGFt?===?UTF-8?B?YWNpb25lcw==?= 

which should decode to:

 Subject: Oficina de Información, Iniciativas y Reclamaciones 

The problem is that there is one extra = (equals sign) that I cannot account for, sitting at the junction of two (why 2?) encoded elements, and I do not understand why they are split either. In theory the format should be =?charset?encoding?encoded_string?= , but the second element starts with two = :

 ==?UTF-8?B?blahblahlblah?= 

How do I handle the extra = ?

I could replace ==? with =? before doing anything else (that is what I do now, and it works) ... but I wonder whether there is any specification covering this, so that I am not hacking my way to correct functionality.

PS: How I hate these relic protocols! All text messaging should be UTF-8 and XML :)

+6
3 answers

MIME headers use encoded words (RFC 2047, section 2).

... (why 2?)

To get around the 75-character limit on an encoded word, which exists because of the 78-character line length limit (or to use two different encodings, for example for Chinese and Polish).

RFC 2047:

An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used.
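To make the 75-character limit concrete, here is a minimal C++ sketch of how a sender might split UTF-8 text into several B-encoded words. The function names `base64_encode` and `encode_words_utf8` are hypothetical, and the sketch assumes the charset is always UTF-8. With the 12 characters taken by `=?UTF-8?B?` and `?=`, 63 characters remain, which fits 60 base64 characters, that is 45 raw bytes per word. That matches the 60-character payload of the first word in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Standard base64 encoding with '=' padding.
std::string base64_encode(const std::string& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    while (i + 2 < in.size()) {  // full 3-byte groups
        unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8) |
                      static_cast<unsigned char>(in[i + 2]);
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += tbl[n & 63];
        i += 3;
    }
    std::size_t rest = in.size() - i;  // 0, 1 or 2 trailing bytes
    if (rest == 1) {
        unsigned n = static_cast<unsigned char>(in[i]) << 16;
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8);
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Split UTF-8 text into encoded words of at most 75 characters each.
std::vector<std::string> encode_words_utf8(const std::string& text) {
    const std::string prefix = "=?UTF-8?B?";
    const std::string suffix = "?=";
    // (75 - 12) / 4 * 3 = 45 raw bytes fit into one encoded word.
    const std::size_t max_raw = (75 - prefix.size() - suffix.size()) / 4 * 3;
    std::vector<std::string> words;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t len = std::min(max_raw, text.size() - pos);
        // Do not cut a multi-byte UTF-8 character: shrink the chunk while the
        // byte right after it is a continuation byte (10xxxxxx).
        while (len > 1 && pos + len < text.size() &&
               (static_cast<unsigned char>(text[pos + len]) & 0xC0) == 0x80) {
            --len;
        }
        words.push_back(prefix + base64_encode(text.substr(pos, len)) + suffix);
        pos += len;
    }
    return words;
}
```

A caller would then join the returned words with CRLF followed by a space when writing the header. No '=' appears between the words.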

Here is an example from RFC 2047 (note that there is no '=' between the encoded words):

 Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?= 

Your subject should be decoded as:

 "Oficina de Información, Iniciativas y Reclam=aciones" 
mraq's answer is incorrect. Soft line breaks apply only to the Quoted-Printable Content-Transfer-Encoding, which can be used in MIME bodies.
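As an illustration of this reading, here is a minimal C++ sketch of a lenient decoder that treats anything that is not a well-formed encoded word as literal text, so the stray '=' survives into the output. The names `base64_decode`, `parse_encoded_word` and `decode_header_lenient` are hypothetical. The sketch assumes the header is already unfolded, handles only the B encoding, and returns the raw decoded bytes without converting between charsets.

```cpp
#include <cstddef>
#include <string>

// Decode standard base64; padding and any other non-alphabet bytes are skipped.
std::string base64_decode(const std::string& in) {
    std::string out;
    unsigned acc = 0;
    int bits = 0;
    for (unsigned char c : in) {
        int v;
        if (c >= 'A' && c <= 'Z') v = c - 'A';
        else if (c >= 'a' && c <= 'z') v = c - 'a' + 26;
        else if (c >= '0' && c <= '9') v = c - '0' + 52;
        else if (c == '+') v = 62;
        else if (c == '/') v = 63;
        else continue;
        acc = (acc << 6) | static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((acc >> bits) & 0xFF);
        }
    }
    return out;
}

// Try to parse "=?charset?B?text?=" at pos. On success, store the decoded
// bytes in `decoded`, move pos past the word and return true.
bool parse_encoded_word(const std::string& s, std::size_t& pos,
                        std::string& decoded) {
    if (s.compare(pos, 2, "=?") != 0) return false;
    std::size_t q1 = s.find('?', pos + 2);  // '?' that ends the charset
    if (q1 == std::string::npos || q1 + 2 >= s.size() || s[q1 + 2] != '?')
        return false;
    char enc = s[q1 + 1];
    if (enc != 'B' && enc != 'b') return false;  // only base64 in this sketch
    std::size_t end = s.find("?=", q1 + 3);
    if (end == std::string::npos) return false;
    decoded = base64_decode(s.substr(q1 + 3, end - (q1 + 3)));
    pos = end + 2;
    return true;
}

// Decode encoded words, copy everything else through unchanged, and drop
// whitespace that sits between two adjacent encoded words.
std::string decode_header_lenient(const std::string& value) {
    std::string out;
    std::string pending_ws;  // whitespace not yet written to out
    bool prev_was_word = false;
    std::size_t pos = 0;
    while (pos < value.size()) {
        std::string decoded;
        char c = value[pos];
        if (parse_encoded_word(value, pos, decoded)) {
            if (!prev_was_word) out += pending_ws;  // keep space after plain text
            pending_ws.clear();
            out += decoded;
            prev_was_word = true;
        } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            pending_ws += c;
            ++pos;
        } else {
            out += pending_ws;  // literal text, including a stray '='
            pending_ws.clear();
            out += c;
            ++pos;
            prev_was_word = false;
        }
    }
    return out + pending_ws;
}
```

Fed the Subject value from the question, this yields the text with "Reclam=aciones". Fed the RFC 2047 example, the space between the two encoded words is dropped. Whether to keep or discard the stray '=' afterwards is a policy choice for the parser.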

+2

From what I see in the MIME RFC, double equals signs are not valid in encoded input, but keep in mind that you can treat the first equals sign as a literal character and then decode whatever follows it. But seriously, these extra equals signs look like artifacts, possibly from a faulty encoder.

0

It is called a "soft line break" and it is a legacy of the SMTP protocol.

Quoting p. 20 of RFC 2045:

(Soft Line Breaks) The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long. If longer lines are to be encoded with the Quoted-Printable encoding, "soft" line breaks must be used. An equal sign as the last character on an encoded line indicates such a non-significant ("soft") line break in the encoded text.

And also Wikipedia on Quoted-Printable:

A soft line break consists of an "=" at the end of an encoded line, and does not appear as a line break in the decoded text.
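For comparison, here is a minimal C++ sketch of how a soft line break is handled when decoding Quoted-Printable text. The names `hex_value` and `decode_quoted_printable` are hypothetical, and the sketch assumes the input is a body part whose Content-Transfer-Encoding is quoted-printable. It accepts both CRLF and a bare LF after the '=' and copies malformed escapes through as literal text.

```cpp
#include <cstddef>
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decode Quoted-Printable text: "=XX" becomes one byte, and "=" directly
// before a line ending (a soft line break) is removed together with it.
std::string decode_quoted_printable(const std::string& body) {
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        char c = body[i];
        if (c != '=') {
            out += c;
            continue;
        }
        // Soft line break: '=' followed by CRLF.
        if (i + 2 < body.size() && body[i + 1] == '\r' && body[i + 2] == '\n') {
            i += 2;
            continue;
        }
        // Soft line break with a bare LF (lenient).
        if (i + 1 < body.size() && body[i + 1] == '\n') {
            i += 1;
            continue;
        }
        // Escaped byte "=XX".
        if (i + 2 < body.size()) {
            int hi = hex_value(body[i + 1]);
            int lo = hex_value(body[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += c;  // malformed escape: keep the '=' as literal text
    }
    return out;
}
```

Here the '=' only counts as a soft line break when it is the last character before a line ending. In the Subject header from the question the '=' is followed directly by another encoded word on the same line.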

0

Source: https://habr.com/ru/post/947236/