Python 3 Email Encoding

I am setting up a script that forwards incoming mail to a list of recipients.

Here is what I have now:

I read the email from stdin (this is how postfix passes it):

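    import sys
    from email.parser import Parser

    email_in = sys.stdin.read()
    # parsestr() takes a str; parse() expects a file object
    incoming = Parser().parsestr(email_in)
    sender = incoming['from']
    this_address = incoming['to']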

I check whether the message is multipart:

    if incoming.is_multipart():
        for payload in incoming.get_payload():
            # if payload.is_multipart(): ...
            body = payload.get_payload()
    else:
        body = incoming.get_payload(decode=True)

I build the outgoing message:

    import smtplib
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    msg = MIMEMultipart()
    msg['Subject'] = incoming['subject']
    msg['From'] = this_address
    msg['reply-to'] = sender
    msg['To'] = "foo@bar.com"
    msg.attach(MIMEText(body.encode('utf-8'), 'html', _charset='UTF-8'))
    s = smtplib.SMTP('localhost')
    s.send_message(msg)
    s.quit()

This works well with ASCII characters (English text): the message is forwarded intact.

When I send non-ASCII characters, the forwarded message contains gibberish (depending on the mail client, either raw bytes or ASCII representations of the UTF-8 characters).

What could be the problem? Is it on the inbound or outbound side?

1 answer

The problem is that many email clients (including Gmail) send non-ASCII emails encoded as base64. stdin, on the other hand, passes everything in as a string. If you parse that string with Parser, get_payload() returns a str that still has the base64 text inside.
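This is easy to reproduce without a mail client. The sketch below is only an illustration: it assumes the client sends a UTF-8 body with base64 transfer encoding (simulated here with MIMEText), and the sample text is made up.

```python
from email.mime.text import MIMEText
from email.parser import Parser

# Simulate what arrives on stdin: a UTF-8 body, base64-encoded for transport.
original_text = "Привет, мир"
raw = MIMEText(original_text, "plain", "utf-8").as_string()

parsed = Parser().parsestr(raw)

# The header tells us how the body was encoded for transport.
transfer_encoding = parsed["Content-Transfer-Encoding"]

# Without decode=True we get the base64 characters back as a str,
# not the text the sender typed.
undecoded = parsed.get_payload()
```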

Instead, pass the optional decode argument to get_payload(). When it is set, the method undoes the transfer encoding and returns a bytes object. You can then call the built-in decode() method on it to get a str, as follows:

    body = payload.get_payload(decode=True)
    body = body.decode('utf-8')
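Hard-coding 'utf-8' works when the sender really used UTF-8. A slightly more defensive variant, sketched here under the assumption that the part declares its charset in Content-Type, asks the part itself. The helper name decode_text_part and the sample text are mine, not part of the email package.

```python
from email.mime.text import MIMEText
from email.parser import Parser


def decode_text_part(part, fallback_charset="utf-8"):
    """Return the text of a non-multipart message part as a str."""
    # decode=True undoes base64 / quoted-printable and yields bytes.
    raw_bytes = part.get_payload(decode=True)
    # Use the charset declared by the sender, if any.
    charset = part.get_content_charset() or fallback_charset
    # errors="replace" keeps the script alive on malformed input.
    return raw_bytes.decode(charset, errors="replace")


# Hypothetical sample message standing in for one read from stdin.
sample = Parser().parsestr(MIMEText("Grüße aus Köln", "html", "utf-8").as_string())
body = decode_text_part(sample)
```

Here body is an ordinary str, so it can be handed to MIMEText directly without calling encode() on it first.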

Ned Batchelder's talk gives great insight into UTF-8 and Python.

My final code works a little differently; you can also check that out here.
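For completeness, here is one way the inbound and outbound sides could fit together. This is my own rough sketch, not the linked code: the helper names extract_body and build_forward are hypothetical, it only forwards the first text part, and it assumes the incoming message has Subject, From and To headers.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_body(message):
    """Return (text, subtype) of the first text/* part, decoded to str."""
    # walk() yields the message itself and descends into nested multiparts.
    for part in message.walk():
        if part.get_content_maintype() == "text":
            charset = part.get_content_charset() or "utf-8"
            raw_bytes = part.get_payload(decode=True)
            return raw_bytes.decode(charset, errors="replace"), part.get_content_subtype()
    return "", "plain"


def build_forward(incoming, to_address):
    """Build the outgoing message from a parsed incoming one."""
    text, subtype = extract_body(incoming)
    msg = MIMEMultipart()
    msg["Subject"] = incoming["subject"]
    msg["From"] = incoming["to"]
    msg["Reply-To"] = incoming["from"]
    msg["To"] = to_address
    # MIMEText takes a str and applies the transfer encoding itself.
    msg.attach(MIMEText(text, subtype, "utf-8"))
    return msg
```

The result of build_forward() can be passed to smtplib's send_message() exactly as in the question.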


Source: https://habr.com/ru/post/1207126/

