Getting email text and HTML with imaplib and Django

I fetch emails using imaplib in a Python/Django project.

My goal is to read simple text and HTML messages.

I use:

    mail.select('inbox', readonly=True)
    result, data = mail.uid('fetch', email_uid, '(RFC822)')
    raw_email = data[0][1]
    email_message = email.message_from_string(raw_email)
    # print "EMAIL:", email_message
    # print "HEADERS", email_message.items()
    subject = get_decoded_header(email_message['Subject'])
    from_address = get_decoded_header(email_message['From'])
    date = email_message['Date']
    date = parse_date(date)
    body = '' + get_first_text_block(email_message)

And the code for get_first_text_block (obtained from the Internet):

    def get_first_text_block(email_message_instance):
        maintype = email_message_instance.get_content_maintype()
        if maintype == 'multipart':
            for part in email_message_instance.get_payload():
                if part.get_content_maintype() == 'text':
                    return part.get_payload()
        elif maintype == 'text':
            return email_message_instance.get_payload()
        # In cases of emails with empty body
        return ''

The problem is that the text is not displayed with its formatting. In particular, if it is a plain-text email, the text appears as one long run-on line, without the line breaks, paragraphs, and blank lines of the original.

If it is an HTML email, the HTML is not rendered at all; it is displayed as plain text with HTML fragments in it (even when using the safe filter in Django).

I suspect the email payload is being converted to a string incorrectly, or something similar, but I have checked everything and cannot find what is wrong.

What am I doing wrong?

2 answers

The problem is that you only use the first text block for the email body. Try the following and see if it works. This is not a Django issue.

    body = email_message.get_payload()[1].get_payload()

Try changing the index and check whether you get the HTML part.

Based on that, change the function that extracts the body of the email.

EDIT: I am assuming you are dealing with a multipart message.
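
As a rough illustration of what such a changed function could look like, here is a minimal sketch. It is written for Python 3.6+ (the thread itself uses Python 2), and it uses the newer `email.policy.default` API rather than the code from the question. The name `get_bodies` and the sample message are hypothetical, made up for this example only.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def get_bodies(raw_email):
    """Return the decoded plain-text and HTML bodies of a raw message.

    raw_email is the bytes object that imaplib returns in data[0][1].
    A body that is not present is returned as an empty string.
    """
    msg = message_from_bytes(raw_email, policy=policy.default)
    bodies = {"plain": "", "html": ""}
    for subtype in bodies:
        # get_body() searches the multipart tree for a matching text part.
        part = msg.get_body(preferencelist=(subtype,))
        if part is not None:
            # get_content() undoes the transfer encoding and the charset.
            bodies[subtype] = part.get_content()
    return bodies


# Hypothetical sample message with both a plain-text and an HTML version.
sample = EmailMessage()
sample["Subject"] = "Demo"
sample.set_content("Hello,\n\nplain text body.\n")
sample.add_alternative("<p>Hello,</p><p>HTML body.</p>", subtype="html")

bodies = get_bodies(sample.as_bytes())
print(bodies["plain"])
print(bodies["html"])
```

Returning both versions lets the view decide which one to show instead of relying on a fixed index. Note also that when the plain-text body is placed in an HTML template, browsers collapse its newlines; Django's `linebreaksbr` filter or a `<pre>` element keeps them visible.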


To extract the text version, you can use the code below. If you want the HTML version of the email instead, replace != 'plain' with != 'html'.

    import email

    resp, data = M.FETCH(1, '(RFC822)')
    mail = email.message_from_string(data[0][1])

    for part in mail.walk():
        print 'Content-Type:', part.get_content_type()
        print 'Main Content:', part.get_content_maintype()
        print 'Sub Content:', part.get_content_subtype()

    for part in mail.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get_content_subtype() != 'plain':
            continue
        payload = part.get_payload()
        print payload
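
One caveat with the loop above: `get_payload()` without arguments returns the part still in its transfer encoding (base64 or quoted-printable), which may be one reason HTML shows up as broken fragments. The sketch below is a Python 3 variant of the same walk that passes `decode=True` and then applies the part's charset. The helper name `decoded_parts`, the UTF-8 fallback, and the sample message are assumptions made for illustration.

```python
import email


def decoded_parts(raw_email, subtype="plain"):
    """Yield every text/<subtype> part of a raw message as decoded text."""
    msg = email.message_from_bytes(raw_email)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        if part.get_content_subtype() != subtype:
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        yield payload.decode(charset, errors="replace")


# Hypothetical quoted-printable HTML message, as a mail server might send it.
raw = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<a href=3D"https://example.com">caf=C3=A9</a>\r\n'
)

undecoded = email.message_from_bytes(raw).get_payload()
decoded = next(decoded_parts(raw, "html"))
print(undecoded)  # still contains the quoted-printable escapes
print(decoded)    # escapes resolved, ready to pass to the template
```

Comparing the two printed values on a real message is a quick way to check whether the transfer encoding is what breaks the display.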

Source: https://habr.com/ru/post/1443725/

