Method for parsing a Cc email header text field?

I have plain Cc header field text that looks like this:

friend@email.com, John Smith <john.smith@email.com>, "Smith, Jane" <jane.smith@uconn.edu>

Are there any battle-tested modules for proper parsing?

(Bonus if it's in Python! As far as I know, the email module just returns the raw text without any way to split it. Also a bonus if it breaks the name and address into separate fields.)

+4
4 answers

The standard library's email.utils module has several helper functions, but I think you're looking for email.utils.parseaddr() or email.utils.getaddresses():

    >>> import email.utils
    >>> addresses = 'friend@email.com, John Smith <john.smith@email.com>, "Smith, Jane" <jane.smith@uconn.edu>'
    >>> email.utils.getaddresses([addresses])
    [('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'), ('Smith, Jane', 'jane.smith@uconn.edu')]
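If the Cc text is still inside a full message, the same call should work on whatever the message object returns for that header. A minimal sketch, assuming a made-up message in `raw` (the sender address and subject are invented for illustration):

```python
import email
from email.utils import getaddresses, parseaddr

# Hypothetical raw message, used only for illustration.
raw = (
    "From: sender@example.com\n"
    'Cc: friend@email.com, John Smith <john.smith@email.com>, '
    '"Smith, Jane" <jane.smith@uconn.edu>\n'
    "Subject: hello\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(raw)

# get_all() returns a list with one string per Cc header (or [] if none),
# which is exactly the shape getaddresses() expects.
cc_pairs = getaddresses(msg.get_all("Cc", []))

# parseaddr() handles a single address and returns one (name, address) pair.
single = parseaddr("John Smith <john.smith@email.com>")

print(cc_pairs)
print(single)
```

Use getaddresses() when a header may hold several recipients and parseaddr() when you know there is exactly one.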
+14

I have not used it myself, but it seems to me that you can easily use the csv package for parsing data.

0

The following is completely unnecessary. I wrote it before realizing that you can pass getaddresses() a list containing a single string that holds several addresses.

I haven't had a chance to read the specification for addresses in email headers, but based on the line you provided, this code should do the job. It splits the string into a list of addresses and ignores commas that sit inside quotes (and are therefore part of a name).

    from email.utils import getaddresses

    addrstring = ',friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>,'

    def addrparser(addrstring):
        addrlist = ['']
        quoted = False

        # ignore comma at beginning or end
        addrstring = addrstring.strip(',')

        for char in addrstring:
            if char == '"':
                # toggle quoted mode
                quoted = not quoted
                addrlist[-1] += char
            # a comma outside of quotes means a new address
            elif char == ',' and not quoted:
                addrlist.append('')
            # anything else is the next letter of the current address
            else:
                addrlist[-1] += char

        return getaddresses(addrlist)

    print(addrparser(addrstring))

It gives:

    [('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'), ('Smith, Jane', 'jane.smith@uconn.edu')]

I would be interested to see how other people will solve this problem!
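One other way, for what it's worth: on Python 3 the email package can parse the header into structured objects if the message is read with `email.policy.default`. This is only a sketch under that assumption, and the one-header message in `raw` is made up for illustration:

```python
import email
from email import policy

# Hypothetical message containing only a Cc header, for illustration.
raw = (
    'Cc: friend@email.com, John Smith <john.smith@email.com>, '
    '"Smith, Jane" <jane.smith@uconn.edu>\n'
    "\n"
    "body\n"
)

# With policy.default, address headers are parsed into header objects
# whose .addresses attribute is a tuple of Address objects.
msg = email.message_from_string(raw, policy=policy.default)

fields = [
    (addr.display_name, addr.username, addr.domain)
    for addr in msg["Cc"].addresses
]

for display_name, username, domain in fields:
    print(repr(display_name), username, domain)
```

Each Address object keeps the display name, the local part, and the domain separately, which covers the "break it into fields" part of the question.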

0

Converting a single string that holds several named email addresses into a dictionary.

    import email.utils

    emailstring = 'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>'

Split the string on commas (this assumes no display name contains a comma):

    email_list = emailstring.split(',')

Build the dictionary with the name as the key and the email address as the value:

    email_dict = dict(map(email.utils.parseaddr, email_list))

Result:

    {'John Smith': 'john.smith@email.com', 'Friends': 'friend@email.com', 'Smith': 'jane.smith@uconn.edu'}

Note:

If the same name appears with different email addresses, only the last one is kept, because dictionary keys are unique.

    'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>, Friends <friend_co@email.com>'

Here Friends appears twice, so only friend_co@email.com ends up in the dictionary.
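Both limitations can probably be avoided by letting getaddresses() do the splitting and storing a list of addresses per name. A plain split(',') would also cut a quoted name such as "Smith, Jane" in two. The sketch below is one possible variant, not part of the answer above; the helper name `group_by_name` is made up for illustration:

```python
from collections import defaultdict
from email.utils import getaddresses


def group_by_name(header_value):
    """Map each display name to the list of addresses seen for it."""
    grouped = defaultdict(list)
    # getaddresses() respects quoted commas, so no manual split is needed.
    for name, addr in getaddresses([header_value]):
        grouped[name].append(addr)
    return dict(grouped)


emailstring = (
    'Friends <friend@email.com>, "Smith, Jane" <jane.smith@uconn.edu>, '
    'Friends <friend_co@email.com>'
)
result = group_by_name(emailstring)
print(result)
```

Addresses without a display name all end up under the empty-string key, so that case may need separate handling depending on what the dictionary is used for.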

0

Source: https://habr.com/ru/post/1345169/

