Method for parsing a Cc email header text field?

Question

I have plain Cc header field text that looks like this:

friend@email.com, John Smith <john.smith@email.com>, "Smith, Jane" <jane.smith@uconn.edu>

Are there any battle-tested modules for proper parsing?

(Bonus if it's in Python! As far as I know, the email module just returns the raw header text and offers no way to split it. Also a bonus if it breaks each entry into separate name and address fields.)

+4

python email parsing email-headers

smurthas Mar 24 '11 at 23:22

4 answers

I have not used it myself, but it seems to me that you can easily use the csv package for parsing data.
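A minimal sketch of what this suggestion might look like, assuming the header string from the question and the standard csv module with `skipinitialspace=True`. The variable names are made up for illustration. It appears to split the entries correctly, but csv consumes the quotes around "Smith, Jane", so the comma inside the name is unprotected again when that field is parsed as an address:

```python
import csv
from email.utils import parseaddr

# Header text from the question (illustrative variable name).
cc = 'friend@email.com, John Smith <john.smith@email.com>, "Smith, Jane" <jane.smith@uconn.edu>'

# skipinitialspace lets a quote that follows ", " open a quoted field.
fields = next(csv.reader([cc], skipinitialspace=True))
print(fields)

# The third field no longer carries its quotes, so parseaddr cannot
# recover the ("Smith, Jane", address) pair from it.
parsed = [parseaddr(field) for field in fields]
print(parsed)
```

So csv alone gets the splitting right here, but the name/address breakdown still needs a parser that understands address syntax.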

0

Demian brecht Mar 24 '11 at 23:34

Edit: the manual splitting below is completely unnecessary. I wrote it before realizing that you can pass getaddresses() a list containing a single string that holds several addresses.

I have not had a chance to look at the specification for addresses in email headers, but based on the string you provided, this code should do the job. It splits the string into a list, ignoring commas that fall inside quotes (and are therefore part of a name).

    from email.utils import getaddresses

    addrstring = ', friend@email.com, John Smith <john.smith@email.com>, "Smith, Jane" <jane.smith@uconn.edu>,'

    def addrparser(addrstring):
        addrlist = ['']
        quoted = False
        # ignore comma at beginning or end
        addrstring = addrstring.strip(',')
        for char in addrstring:
            if char == '"':
                # toggle quoted mode
                quoted = not quoted
                addrlist[-1] += char
            # a comma outside of quotes means a new address
            elif char == ',' and not quoted:
                addrlist.append('')
            # anything else is the next letter of the current address
            else:
                addrlist[-1] += char
        return getaddresses(addrlist)

    print(addrparser(addrstring))

It gives:

    [('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'), ('Smith, Jane', 'jane.smith@uconn.edu')]

I would be interested to see how other people will solve this problem!

0

Acorn Mar 25 '11 at 2:58

Converting a string of several email addresses (each with a name) into a dictionary.

    import email.utils

    emailstring = 'Friends <friend@email.com>, John Smith <john.smith@email.com>, "Smith" <jane.smith@uconn.edu>'

Split the string on commas:

    email_list = emailstring.split(',')

Then build a dictionary with the name as the key and the email address as the value:

    email_dict = dict(map(email.utils.parseaddr, email_list))

Result:

    {'John Smith': 'john.smith@email.com', 'Friends': 'friend@email.com', 'Smith': 'jane.smith@uconn.edu'}

Note:

If the same name appears with different email addresses, only one entry survives, because a later entry overwrites the earlier one under the same key. A plain split(',') also breaks any quoted name that itself contains a comma, such as "Smith, Jane".

    'Friends <friend@email.com>, John Smith <john.smith@email.com>, "Smith" <jane.smith@uconn.edu>, Friends <friend_co@email.com>'

Here the name Friends appears twice, so one of its two addresses is lost.
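One possible way around both limitations, sketched here as an assumption rather than as part of the answer above, is to let email.utils.getaddresses() do the splitting and to collect addresses in a list per name. The name `by_name` is made up for illustration:

```python
from collections import defaultdict
from email.utils import getaddresses

emailstring = ('Friends <friend@email.com>, John Smith <john.smith@email.com>, '
               '"Smith" <jane.smith@uconn.edu>, Friends <friend_co@email.com>')

# Map each display name to a list of addresses so duplicates are kept.
by_name = defaultdict(list)
for name, addr in getaddresses([emailstring]):
    by_name[name].append(addr)

print(dict(by_name))
# {'Friends': ['friend@email.com', 'friend_co@email.com'],
#  'John Smith': ['john.smith@email.com'],
#  'Smith': ['jane.smith@uconn.edu']}
```

Because getaddresses() understands quoted display names, this variant should also cope with entries like "Smith, Jane" without any manual splitting.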

0

Gaurav panchal Jul 21 '15 at 7:28

Carpetsmoker · Accepted Answer · 2011-03-24T23:35:01+0000

The standard library's email package has many helper functions, but I think you're looking for email.utils.parseaddr() (for a single address) or email.utils.getaddresses() (for a list of header values, each of which may hold several addresses).

    >>> import email.utils
    >>> addresses = 'friend@email.com, John Smith <john.smith@email.com>, "Smith, Jane" <jane.smith@uconn.edu>'
    >>> email.utils.getaddresses([addresses])
    [('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'), ('Smith, Jane', 'jane.smith@uconn.edu')]
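If the Cc text comes from a full message rather than a bare string, the same call can be fed straight from the parsed message. The following is a minimal sketch under that assumption. The raw message and the sender address are invented for illustration; `message_from_string`, `Message.get_all` and `getaddresses` are standard-library calls:

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical raw message, made up for illustration.
raw = (
    'From: sender@email.com\n'
    'Cc: friend@email.com, John Smith <john.smith@email.com>, '
    '"Smith, Jane" <jane.smith@uconn.edu>\n'
    'Subject: hello\n'
    '\n'
    'Body text.\n'
)

msg = message_from_string(raw)

# get_all returns every Cc header as a list of raw strings;
# the second argument is the default used when the header is absent.
cc_headers = msg.get_all('Cc', [])

# getaddresses accepts that list and returns (name, address) pairs.
pairs = getaddresses(cc_headers)
for name, addr in pairs:
    print(repr(name), repr(addr))
```

Passing the list from get_all() directly means several Cc headers, if present, are handled in one call.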
