Search for email addresses written with [at] / (at) in Python

I am developing a web scraper. The main thing it extracts is email addresses from the HTML source. I am using the following code:

 import re
 r = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)
 emailAddresses = r.findall(html)

On several sites, the email address is written as abcd [at] gmail.com or abcd (at) gmail.com. I need a generic regular expression that matches an email address in any of the three formats: abcd [at] gmail.com, abcd (at) gmail.com and abcd@gmail.com. I tried the following code but did not get the expected result. Can someone help me?

 r = re.compile(r"\b[A-Z0-9._%+-]+[@|(at)|[at]][A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)
 emailAddresses = r.findall(html)
2 answers

Solution: replace @ with (@|\(at\)|\[at\]) as follows:

 r = re.compile(r"\b[A-Z0-9._%+-]+(@|\(at\)|\[at\])[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)
 emailAddresses = r.findall(html)

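Explanation: [one|two|three] is a character class, not an alternation. Square brackets […] match exactly one character from the set inside them ([a-z] is the same as [abcd…xyz]). To match one of several strings, use parentheses with |: (one|two|three). [1]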
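To see the difference in Python, here is a small sketch (the sample string is made up for illustration): the character class picks up single characters one at a time, while the alternation matches the whole token.

```python
import re

# A character class matches exactly ONE character from the set,
# so [@|(at)] means "one of @ | ( a t )", not "@ or (at)".
char_class = re.compile(r"[@|(at)]")

# Parentheses with | form an alternation of whole strings.
alternation = re.compile(r"(?:@|\(at\)|\[at\])")

text = "abcd(at)gmail.com"
print(char_class.findall(text))   # ['a', '(', 'a', 't', ')', 'a']
print(alternation.findall(text))  # ['(at)']
```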

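Parentheses () and brackets [] are special characters in a REGEX, so to match them literally you have to escape them with a backslash: \( \) \[ \]. The same applies to other special characters, for example \. for a literal dot, and likewise for ?, + and *.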
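A quick way to check the escaping rule (the sample string is illustrative): unescaped, (at) is a group that matches the letters "at", while \(at\) matches the literal text. Python's re.escape() adds the backslashes for you.

```python
import re

text = "a(at)b"

# Unescaped, (at) is a capturing group: it matches the two letters "at",
# and findall returns the group's text.
print(re.findall(r"(at)", text))    # ['at']

# Escaped, \(at\) matches the literal characters "(at)".
print(re.findall(r"\(at\)", text))  # ['(at)']

# re.escape() inserts the backslashes before special characters.
print(re.escape("(at)"))  # \(at\)
print(re.escape("[at]"))  # \[at\]
```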

Note: some sites also write the dot as [dot] or (dot); you may want to handle those as well.
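If a site also obfuscates the dot as [dot] or (dot), one possible approach is to rewrite the obfuscations back to "@" and "." first and then run the ordinary email regex from the question. This is a minimal sketch assuming only these four forms occur, optionally surrounded by spaces; `find_emails`, `AT` and `DOT` are hypothetical names, and the DOT substitution will also rewrite any unrelated "(dot)" text on the page.

```python
import re

# Hypothetical patterns for the obfuscated forms, with optional spaces around them.
AT = re.compile(r"\s*(?:\[at\]|\(at\))\s*", re.IGNORECASE)
DOT = re.compile(r"\s*(?:\[dot\]|\(dot\))\s*", re.IGNORECASE)

# The plain email regex from the question.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)


def find_emails(html):
    """Hypothetical helper: de-obfuscate, then collect plain addresses."""
    text = AT.sub("@", html)
    text = DOT.sub(".", text)
    return EMAIL.findall(text)


sample = "Write to abcd [at] gmail [dot] com or efgh(at)example(dot)org."
print(find_emails(sample))  # ['abcd@gmail.com', 'efgh@example.org']
```

A side benefit of normalizing first is that every result comes back as a plain address, whatever form the page used.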


Also, for reference, here is a much more complete (and much longer) REGEX for email addresses:

 (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
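For trying that long pattern from Python, a minimal sketch: the pattern only lists lowercase letters, so it is compiled with re.IGNORECASE here (an assumption about what you want), and fullmatch() checks that a whole string is an address. It only recognises a real @, so obfuscated [at]/(at) forms still need separate handling.

```python
import re

# The long pattern, copied verbatim into a raw triple-quoted string
# (it contains both ' and " characters). IGNORECASE is an assumption.
RFC_EMAIL = re.compile(
    r'''(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])''',
    re.IGNORECASE,
)

for candidate in ["abcd@gmail.com", "ABCD@Gmail.com", "abcd [at] gmail.com"]:
    # fullmatch() succeeds only if the entire string is an address.
    print(candidate, bool(RFC_EMAIL.fullmatch(candidate)))
```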

(EDIT: see also http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html, a regex for validating RFC 822 addresses; now that is a REGEX!!)

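[1] Note that (…) creates a capturing group. When a pattern contains a capturing group, re.findall returns only the text captured by that group (here "@", "(at)" or "[at]") instead of the whole address. If you don't need the captured part, use a non-capturing group (?:…) instead.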
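The capturing-group point matters in practice with re.findall. A small sketch with a made-up sample string, comparing the first answer's capturing pattern with the same pattern using (?:…):

```python
import re

html = "efgh(at)gmail.com, ijkl@gmail.com, mnop[at]gmail.com"

# Capturing group: findall returns only what the group captured.
capturing = re.compile(
    r"\b[A-Z0-9._%+-]+(@|\(at\)|\[at\])[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE
)

# Non-capturing group: findall returns the whole match.
non_capturing = re.compile(
    r"\b[A-Z0-9._%+-]+(?:@|\(at\)|\[at\])[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE
)

print(capturing.findall(html))      # ['(at)', '@', '[at]']
print(non_capturing.findall(html))  # ['efgh(at)gmail.com', 'ijkl@gmail.com', 'mnop[at]gmail.com']
```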

 r = re.compile(r"\b[A-Z0-9._%+-]+(?:@|[(\[]at[\])])[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)
                                  ^^^^^^^^^^^^^^^^^^
 emailAddresses = r.findall(html)

See the regex demo:

https://regex101.com/r/nD5jY4/5#python
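Neither pattern in the answers allows spaces around [at]/(at), yet the question's examples (abcd [at] gmail.com) contain them. A possible variant, assuming spaces are the only extra padding sites add, wraps the obfuscated forms in \s*. It also spells out [at] and (at) separately, since the class-based [(\[]at[\])] would also accept mismatched forms such as (at]. The sample string and the normalization step are illustrative additions, not part of either answer.

```python
import re

# Assumption: obfuscated addresses may have spaces around [at]/(at).
pattern = re.compile(
    r"\b[A-Z0-9._%+-]+"                # local part
    r"(?:\s*(?:\[at\]|\(at\))\s*|@)"   # "[at]"/"(at)" with optional spaces, or "@"
    r"[A-Z0-9.-]+\.[A-Z]{2,6}\b",      # domain and top-level domain
    re.IGNORECASE,
)

html = "abcd [at] gmail.com / abcd (at) gmail.com / abcd@gmail.com"
matches = pattern.findall(html)
print(matches)  # ['abcd [at] gmail.com', 'abcd (at) gmail.com', 'abcd@gmail.com']

# Optionally normalize every match to a plain address.
clean = [re.sub(r"\s*(?:\[at\]|\(at\))\s*", "@", m, flags=re.IGNORECASE) for m in matches]
print(clean)  # ['abcd@gmail.com', 'abcd@gmail.com', 'abcd@gmail.com']
```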


Source: https://habr.com/ru/post/1589377/

