What is an efficient way to find a pattern in a large text?

I want to extract email addresses from a large text file. What is the best way to do this?

My idea is to find each '@' in the text and then run a regex over a substring that starts, for example, 256 characters before that position and is 512 characters long.

PS: To be clear, I want to know the best and most efficient way to find some kind of pattern (for example, email addresses) in a huge text.

+3
4 answers

256 and 512 sound like arbitrary values.

  • @, , , (, @ , , ...)
  • :

An address has at most 64 characters before the '@' and at most 255 after it.
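The asker's idea of searching only near each '@' can be sketched with a window sized by the usual address length limits (64 characters before the '@', 255 after) instead of 256 and 512. This is a minimal Python sketch under those assumptions; `EMAIL_RE` is a deliberately simple illustrative pattern and `emails_near_at_signs` is a hypothetical helper name, neither taken from the answers here.

```python
import re

# Assumed limits: 64 characters before '@' and 255 after it.
MAX_LOCAL = 64
MAX_DOMAIN = 255

# Illustrative pattern only; it is not a full RFC-compliant validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def emails_near_at_signs(text):
    """Yield addresses found in a bounded window around each '@'."""
    pos = text.find("@")
    while pos != -1:
        # Cut a window that can hold the longest allowed address.
        start = max(0, pos - MAX_LOCAL)
        end = min(len(text), pos + 1 + MAX_DOMAIN)
        window = text[start:end]
        at_index = pos - start
        for match in EMAIL_RE.finditer(window):
            # Keep only the match that contains the '@' we started from.
            if match.start() <= at_index < match.end():
                yield match.group(0)
                break
        pos = text.find("@", pos + 1)


if __name__ == "__main__":
    sample = "Contact alice@example.com or bob.smith@mail.example.org. A lone @ is ignored."
    print(list(emails_near_at_signs(sample)))
```

Each window is at most 320 characters, so the regex work per '@' stays bounded no matter how large the text is; whether this beats a single regex pass over the whole text depends on the engine and the data, so it is worth measuring.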

, .

, .

+1

, . , . , , , , . , , , , .

, , , :

[A-Za-z0-9.!#$%&*+=?^_~-]{1,64}@[A-Za-z0-9.-]{1,255}\.[A-Za-z]{2,6}
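For a large file, a pattern of this shape can be applied line by line so the whole text never has to sit in memory. The sketch below uses a variant of the pattern above, written so that '-' is the last character of each class (and therefore literal), '.' is listed explicitly in the local part, and the final label accepts both cases. `extract_emails` is a hypothetical helper name, and the pattern remains an approximation, not a validator.

```python
import io
import re

# Approximate pattern: 1-64 local characters, '@', a domain, and a 2-6 letter final label.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9.!#$%&*+=?^_~-]{1,64}@[A-Za-z0-9.-]{1,255}\.[A-Za-z]{2,6}"
)


def extract_emails(lines):
    """Return unique addresses, in first-seen order, from an iterable of lines."""
    seen = {}
    for line in lines:
        for match in EMAIL_RE.finditer(line):
            seen.setdefault(match.group(0), None)
    return list(seen)


if __name__ == "__main__":
    # A file object works the same way, since iterating it yields lines:
    #     with open(path, encoding="utf-8", errors="replace") as f:
    #         emails = extract_emails(f)
    sample = io.StringIO("a: john.doe@example.com\nb: JANE@EXAMPLE.ORG, john.doe@example.com\n")
    print(extract_emails(sample))
```

Reading by line assumes an address never spans a line break. The `{2,6}` bound on the final label also means longer top-level domains are cut short, so widen it if such addresses matter.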
+1

, , .

, @ , , , , :

  • @
  • @, , ASCII.
  • , .
  • @ , , ,
+1

, RFC . , - :

/(?<=^|[\s<(\["'])[a-z][\w.+-]+@[\w-]+(?:\.[\w-]+)+(?=[\s>)\]"']|$)/gi
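The same boundary-based idea can be tried in Python, with one adjustment: Python's `re` module only accepts fixed-width lookbehind, so "start of text or an opening delimiter" is written here as a negative lookbehind on any other character. The trailing boundary in this sketch accepts whitespace as well as closing delimiters. This is an illustrative port, not an RFC-compliant matcher.

```python
import re

EMAIL_RE = re.compile(
    r"""(?<![^\s<(\["'])      # start of text, or whitespace / opening delimiter before
        [a-z][\w.+-]+         # local part: a letter, then word chars, '.', '+', '-'
        @[\w-]+(?:\.[\w-]+)+  # domain with at least one dot
        (?![^\s>)\]"'])       # end of text, or whitespace / closing delimiter after
    """,
    re.IGNORECASE | re.VERBOSE,
)

if __name__ == "__main__":
    sample = 'Write to <Alice.Smith@example.com> or "bob+news@mail.example.org" today; x@y is skipped.'
    print(EMAIL_RE.findall(sample))
```

Because the boundaries are strict, an address followed directly by a comma or a sentence-ending period is not matched; add those characters to the trailing class if the input is ordinary prose.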

, :

  • - , , ( @, ). + .
  • , , , /

, ( , RFC). , .

0

Source: https://habr.com/ru/post/1745053/

