Improve regex for searching @username tags

I'm using a regex to pick out Twitter-style @username tags, and the following pattern is almost perfect:

(?<![^\s<>])@([^\s<>]+) 

The problem I've found is with punctuation marks that follow the name.

So for example:

  • Hi @mark ===> matches @mark (this is what we want)
  • Hi @mark. ===> matches @mark.
  • Hi @mark, you're good ===> matches @mark,
  • Hi @mark!!!! I did not think about it ===> matches @mark!!!!
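
The behaviour can be reproduced with a short sketch. Python's re module is an assumption here, since the question does not name a language, and ORIGINAL is just an illustrative name for the pattern above:

```python
import re

# The pattern from the question: an '@' that is not preceded by a
# non-space, non-angle-bracket character, then a run of such characters.
ORIGINAL = re.compile(r"(?<![^\s<>])@([^\s<>]+)")

samples = [
    "Hi @mark",
    "Hi @mark.",
    "Hi @mark, you're good",
    "Hi @mark!!!! I did not think about it",
]

for text in samples:
    # findall returns the contents of the single capturing group
    print(repr(text), "->", ORIGINAL.findall(text))
```

Every sample except the first captures the trailing punctuation along with the name, because [^\s<>] accepts any character that is not whitespace or an angle bracket.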

Obviously, we only want to match the username, not the punctuation marks. The catch is that some usernames contain a period.

For example, these are all legal usernames:

mark.markus

mark@gmail.com

mark_markus@gmail.com

EDIT: Because we use a lookbehind, the usernames above must match when they have an @ in front of them, but an email address without a leading @ should not match. So @mark_markus@gmail.com must match mark_markus@gmail.com, but if someone types plain old mark_markus@gmail.com, we don't want gmail.com to match.

Any ideas on how to change the regular expression to account for the various punctuation marks that can be used?

1 answer

How about this:

(?<![\w@])@([\w@]+(?:[.!][\w@]+)*)

I replaced [^\s<>] with [\w@], which is a bit more restrictive. \w matches letters, digits, and underscores. If there are other characters you need to allow, add them to each character class.

This group: (?:[.!][\w@]+)* allows periods (or exclamation marks) inside a username, but only when they are immediately followed by more username characters, so trailing punctuation is left out of the match. Note that (?:...) is a non-capturing group. It is useful when you want to group things for logical purposes but do not need to capture the result.
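
As a rough check, here is a minimal sketch of the pattern (?<![\w@])@([\w@]+(?:[.!][\w@]+)*) run against the cases from the question. Python's re module is my assumption, since the question does not name a language, and find_mentions is a hypothetical helper name:

```python
import re

# '@' not preceded by a word character or another '@', then word/'@'
# characters, optionally continued by '.' or '!' only when more
# word/'@' characters follow immediately.
MENTION = re.compile(r"(?<![\w@])@([\w@]+(?:[.!][\w@]+)*)")


def find_mentions(text):
    """Return the captured usernames (without the leading '@')."""
    return MENTION.findall(text)


for text in [
    "Hi @mark.",
    "Hi @mark, you're good",
    "Hi @mark!!!! I did not think about it",
    "Hi @mark.markus",
    "Hi @mark_markus@gmail.com",
    "Mail me at mark_markus@gmail.com",
]:
    print(repr(text), "->", find_mentions(text))
```

Under these assumptions the first three inputs yield mark, the next two yield mark.markus and mark_markus@gmail.com, and the plain email address yields no match, because its @ is preceded by a word character and the lookbehind rejects it.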

Update: see working example.


Source: https://habr.com/ru/post/1469549/

