How to write a word boundary inside a character class in Python without losing its meaning? I want to add an underscore (_) to the word boundary definition (\b)

I know that a word boundary is defined as (?<!\w)(?=\w)|(?<=\w)(?!\w), and I want to add an optional underscore to that definition.

One way to do this is to write a new definition, (_)?((?<!\w)(?=\w)|(?<=\w)(?!\w)), but we don't want to use an expression that long.

An easier approach would be this: if I could write a word boundary inside a character class, adding an underscore would be very simple, e.g. [\b_]. The problem is that \b inside a character class, i.e. [\b], means backspace, not a word boundary.
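A quick way to confirm this behaviour, as a minimal sketch (the sample strings are made up for illustration): inside a character class, \b matches the backspace character \x08 and never a position between characters.

```python
import re

# Inside a character class, \b is the backspace character (\x08),
# so [\b_] means "a backspace or an underscore", not "a boundary or an underscore".
backspace_class = re.compile(r"[\b_]")

matches_backspace = backspace_class.search("a\x08b") is not None
matches_underscore = backspace_class.search("a_b") is not None
matches_boundary = backspace_class.search("a b") is not None  # only boundaries here

print(matches_backspace, matches_underscore, matches_boundary)  # True True False
```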

Please suggest a solution, that is, how to put \b inside a character class without losing its original meaning.

1 answer

You can use lookarounds:

(?:\b|(?<=_))word(?=\b|_)
^^^^^^^^^^^^^     ^^^^^^^

See the regex demo, where (?:\b|(?<=_)) is a non-capturing group matching either a word boundary or a location preceded by _, and (?=\b|_) is a positive lookahead matching either a word boundary or a _ character.

Unfortunately, Python's re module does not allow (?<=\b|_), since a lookbehind pattern must have a fixed width (otherwise you get the error look-behind requires fixed-width pattern).
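The restriction is easy to check directly. The sketch below uses a hypothetical helper, compiles, which is not part of the original answer; it only reports whether re accepts a pattern.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The lookbehind has branches of width 0 (\b) and width 1 (_): rejected.
variable_width = compiles(r"(?<=\b|_)word")
# Here the lookbehind contains only "_", so its width is fixed: accepted.
fixed_width = compiles(r"(?:\b|(?<=_))word")

print(variable_width, fixed_width)  # False True
```

Moving \b out of the lookbehind works because \b is already a zero-width assertion and does not need to be wrapped in one.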

A Python demo:

import re

# Match "word" when each side is either a word boundary or an underscore.
rx = r"(?:\b|(?<=_))word(?=\b|_)"
s = "some_word_here and a word there"
print(re.findall(rx, s))  # ['word', 'word']

Alternatively, you can use the (?<![^\W_]) / (?![^\W_]) lookarounds (see the demo):

rx = r"(?<![^\W_])word(?![^\W_])"

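The (?<![^\W_]) negative lookbehind fails the match if the character immediately to the left is anything other than a non-word character or _ (that is, if it is a letter or digit), and the (?![^\W_]) negative lookahead fails the match if the character immediately to the right is anything other than a non-word character or _ (again, a letter or digit).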
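As a minimal sketch of how this second pattern behaves, the helper below wraps an arbitrary literal in the two lookarounds. The name find_bounded and the sample strings are assumptions made for illustration, not part of the original answer.

```python
import re

def find_bounded(word, text):
    """Return start offsets of `word` where neither neighbour is a letter or digit."""
    # [^\W_] means "a word character other than underscore", i.e. a letter or digit,
    # so the underscore is treated like any other boundary character.
    pattern = r"(?<![^\W_])" + re.escape(word) + r"(?![^\W_])"
    return [m.start() for m in re.finditer(pattern, text)]

print(find_bounded("word", "some_word_here and a word there"))  # [5, 21]
print(find_bounded("word", "swords, word2 and _word_"))         # [19]
```

In the second call, "swords" and "word2" are rejected because a letter or digit touches the match, while "_word_" is accepted.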

Source: https://habr.com/ru/post/1665701/

