Why was \b introduced for string matching in regular expressions?

I see that there is a \b that I have never used, and I was wondering if anyone could give me use cases where it is impossible to do without \b.

3 answers

They match different things: \s matches whitespace, \b matches word boundaries.

One good example is the . character.

In the string hello.hi:

\s will not match the ., but \b will match right before and right after it.
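A minimal sketch of this example in Python. The language is our choice, since the answer does not name one, and only the standard re module is used. Because \b is zero-width, each of its matches is an empty string, so the sketch records match positions instead of matched text.

```python
import re

text = "hello.hi"

# \s looks for whitespace characters; "hello.hi" contains none.
whitespace_hits = re.findall(r"\s", text)

# \b matches positions, not characters, so record where each empty match starts.
boundary_positions = [m.start() for m in re.finditer(r"\b", text)]

print(whitespace_hits)     # []
print(boundary_positions)  # [0, 5, 6, 8]
```

Positions 5 and 6 are the boundaries on either side of the ".", while 0 and 8 are the start and end of the string, next to the word characters "h" and "i".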


I was wondering if anyone could give me use cases where it is impossible to do without \b.

The expression \b is just a convenient shorthand for what you can already do with other constructs.

For example, if your regex engine supports lookarounds, then \b is equivalent to the following longer expression:

 (?<=\w)(?!\w)|(?<!\w)(?=\w) 
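The equivalence can be spot-checked with a small Python sketch, assuming Python's re module (which supports the fixed-width lookbehind needed here). The sample strings are arbitrary choices for illustration; agreement on them is evidence for the equivalence, not a proof.

```python
import re

# The \b shorthand and the lookaround expression above, compiled side by side.
SHORT = re.compile(r"\b")
LONG = re.compile(r"(?<=\w)(?!\w)|(?<!\w)(?=\w)")


def positions(pattern, text):
    """Return every index at which the zero-width pattern matches."""
    return [m.start() for m in pattern.finditer(text)]


samples = ["hello.hi", "cat concat cat.", "  spaced  out ", "", "_x1 y-z"]

for s in samples:
    # Both patterns should report exactly the same boundary positions.
    assert positions(SHORT, s) == positions(LONG, s), s

print("identical on all samples")
```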

Similarly, \w, \d, etc. are just shorthands for what can already be written with character classes such as [A-Za-z0-9_] or [0-9]. You usually want the short version, because writing out the full definition every time is cumbersome, hard to read, and increases the risk of errors.
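A small Python sketch of that correspondence, comparing each shorthand with its spelled-out class on all 128 ASCII characters. It assumes Python's re module, where the re.ASCII flag is needed because Python 3 makes \w and \d Unicode-aware by default for str patterns.

```python
import re

# Each shorthand paired with the character class it abbreviates (ASCII case).
PAIRS = [
    (r"\w", r"[A-Za-z0-9_]"),
    (r"\d", r"[0-9]"),
]


def same_on_ascii(short_pat, long_pat):
    """True if both patterns accept exactly the same single ASCII characters."""
    for code in range(128):
        ch = chr(code)
        a = re.fullmatch(short_pat, ch, re.ASCII) is not None
        b = re.fullmatch(long_pat, ch) is not None
        if a != b:
            return False
    return True


for short_pat, long_pat in PAIRS:
    print(short_pat, "==", long_pat, same_on_ascii(short_pat, long_pat))
```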


These are completely different things.

\s is a whitespace character. That is, it is shorthand for a predefined character class containing whitespace characters such as \t, \r, \n, and the space. \s matches exactly one of these characters.
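A minimal Python sketch, assuming the standard re module, showing that every \s match is one real character and is therefore consumed (and replaced) like any other character:

```python
import re

text = "a\tb\nc d\r\ne"

# Each \s match is exactly one whitespace character.
hits = re.findall(r"\s", text)
print(hits)  # ['\t', '\n', ' ', '\r', '\n']

# Because \s consumes a character, substituting it removes the whitespace itself.
replaced = re.sub(r"\s", "_", "c d")
print(replaced)  # c_d
```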

\b is a word boundary. It is a zero-width assertion tied to the predefined character class \w. Zero-width means it consumes no characters: instead of matching a character, it matches a position at which the assertion holds. Here the assertion is that there is a word character on one side and a non-word character (or the start or end of the string) on the other. Mark already gave the long form of \b, and Oded gave an example of where \b matches.
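Zero width is what makes \b useful for whole-word matching. The Python sketch below assumes the standard re module, and the word "cat" is a hypothetical target chosen for illustration. \s must consume a real character, so it cannot match at the start or end of the string or next to punctuation, and it pulls the surrounding spaces into the match.

```python
import re

# Zero width: substituting at \b inserts text without removing anything.
marked = re.sub(r"\b", "|", "hello.hi")
print(marked)  # |hello|.|hi|

# Whole-word search for the hypothetical target "cat".
text = "cat concat cat."
with_boundaries = re.findall(r"\bcat\b", text)
with_whitespace = re.findall(r"\scat\s", text)
print(with_boundaries)  # ['cat', 'cat']
print(with_whitespace)  # []
```

The first "cat" sits at the start of the string and the last one is followed by ".", so \s has nothing to consume in either place, while \b still matches.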

\w is the "word character" class, meaning it contains something like [a-zA-Z0-9_]. In some languages it is Unicode-based and includes all letters.
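Python 3 is one such language. A short sketch, assuming the standard re module, where re.ASCII switches \w back to [a-zA-Z0-9_]. Because \b is defined in terms of \w, the flag also changes where word boundaries fall.

```python
import re

word = "caf\u00e9"  # "café" with a precomposed é

# Python 3 str patterns are Unicode-aware by default, so \w matches "é".
unicode_ok = re.fullmatch(r"\w+", word) is not None
# With re.ASCII, \w shrinks to [a-zA-Z0-9_] and "é" no longer qualifies.
ascii_ok = re.fullmatch(r"\w+", word, re.ASCII) is not None
# The spelled-out ASCII class behaves like the re.ASCII version.
class_ok = re.fullmatch(r"[a-zA-Z0-9_]+", word) is not None

# \b follows \w: in ASCII mode a boundary appears between "f" and "é".
unicode_words = re.findall(r"\b\w+\b", word)
ascii_words = re.findall(r"\b\w+\b", word, re.ASCII)

print(unicode_ok, ascii_ok, class_ok)  # True False False
print(unicode_words, ascii_words)      # ['café'] ['caf']
```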


Source: https://habr.com/ru/post/1445994/

