Beginning and end of words in sed and grep

I do not understand the difference between \b and \< (or \>) in GNU sed and GNU grep. It seems to me that \b can always replace \< and \> without changing the set of matching lines.

In particular, I am trying to find examples in which \bsomething and \<something do not match exactly the same lines.

Same question for something\b and something\>.

thanks

+6
4 answers

I suspect it rarely matters whether you use the more general \b or the more specific \< and \>, but I can come up with an example where it does. It is fairly contrived, and most real-world regular expressions will not be affected, but it shows that the choice can make a difference in some cases.

If I have the following text:

 this is his pig 

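and I want to know whether /\bis\b/ matches, it does not matter if I use /\<is\>/ instead: both match the standalone word "is". The reversed form /\>is\</ is a different matter. With GNU tools it can never match, because \> must be followed by a non-word character (or the end of the line), and "i" is a word character.

But what if my text was instead

 is this his pig 

Now "is" is the first word, so the position before it is only a word-initial boundary; no word ends there. /\bis\b/ still matches, and of course /\<is\>/ does too, but /\>is\</ does not.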
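The comparison above can be checked with a short sketch. It assumes GNU grep (other greps may not support \b, \< or \>), and matches is a hypothetical helper name introduced only for this illustration.

```shell
# Assumes GNU grep. matches TEXT PATTERN succeeds if PATTERN matches TEXT.
matches() { printf '%s\n' "$1" | grep -q -- "$2"; }

# Try each boundary style against both sample lines from the answer.
for text in 'this is his pig' 'is this his pig'; do
  for pattern in '\bis\b' '\<is\>' '\>is\<'; do
    if matches "$text" "$pattern"; then
      printf '%-8s %-8s %s\n' 'match' "$pattern" "$text"
    else
      printf '%-8s %-8s %s\n' 'no match' "$pattern" "$text"
    fi
  done
done
```

With GNU grep, the first two patterns should be reported as matching both lines and the reversed pattern as matching neither.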

In real life, however, I think you rarely need to make this distinction, which is why (at least outside sed) \b is the usual word-boundary marker in regular expressions.

+9

\< corresponds to the transition from non-word to word.

\> corresponds to the transition from word to non-word.

\b is equivalent to (\<|\>) in an extended regular expression.

Therefore, I would not say that \b and \< are the same. I would say that \b is a superset of \<, and likewise a superset of \>.
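One way to see the superset relation is to start or end the pattern with a non-word character, so that the boundary next to it can only be the "wrong" kind for \< or \>. The following is a minimal sketch assuming GNU grep; count is a hypothetical helper and the sample strings a-foo and foo-a are made up for the illustration.

```shell
# Assumes GNU grep. count TEXT PATTERN prints the number of matching lines.
# "|| true" keeps the exit status zero when grep finds nothing.
count() { printf '%s\n' "$1" | grep -c -- "$2" || true; }

count 'a-foo' '\b-foo'   # 1: \b matches where the word "a" ends
count 'a-foo' '\<-foo'   # 0: \< must be followed by a word character
count 'foo-a' 'foo-\b'   # 1: \b matches where the word "a" starts
count 'foo-a' 'foo-\>'   # 0: \> must be preceded by a word character
```

When the text next to the boundary is an ordinary word character, as in \bsomething, the two forms select the same lines; the difference only shows up in cases like these.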

+6

According to LinuxTopia, the only difference between the two kinds of word boundary is that \< and \> work in most versions of sed, whereas \b works only if your system uses gsed (GNU sed).

And a quote from the wiki:

These characters include '\<' and '\>' (gsed, ssed, sed15, sed16, sedmod) and '\b' and '\B' (gsed only).

Other than that, they are identical. Here is a table that lists all the possible positions and the word-boundary operators that match them:

  Match position    Possible word boundaries        HHsed   GNU sed
  ------------------------------------------------------------------
  start of word     [nonword char]^[word char]      \<      \< or \b
  end of word       [word char]^[nonword char]      \>      \> or \b
  middle of word    [word char]^[word char]         none    \B
  outside of word   [nonword char]^[nonword char]   none    \B
  ------------------------------------------------------------------

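The "middle of word" row can be contrasted with the "start of word" row in a small sketch. It assumes GNU grep, since \b and \B are listed above as GNU-only; count is a hypothetical helper name.

```shell
# Assumes GNU grep. count TEXT PATTERN prints the number of matching lines.
count() { printf '%s\n' "$1" | grep -c -- "$2" || true; }

count 'this' '\Bis'   # 1: "is" sits in the middle of the word "this"
count 'this' '\bis'   # 0: no word starts right before that "is"
count 'is'   '\bis'   # 1: "is" starts a word
count 'is'   '\Bis'   # 0: the position before "is" is a word boundary
```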
+1

I ran into an example like this before.
\<.\> matches a single-letter word.
With \b, you need to write something like \b[^ ]\b, because \b.\b also matches the space between two words.
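This can be checked on a line that contains no single-letter word. The sketch below assumes GNU grep; count is a hypothetical helper and the sample lines are made up for the illustration.

```shell
# Assumes GNU grep. count TEXT PATTERN prints the number of matching lines.
count() { printf '%s\n' "$1" | grep -c -- "$2" || true; }

# 'ab cd' has no single-letter word, only a space between two words.
count 'ab cd' '\<.\>'     # 0: no one-character word to match
count 'ab cd' '\b.\b'     # 1: "." matches the space between the words
count 'ab cd' '\b[^ ]\b'  # 0: excluding the space removes the false hit

# 'a b' does contain single-letter words, so all three patterns match.
count 'a b' '\<.\>'       # 1
count 'a b' '\b.\b'       # 1
count 'a b' '\b[^ ]\b'    # 1
```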

+1

Source: https://habr.com/ru/post/948383/

