Beginning and end of words in sed and grep

Question


I do not understand the difference between \b and \< in GNU sed and GNU grep. It seems to me that \b can always replace \< and \> without changing the set of matching lines.

In particular, I am trying to find examples in which \bsomething and \<something do not match exactly the same lines.

Same question for something\b and something\> .

thanks

+6

regex sed

anilomjf Jun 29 '13 at 16:22


4 answers

\< corresponds to the transition from non-word to word.

\> corresponds to the transition from word to non-word.

\b is equivalent to (\<|\>) in an extended regular expression.

So I would not say that \b and \< are the same thing. I would say that \b matches a superset of the positions that \< matches. The same holds for \b and \> .
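
To make the equivalence above concrete, here is a minimal Python sketch. Python's re module supports \b but not \< or \>, so the two GNU escapes are emulated with lookarounds. The names WORD_START, WORD_END and positions are made up for this illustration, and Python's idea of a word character may differ from GNU's outside ASCII.

```python
import re

# Assumption: emulate the GNU escapes, which Python's re does not provide.
WORD_START = r"\b(?=\w)"   # stands in for \<  (non-word to word)
WORD_END = r"\b(?<=\w)"    # stands in for \>  (word to non-word)


def positions(pattern, text):
    """Return the offsets at which a zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, text)]


text = "is this"
starts = positions(WORD_START, text)
ends = positions(WORD_END, text)
either = positions("(?:" + WORD_START + "|" + WORD_END + ")", text)
generic = positions(r"\b", text)

print("starts:", starts)
print("ends:", ends)
print("\\b:", generic)
```

In this sketch every \b position is either a word start or a word end, which is the superset relation described above.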

+6

doubledown Jun 29 '13 at 17:49


According to LinuxTopia, the only difference between the two kinds of word boundary is portability: \< and \> work in most versions of sed, while \b works only if your system uses gsed (GNU sed).

And a quote from the wiki:

These characters include '\<' and '\>' (gsed, ssed, sed15, sed16, sedmod) and '\b' and '\B' (gsed only).

Apart from that, they are identical. Here is a table that lists the possible positions and the word-boundary escapes that match each of them:

  Match position    Possible word boundaries         HHsed   GNU sed
  ---------------------------------------------------------------
  start of word     [nonword char]^[word char]       \<      \< or \b
  end of word       [word char]^[nonword char]       \>      \> or \b
  middle of word    [word char]^[word char]          none    \B
  outside of word   [nonword char]^[nonword char]    none    \B
  ---------------------------------------------------------------
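
The four rows of the table can be reproduced with a short Python sketch. This is only an illustration under assumptions: Python has \b and \B but not \< or \>, so each row is expressed with \b or \B plus a lookaround, and ROWS and classify are hypothetical names.

```python
import re

# Assumption: one zero-width Python pattern per table row.
ROWS = {
    "start of word": r"\b(?=\w)",    # [nonword char]^[word char]
    "end of word": r"\b(?<=\w)",     # [word char]^[nonword char]
    "middle of word": r"\B(?=\w)",   # [word char]^[word char]
    "outside of word": r"\B(?!\w)",  # [nonword char]^[nonword char]
}


def classify(text):
    """Map each row label to the offsets in text where that row applies."""
    return {
        label: [m.start() for m in re.finditer(pattern, text)]
        for label, pattern in ROWS.items()
    }


table = classify("a, bc")
for label, offsets in table.items():
    print(label, offsets)
```

Each offset from 0 to the length of the string falls into exactly one row. The start and end of the string are treated like non-word characters here.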

+1

Bogdan emil mariesan Jun 29 '13 at 16:33


I ran into an example of this before.
\<.\> matches a single-letter word.
With \b you need something like \b[^ ]\b instead, because \b.\b also matches the space between two words.
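
A small Python sketch of this example follows. Since Python's re has no \< or \>, they are emulated with lookarounds (an assumption for illustration), and the sample sentence is made up.

```python
import re

# Assumption: lookaround emulation of the GNU escapes.
WORD_START = r"\b(?=\w)"   # stands in for \<
WORD_END = r"\b(?<=\w)"    # stands in for \>

text = "I saw a cat"

strict = re.findall(WORD_START + "." + WORD_END, text)  # like \<.\>
loose = re.findall(r"\b.\b", text)                      # like \b.\b
no_space = re.findall(r"\b[^ ]\b", text)                # like \b[^ ]\b

print(strict)
print(loose)
print(no_space)
```

Here the \b.\b form also picks up the single spaces that sit between two words. Note that \b[^ ]\b only excludes spaces, so other non-word characters between two words, such as the hyphen in "a-b", would still match.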

+1

Florian bourse Jan 20 '16 at 8:20


iconoclast · Accepted Answer · 2014-08-25T17:26:48+0000

I suspect it rarely matters whether you use the (more general) \b or the (more specific) \< and \> , but I can come up with an example where it does. It is fairly contrived, and I suspect that most real-world uses of regular expressions are unaffected, but it should demonstrate that it can make a difference in at least some cases.

If I have the following text:

 this is his pig

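and I want to know whether /\bis\b/ matches, it makes no difference if I use /\<is\>/ instead: both match the standalone word "is". The reversed form /\>is\</ is another matter. In GNU sed and grep, \> only matches where the next character is not a word character, so it can never sit directly in front of "is".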

But what if my text was instead

 is this his pig

Now there is no longer a word-final boundary before the word "is", only a word-initial one. /\bis\b/ matches, and of course /\<is\>/ does too, but /\>is\</ does not.

In real life, though, I don't think you often need to make this distinction, which is why (at least outside sed) \b is the usual word-boundary marker in regular expressions.
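
The two sample lines can be checked with a rough Python sketch. Python's re has no \< or \>, so they are emulated with lookarounds, which is an assumption and not GNU syntax; found and the variable names are hypothetical. Under this emulation the reversed pattern matches neither line, because an end-of-word position cannot be followed directly by a word character. A pattern that begins with a non-word character, such as a space, looks like a clearer case where \b and \< disagree.

```python
import re

# Assumption: lookaround emulation of the GNU escapes.
WORD_START = r"\b(?=\w)"   # stands in for \<
WORD_END = r"\b(?<=\w)"    # stands in for \>


def found(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


texts = ["this is his pig", "is this his pig"]
generic = r"\bis\b"                      # like /\bis\b/
specific = WORD_START + "is" + WORD_END  # like /\<is\>/
swapped = WORD_END + "is" + WORD_START   # like /\>is\</

results = {
    t: tuple(found(p, t) for p in (generic, specific, swapped))
    for t in texts
}

# A pattern that starts with a space: the boundary in front of it
# can only be a word end, never a word start.
line = "this is his pig"
with_b = found(r"\b is", line)               # like /\b is/
with_start = found(WORD_START + " is", line)  # like /\< is/
with_end = found(WORD_END + " is", line)      # like /\> is/

print(results)
print(with_b, with_start, with_end)
```

So for a pattern such as " is", the \b form and the \> form find the line while the \< form does not, which is the kind of difference the question asks about.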
