Inventory of regular expression anchors

They say that:
  • ^ matches the beginning of a line, but it does not match immediately after "\n", "\r" or "\r\n". Yet those positions are exactly where a line begins. In what sense does ^ match the beginning of a line, and how does it differ from \A?

  • $ is said to match the end of a line, but it does not match right before "\n", "\r" or "\r\n". Yet those positions are exactly where a line ends. In what sense does $ match the end of a line, and how does it differ from \z?

  • \Z, unlike \z, matches right before "\n" if it is at the end of the string. It seems to me that \A and \z are the natural counterparts, while \Z is rather odd. Why are \Z and \z defined the way they are, and not the other way around? And when would you want to use \Z?

Can you illustrate this with examples? If the differences between languages and standards matter, it would be useful to list them.

1 answer

The difference is that the ^ and $ anchors can have their behavior modified.

When multiline mode is on, the ^ and $ anchors match the beginning and end of a line.

When multiline mode is off, the ^ and $ anchors match the beginning and end of the string.
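To make the two modes concrete, here is a minimal sketch in Python. Python is my choice rather than something the answer names, and its re.MULTILINE flag plays the role of multiline mode. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, not from the original post.
TEXT = "alpha\nbeta\ngamma"

# Multiline off (Python's default): ^ matches only at the start of the string.
single_starts = [m.start() for m in re.finditer(r"^", TEXT)]

# Multiline on: ^ also matches right after each "\n".
multi_starts = [m.start() for m in re.finditer(r"^", TEXT, re.MULTILINE)]

# The same switch applies to $.
single_words = re.findall(r"\w+$", TEXT)
multi_words = re.findall(r"\w+$", TEXT, re.MULTILINE)

print(single_starts, multi_starts)
print(single_words, multi_words)
```

With the flag off, only offset 0 and the word "gamma" are found. With it on, every line start and every line-final word is found.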


Most regular expression implementations have a multiline mode.

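In Perl or JavaScript, it is enabled with the m modifier, e.g. /pattern/m. Ruby is an exception: there ^ and $ always match at line boundaries, and the m modifier instead makes . match newlines.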

In .NET, it is enabled with (?m) inside the pattern itself or with the RegexOptions.Multiline enumeration value.
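The "flag outside the pattern" versus "(?m) inside the pattern" choice is not specific to .NET. As a rough sketch, and assuming Python's re module as a stand-in (it is not one of the engines named above), both spellings give the same result. The sample text is hypothetical.

```python
import re

# Hypothetical sample text, not from the original post.
TEXT = "alpha\nbeta\ngamma"

# Option 1: pass the flag from outside the pattern.
by_flag = re.findall(r"^\w+", TEXT, re.MULTILINE)

# Option 2: embed (?m) at the very start of the pattern itself.
by_inline = re.findall(r"(?m)^\w+", TEXT)

# Without either, ^ anchors only at the start of the string.
no_flag = re.findall(r"^\w+", TEXT)

print(by_flag, by_inline, no_flag)
```

The inline form is handy when you can only supply a pattern string, for example in a configuration file, and cannot pass options separately.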


To answer your third question ...

\A - the match must occur at the beginning of the string.

\Z - the match must occur at the end of the string, or before \n at the end of the string.

\z - the match must occur at the end of the string.

These three are constants that are not affected by any modifiers. I agree that \A and \Z seem like an illogical pairing. It does not make much sense to me either. But if the string might have a trailing newline that you want to ignore, then \Z may be preferable.
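A short sketch of the three anchors, again assuming Python, with one important caveat about language differences: Python's re module spells the strict end-of-string anchor \Z (what is called \z above) and has no dedicated anchor for "end of string or before a final \n". The lookahead below is my own emulation of that behavior, and the sample string is hypothetical.

```python
import re

# Hypothetical sample text with a trailing newline.
TEXT = "line one\nline two\n"

# \A matches only at the start of the string, even with MULTILINE on,
# whereas ^ then matches at the start of every line.
a_hits = re.findall(r"\Aline \w+", TEXT, re.MULTILINE)
caret_hits = re.findall(r"^line \w+", TEXT, re.MULTILINE)

# Strict end of string (the \z described above; Python writes it \Z).
# The trailing "\n" sits between "two" and the end, so this fails.
strict_end = re.search(r"two\Z", TEXT)

# End of string, or just before a final "\n" (the \Z described above),
# emulated here with a lookahead because Python lacks that anchor.
lenient_end = re.search(r"two(?=\n?\Z)", TEXT)

# Python's $ with multiline off happens to behave the same lenient way.
dollar_end = re.search(r"two$", TEXT)

print(a_hits, caret_hits)
print(strict_end, lenient_end, dollar_end)
```

This is the trailing-newline case mentioned above: input read line by line usually still carries its "\n", and the lenient anchor lets the pattern ignore it while the strict one does not.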


Source: https://habr.com/ru/post/1345451/

