What flavor of regex does Git use?

I am trying to use the git diff --word-diff-regex= option, and it seems to reject every kind of lookahead and lookbehind. I'm having trouble figuring out which regex flavor Git uses. For instance:

git diff --word-diff-regex='([.\w]+)(?!>)' 

This is rejected as an invalid regular expression.

I am trying to match all words that are not HTML tags, so the resulting matches should be 'Hello', 'World', 'Foo' and 'Bar' for the line below:

 <p> Hello World </p><p> Foo Bar </p> 
1 answer

The Git source uses regcomp and regexec, which are defined by POSIX 1003.2. The code that compiles the diff regex is:

  if (regcomp(ecbdata->diff_words->word_regex, o->word_regex, REG_EXTENDED | REG_NEWLINE)) 

In POSIX terms, the REG_EXTENDED flag means that these are "extended" regular expressions (EREs), as defined in the POSIX specification.

(Not every C library implements POSIX REG_EXTENDED in exactly the same way. Git includes its own regex implementation, which can be built in place of the system one.)

Edit (for the updated question): POSIX EREs have neither lookahead nor lookbehind, and they do not have \w (but [_[:alnum:]] is probably close enough for most purposes).
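As a rough illustration of working within those limits, here is a minimal C sketch that compiles a pattern with the same flags as the snippet above and walks the sample line. The pattern `<[^>]*>|[._[:alnum:]]+` is my own suggestion, not something taken from Git: since there is no lookahead, it matches a whole tag as one token and a word as another, and the caller skips the tag tokens. The names ere_compiles, collect_words and WORD_MAX are hypothetical helpers made up for this example.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define WORD_MAX 32  /* hypothetical buffer size for one word */

/* Returns 1 if `pattern` compiles as a POSIX ERE with the flags
   quoted above, 0 otherwise. */
int ere_compiles(const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return 0;
    regfree(&re);
    return 1;
}

/* Scans `text` for successive matches of `pattern` and copies every
   match that does not start with '<' into `out`. Returns the number
   of words stored (0 if the pattern does not compile). */
size_t collect_words(const char *pattern, const char *text,
                     char out[][WORD_MAX], size_t max_words)
{
    regex_t re;
    regmatch_t m;
    size_t n = 0;
    const char *p = text;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return 0;

    while (n < max_words && regexec(&re, p, 1, &m, 0) == 0) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);
        if (len == 0)
            break;  /* guard against looping forever on an empty match */
        if (p[m.rm_so] != '<' && len < WORD_MAX) {
            memcpy(out[n], p + m.rm_so, len);
            out[n][len] = '\0';
            n++;
        }
        p += m.rm_eo;  /* continue right after the current match */
    }

    regfree(&re);
    return n;
}
```

Calling collect_words with that pattern on the line from the question should store Hello, World, Foo and Bar. The same pattern could presumably be passed as git diff --word-diff-regex='<[^>]*>|[._[:alnum:]]+', so that each tag is diffed as a single token; I have not verified how that behaves on every Git build. ere_compiles can be used to check a pattern up front, but whether a given C library rejects non-POSIX syntax such as (?!>) depends on the implementation.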


Source: https://habr.com/ru/post/1257514/

