Regex to match part of a string when the match does not contain a specific string - PCRE grep

I use TextWrangler's grep to run search / replace across multiple files, and I have hit a wall with the last find / replace I need to do. I need to match any text between "> and the first instance of <br /> in the string, but the match must not contain the character sequence [xcol]. The flavor is Perl-Compatible Regular Expressions (PCRE), so any lookbehind must be fixed-length.

Example text for search:

<p class="x03">FooBar<br />Bar</p>
<p class="x03">FooBar [xcol]<br />Bar</p>
<p class="x06">Hello World<br />[xcol]foo[xcol]bar<br /></p>
<p class="x07">Hello World[xcol]<br />[xcol]foo[xcol]bar<br /></p>  

Desired regex behavior:
1st line: matches ">FooBar<br />
2nd line: no match
3rd line: matches ">Hello World<br />
4th line: no match

The text between "> and <br /> will be captured in a group for use in the replacement. The closest I got was the following regex with a negative lookahead, but it does not match the 3rd line as desired:

">((?!.*?\[xcol]).*?)<br />

Any help or advice is appreciated. Thanks.

1 answer

Try this regex:

">((?!\[xcol]).)*<br\s*/>

A (short) explanation:

">               # match '">'
(                # start group 1
  (?!\[xcol]).   #   if '[xcol]' can't be seen ahead, match any character (except line breaks)
)                # end group 1
*                # repeat group 1 zero or more times
<br\s*/>         # match '<br />'
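
To check the pattern against the four sample lines from the question, here is a minimal sketch in Python. It assumes Python's re module behaves like PCRE for these constructs (it does for negative lookahead and \s, but it is not the engine TextWrangler uses). The names SAMPLES, ANSWER, CAPTURE_ALL, matched_text and captured_text are made up for this illustration. CAPTURE_ALL is a hypothetical variant, not part of the answer: because group 1 in the original pattern is itself repeated, it only retains the last character it matched, which may matter if the group is reused in the replacement.

```python
import re

# The four sample lines from the question.
SAMPLES = [
    '<p class="x03">FooBar<br />Bar</p>',
    '<p class="x03">FooBar [xcol]<br />Bar</p>',
    '<p class="x06">Hello World<br />[xcol]foo[xcol]bar<br /></p>',
    '<p class="x07">Hello World[xcol]<br />[xcol]foo[xcol]bar<br /></p>',
]

# Pattern from the answer. Group 1 is repeated, so after a match it holds
# a single character (the last one it consumed), not the whole text.
ANSWER = re.compile(r'">((?!\[xcol]).)*<br\s*/>')

# Hypothetical variant (an assumption, not from the answer): the repetition
# sits inside group 1, so the group holds the whole text. The lazy *? stops
# at the first <br /> even if a later one would also qualify.
CAPTURE_ALL = re.compile(r'">((?:(?!\[xcol]).)*?)<br\s*/>')


def matched_text(pattern, line):
    """Return the full matched text for line, or None if there is no match."""
    m = pattern.search(line)
    return m.group(0) if m else None


def captured_text(pattern, line):
    """Return the contents of group 1 for line, or None if there is no match."""
    m = pattern.search(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for line in SAMPLES:
        print(repr(matched_text(ANSWER, line)), repr(captured_text(CAPTURE_ALL, line)))
```

With the variant, the replacement string can refer to group 1 (\1 in TextWrangler) to get the text between "> and <br />.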

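Note that . does not match line breaks by default. To match those too, enable DOT-ALL mode (prefix the pattern with (?s)) or replace . with [\s\S].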
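
The effect of the line-break behavior can be seen in a short Python sketch. It assumes Python's re module, which supports the same (?s) inline flag and [\s\S] class as PCRE for this purpose. The input TEXT and the variable names are invented for the illustration.

```python
import re

# Hypothetical input: the text between "> and <br /> spans two lines.
TEXT = '<p class="x03">Foo\nBar<br />Baz</p>'

# The answer's pattern as given: . does not match the line break.
default_match = re.search(r'">((?!\[xcol]).)*<br\s*/>', TEXT)

# Same pattern with DOT-ALL enabled through the (?s) prefix.
dotall_match = re.search(r'(?s)">((?!\[xcol]).)*<br\s*/>', TEXT)

# Same pattern with . replaced by [\s\S], which matches any character.
class_match = re.search(r'">((?!\[xcol])[\s\S])*<br\s*/>', TEXT)

if __name__ == "__main__":
    print(default_match)
    print(repr(dotall_match.group(0)))
    print(repr(class_match.group(0)))
```

The [\s\S] form is useful when the tool offers no way to set flags, since it changes only the one place where line breaks should be allowed.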


Source: https://habr.com/ru/post/1783711/

