Regexp for searching tags without nested tags

I am trying to write a regexp that will help find untranslated text in HTML code.

Translated text is text that goes through a special tag, <fmt:message>, or through a ${...} construct.

Example without translation:

<h1>Hello</h1>

Translated texts:

<h1><fmt:message key="hello" /></h1>
<button>${expression}</button>

I wrote the following expression:

\<(\w+[^>])(?:.*)\>([^\s]+?)\</\1\>

It finds the correct lines, for example:

<p>text</p>

It correctly skips:

<a><fmt:message key="common.delete" /></a>

But also catches:

<li><p><fmt:message key="common.delete" /></p></li>

I also can't figure out how to add an exception for ${...} strings to this expression. Can anyone help me?

5 answers

If I understand you correctly, you want the data inside the "tag" to not contain fmt:message or ${...}.

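You can use a negative lookahead together with . to assert that the characters consumed by . are not the start of either of those strings: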

/<(\w+)[^>]*>(?:(?!<fmt:message|\$\{|<\/\1>).)*<\/\1>/i

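If you want to avoid matching any "tags" inside the tag, you can drop the <fmt:message part and use [^<] instead of . so that only characters other than < are matched: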

/<(\w+)[^>]*>(?:(?!\$\{)[^<])*<\/\1>/i

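If you also want to make sure the "data" is not just whitespace, add a negative lookahead at the start, (?!\s*<), which rejects content that is only whitespace up to the next <: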

/<(\w+)[^>]*>(?!\s*<)(?:(?!\$\{)[^<])*<\/\1>/i
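
As a quick check, the last pattern can be run against the samples from the question. This is a minimal sketch in Python (an assumption, since the /.../i literals above are not tied to a particular language), and the function name untranslated_elements is made up for illustration.

```python
import re

# The last pattern from the answer, written as a Python raw string.
# re.IGNORECASE plays the role of the trailing /i flag.
UNTRANSLATED = re.compile(
    r"<(\w+)[^>]*>"        # opening tag, tag name captured in group 1
    r"(?!\s*<)"            # content must not be empty or whitespace before the next <
    r"(?:(?!\$\{)[^<])*"   # any characters except < and not starting a ${
    r"</\1>",              # closing tag with the same name
    re.IGNORECASE,
)


def untranslated_elements(markup):
    """Return every element whose content looks like untranslated text."""
    return [m.group(0) for m in UNTRANSLATED.finditer(markup)]


samples = [
    '<h1>Hello</h1>',
    '<h1><fmt:message key="hello" /></h1>',
    '<button>${expression}</button>',
    '<li><p><fmt:message key="common.delete" /></p></li>',
]

for line in samples:
    print(line, "->", untranslated_elements(line))
```

With these samples only the first line is reported; the fmt:message and ${...} lines, including the nested li/p case, produce an empty list.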

A negative lookahead can be used here, for example:

<(\w+)>(?:(?!<fmt:message).)+</\1>

Try:

<([^>]+)[^>]*>([^<]*)</\1>

Note that this fails if the content, for example a CDATA section, contains '<'. For that, use an XML parser.
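
A parser-based approach could look roughly like this. The sketch uses Python's standard html.parser module, which is a lenient HTML parser rather than a strict XML parser; that is a substitution on my part, since JSP-style markup is not guaranteed to be well-formed XML. The class and function names are hypothetical. It reports any text node that is not blank and does not contain ${.

```python
from html.parser import HTMLParser


class UntranslatedTextFinder(HTMLParser):
    """Collects text nodes that are neither blank nor a ${...} expression."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the currently open tags
        self.found = []   # (enclosing tag, text) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unclosed children are discarded.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <fmt:message ... /> carry no text.
        pass

    def handle_data(self, data):
        text = data.strip()
        if text and "${" not in text:
            parent = self.stack[-1] if self.stack else None
            self.found.append((parent, text))


def untranslated_text_nodes(markup):
    """Return (enclosing tag, text) for every text node that looks untranslated."""
    finder = UntranslatedTextFinder()
    finder.feed(markup)
    finder.close()
    return finder.found
```

Because the parser tracks nesting itself, text inside deeper elements is reported with its immediate parent tag, and there is no backreference to keep in sync.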

:

aba

aca

abcba?

.

FSM:

Start->A->B->A->Terminate

Feed it abcba and run it:

Start is ready for input. 
a -> MATCH, transition to A
b -> MATCH, transition to B
c -> FAIL, return fail.
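
The trace above can be reproduced with a few lines of code. This is a minimal sketch, assuming the machine is a plain chain of states that expects a, b, a in order; run_fsm is a hypothetical helper, not something from the answer.

```python
def run_fsm(symbols, text):
    """Run a linear state machine that expects `symbols` in order.

    Returns (accepted, trace), where trace lists one entry per consumed character.
    """
    trace = []
    state = 0  # index of the next expected symbol; 0 is the Start state
    for ch in text:
        if state < len(symbols) and ch == symbols[state]:
            state += 1
            trace.append(f"{ch} -> MATCH")
        else:
            trace.append(f"{ch} -> FAIL")
            return False, trace
    # Accept only if every expected symbol was consumed.
    return state == len(symbols), trace


accepted, trace = run_fsm("aba", "abcba")
print(accepted)
for step in trace:
    print(step)
```

The machine stops at the first character that has no matching transition, so the trailing "ba" of abcba is never examined.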

Also see

http://www.codinghorror.com/blog/archives/001311.html

for a discussion of using regex to parse HTML.

Executive summary: don't.


Source: https://habr.com/ru/post/1727512/

