Regexp for finding tags without nested tags

Question

I am trying to write a regexp that will help find untranslated text in HTML code.

Translated text is text that is passed through a special tag, <fmt:message>, or through the construction ${...}

Example without translation:

<h1>Hello</h1>

Translated texts:

<h1><fmt:message key="hello" /></h1>
<button>${expression}</button>

I wrote the following expression:

\<(\w+[^>])(?:.*)\>([^\s]+?)\</\1\>

It finds the correct lines, for example:

<p>text</p>

It correctly skips:

<a><fmt:message key="common.delete" /></a>

But it also catches:

<li><p><fmt:message key="common.delete" /></p></li>

And I can't figure out how to add an exception for ${...} strings to this expression. Can anyone help me?

+3

glaz666 Jan 6 '10 at 16:50

5 answers

Try something like this:

<(\w+)>(?:(?!<fmt:message).)+</\1>
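To see how this pattern behaves on the examples from the question, here is a small Python sketch. The names MATTEO_PATTERN and is_flagged are made up for illustration. As written, the pattern has no case for ${...} and only matches opening tags without attributes, so treat it as a starting point rather than a full solution.

```python
import re

# Pattern copied from the answer above; the constant name is hypothetical.
MATTEO_PATTERN = re.compile(r"<(\w+)>(?:(?!<fmt:message).)+</\1>")


def is_flagged(html):
    """Return True if the pattern reports the snippet as untranslated."""
    return MATTEO_PATTERN.search(html) is not None


if __name__ == "__main__":
    samples = [
        "<h1>Hello</h1>",
        '<li><p><fmt:message key="common.delete" /></p></li>',
        "<button>${expression}</button>",  # still flagged: no ${...} rule here
        '<a href="#">text</a>',            # not flagged: attributes are not handled
    ]
    for sample in samples:
        print(is_flagged(sample), sample)
```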

+1

Matteo Riva Jan 6 '10 at 17:23

Something like this:

<([^>]+)[^>]*>([^<]*)</\1>

Note that this breaks if the content contains CDATA or a literal '<'. An XML parser would be more robust.

0

Mike Nelson Jan 6 '10 at 17:38

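Consider a pattern that matches strings such as:

aba

aca

What happens with abcba?

It is rejected. Walk through the FSM for aba: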

Start->A->B->A->Terminate

Feed in abcba and run it:

Start is ready for input. 
a -> MATCH, transition to A
b -> MATCH, transition to B
c -> FAIL, return fail.
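The trace above can be reproduced with a tiny simulation. This is only a sketch of a linear state machine that expects one fixed string; the function name run_linear_fsm and the trace format are assumptions chosen to mirror the walkthrough.

```python
def run_linear_fsm(pattern, text):
    """Walk a linear FSM that accepts exactly the characters of `pattern`.

    Returns (accepted, trace), where trace records each step taken.
    """
    trace = []
    state = 0  # index of the next character the machine expects
    for ch in text:
        if state < len(pattern) and ch == pattern[state]:
            state += 1
            trace.append(f"{ch} -> MATCH")
        else:
            # No transition exists for this input, so the machine stops.
            trace.append(f"{ch} -> FAIL")
            return False, trace
    return state == len(pattern), trace


if __name__ == "__main__":
    accepted, trace = run_linear_fsm("aba", "abcba")
    print(accepted)
    print("\n".join(trace))
```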

0

Paul nathan Jan 6 '10 at 18:05

Also see the existing discussions about using regex to parse HTML.

Executive summary: don't.
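For anyone who takes that advice, a parser-based route might look roughly like the sketch below, which uses Python's standard html.parser module. The class name, the helper function, and the rule that any text node containing ${ counts as translated are all assumptions for illustration, not something stated in the thread.

```python
from html.parser import HTMLParser


class UntranslatedTextFinder(HTMLParser):
    """Collects text nodes that look like hard-coded (untranslated) text."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_data(self, data):
        text = data.strip()
        # Assumption: text containing ${...} is already translated.
        # <fmt:message .../> is a tag, so it never reaches handle_data.
        if text and "${" not in text:
            self.found.append(text)


def find_untranslated_text(html):
    """Return the list of suspicious text nodes in an HTML fragment."""
    finder = UntranslatedTextFinder()
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    print(find_untranslated_text("<h1>Hello</h1>"))
    print(find_untranslated_text('<h1><fmt:message key="hello" /></h1>'))
    print(find_untranslated_text("<button>${expression}</button>"))
```

Because the parser tracks nesting itself, the nested case from the question needs no special handling here.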

0

pm100 Jan 6 '10 at 18:16

gnarf · Accepted Answer · 2010-01-06T17:40:06+0000

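If I understand you correctly, you want the data inside the "tag" to not contain fmt:message or ${...}

You could use a negative lookahead together with . to assert that the characters inside the tag do not start any of the strings you want to avoid: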

/<(\w+)[^>]*>(?:(?!<fmt:message|\$\{|<\/\1>).)*<\/\1>/i
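As a rough check, the pattern can be run against the examples from the question. The sketch below is in Python; the names UNTRANSLATED and find_untranslated are hypothetical, re.IGNORECASE stands in for the /i flag, and the \/ escapes are dropped because Python patterns have no / delimiters.

```python
import re

# Same pattern as above, rewritten as a Python raw string.
UNTRANSLATED = re.compile(
    r"<(\w+)[^>]*>(?:(?!<fmt:message|\$\{|</\1>).)*</\1>",
    re.IGNORECASE,
)


def find_untranslated(html):
    """Return every substring the pattern reports as untranslated."""
    return [m.group(0) for m in UNTRANSLATED.finditer(html)]


if __name__ == "__main__":
    samples = [
        "<h1>Hello</h1>",
        '<h1><fmt:message key="hello" /></h1>',
        "<button>${expression}</button>",
        '<li><p><fmt:message key="common.delete" /></p></li>',
    ]
    for sample in samples:
        print(find_untranslated(sample), "<-", sample)
```

Only the first sample should be reported; the other three contain <fmt:message or ${ inside the tag body.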

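If you also want to avoid matching any "tags" nested inside the tag, you can drop the <fmt:message alternative and use [^<] instead of . so that only characters other than < are matched: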

/<(\w+)[^>]*>(?:(?!\$\{)[^<])*<\/\1>/i
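The practical difference between the . version and the [^<] version shows up on nested markup. The Python sketch below compares the two; the names WITH_DOT, NO_NESTED and first_match are made up for illustration, and the sample with plain text inside nested tags is an assumed input, not one from the question.

```python
import re

# The earlier pattern, which lets . run across nested tags.
WITH_DOT = re.compile(
    r"<(\w+)[^>]*>(?:(?!<fmt:message|\$\{|</\1>).)*</\1>", re.IGNORECASE
)
# The pattern above, which refuses any < inside the tag body.
NO_NESTED = re.compile(
    r"<(\w+)[^>]*>(?:(?!\$\{)[^<])*</\1>", re.IGNORECASE
)


def first_match(pattern, html):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(html)
    return m.group(0) if m else None


if __name__ == "__main__":
    nested = "<li><p>text</p></li>"
    print(first_match(WITH_DOT, nested))   # the whole <li>...</li>
    print(first_match(NO_NESTED, nested))  # only the innermost <p>text</p>
```

With [^<] the match narrows to the innermost element that directly holds the text, which is usually the more useful thing to report.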

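If you also want to exclude "empty" tags, add another negative lookahead, (?!\s*<), which ensures the opening tag is not followed by only whitespace and then another tag: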

/<(\w+)[^>]*>(?!\s*<)(?:(?!\$\{)[^<])*<\/\1>/i
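A short Python sketch of this final pattern, checked against empty and whitespace-only tags as well as the examples from the question. The names STRICT and is_untranslated are hypothetical, and re.IGNORECASE stands in for the /i flag.

```python
import re

# Final pattern: no nested tags, no ${...}, and no empty or blank bodies.
STRICT = re.compile(
    r"<(\w+)[^>]*>(?!\s*<)(?:(?!\$\{)[^<])*</\1>",
    re.IGNORECASE,
)


def is_untranslated(html):
    """Return True if the snippet contains hard-coded text per the pattern."""
    return STRICT.search(html) is not None


if __name__ == "__main__":
    samples = [
        "<h1>Hello</h1>",
        "<p></p>",
        "<p>   </p>",
        "<button>${expression}</button>",
        '<li><p><fmt:message key="common.delete" /></p></li>',
    ]
    for sample in samples:
        print(is_untranslated(sample), sample)
```

One limitation to keep in mind: a body that mixes literal text with an expression, such as Price: ${price}, is skipped entirely, so the literal part would go unreported.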