Regular expression to escape XML

I have to deal with XML data that sometimes contains unescaped ampersands, and I cannot force the producer to either escape them or put them in a CDATA section.

Now I am looking for a regular expression that replaces & with &amp; when it is not part of an entity. Something like this: &(?!(amp|apos|quot|lt|gt);)

Unfortunately, my programming environment only supports POSIX 1003.2 extended regular expressions (see http://www.kernel.org/doc/man-pages/online/pages/man7/regex.7.html ), which do not seem to have the "?!" operator needed here.

Any ideas on how to create the required regular expression?

+3
2 answers

Lateral thinking: replace every & with &amp;, and then replace every &amp;apos; (etc.) with &apos; (for example)? You can use a group to capture the part to restore: &amp;(amp|apos|quot|lt|gt);
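A minimal sketch of this two-pass idea, written in Python only for illustration (the asker's environment is not stated, and the function name `escape_bare_ampersands` is made up here). The pattern uses nothing beyond a group and alternation, so it should carry over to a POSIX extended regex engine that supports backreferences in the replacement text. It assumes only the five predefined XML entities matter; numeric references such as &#38; would be double-escaped by this version.

```python
import re

# The five predefined XML entity names (assumption: no other entities occur).
ENTITY_NAMES = "amp|apos|quot|lt|gt"

def escape_bare_ampersands(text: str) -> str:
    # Pass 1: escape every ampersand, including those that start an entity.
    escaped = text.replace("&", "&amp;")
    # Pass 2: undo the escaping where the ampersand belonged to an entity,
    # i.e. turn "&amp;lt;" back into "&lt;". No lookahead is needed.
    return re.sub(r"&amp;(" + ENTITY_NAMES + r");", r"&\1;", escaped)

if __name__ == "__main__":
    print(escape_bare_ampersands("Tom & Jerry &lt; 3 &amp; more"))
```

Text that was already correctly escaped comes back unchanged after both passes, so running it over well-formed input should be harmless.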

+4

Instead of looking for something matching a negative regex, you can search for something NOT matching a positive regular expression, for example:

! ... &(?(amp|apos|quot|lt|gt);)

I have not read the entire page you linked to, but I'm sure it should be possible.
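One way to read this suggestion, sketched in Python for illustration (the function name `escape_non_entities` is made up, and this is only one interpretation of the answer): walk through the text, and at each & test the positive pattern &(amp|apos|quot|lt|gt); at that position. If it matches, the entity is copied as is; if it does not, the ampersand is escaped. The negation happens in the surrounding code, so the regex itself needs no lookahead.

```python
import re

# Positive pattern: an ampersand that starts one of the predefined entities.
ENTITY = re.compile(r"&(amp|apos|quot|lt|gt);")

def escape_non_entities(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            m = ENTITY.match(text, i)  # try the positive pattern at position i
            if m:
                out.append(m.group(0))  # real entity: keep it untouched
                i = m.end()
                continue
            out.append("&amp;")         # no match: a bare ampersand, escape it
        else:
            out.append(text[i])
        i += 1
    return "".join(out)

if __name__ == "__main__":
    print(escape_non_entities("Tom & Jerry &lt; 3 &amp; more"))
```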

0

Source: https://habr.com/ru/post/1780641/