How to parse XML with invalid characters in Node Name?

So, I'm trying to parse some XML, the creation of which is not under my control. The problem is that they have nodes that look like this:

<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(MORNINGSTAR) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(QUARTERSTAFF) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(SCYTHE) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(TRATNYR) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(TRIPLE-HEADED_FLAIL) />
<ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(WARAXE) />

Visual Studio and .NET both treat the '(' and ')' characters, as used above, as completely invalid. Unfortunately, I need to process these files! Is there a way to get the XmlReader classes to tolerate these characters, or to escape them dynamically, or something similar? I could pre-process the whole file, but I do want to keep '(' and ')' wherever they appear inside a node in a valid way, so I don't want to just delete them all...

2 answers

That simply isn't valid XML. Preprocessing is your best bet, perhaps with a regex - something like:

string output = Regex.Replace(input, @"(<\w+)\((\w+)\)([ >/])", "$1$2$3");

Edit: it is a bit more complicated if you also want to replace the "-" inside the parentheses:

string output = Regex.Replace(input, @"(<\w+)\(([-\w]+)\)([ >/])",
    delegate(Match match) {
        return match.Groups[1].Value + match.Groups[2].Value.Replace('-', '_')
             + match.Groups[3].Value;
    });
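
As a rough end-to-end sketch of the approach above, the snippet below cleans a small sample and then hands it to a normal XML parser. The helper name SanitizeNames and the sample string are made up for illustration; the regex is the one from the edit above, and the parsing call is XDocument.Parse from System.Xml.Linq. It assumes C# 9 or later (top-level statements) and that the offending names only appear on opening or self-closing tags, as in the question.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: removes "(" and ")" from element names and turns "-"
// inside the parentheses into "_". Text content is left alone because the
// pattern only matches directly after "<name".
static string SanitizeNames(string input)
{
    return Regex.Replace(input, @"(<\w+)\(([-\w]+)\)([ >/])",
        m => m.Groups[1].Value
           + m.Groups[2].Value.Replace('-', '_')
           + m.Groups[3].Value);
}

// Made-up sample input: one bad element name plus ordinary text with parentheses.
string raw = "<root><ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(TRIPLE-HEADED_FLAIL) />"
           + "<note>keep (this) text</note></root>";

string clean = SanitizeNames(raw);
XDocument doc = XDocument.Parse(clean);   // would throw on the raw string

foreach (XElement el in doc.Root.Elements())
{
    Console.WriteLine(el.Name.LocalName);
}
```

Note that the pattern does not match closing tags such as </NAME(X)>, since "/" is not a word character; if the real files contain those, the first group would need to become (</?\w+).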

If this is not syntactically correct, this is not XML.

XML is very strict about this.

If you can't get the sending application to send correct XML, at least let them know that any downstream process that receives it will fail, whether that is your application or some other one in the future.

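If they won't fix it, read the input as a Stream and pre-process it yourself: start at each <, continue to the matching >, and replace ( and ) in between. Watch for other illegal characters as well, such as NUL or ^Z, which XML does not allow. (That is, only touch what comes after a < and before the next >.)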
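
A minimal sketch of that kind of single-pass filter, which only drops ( and ) between a < and the next >, might look like the following. The helper name StripParensInTags and the sample input are invented for illustration. It assumes C# 9 or later, and that the input has no comments, CDATA sections, or processing instructions containing quote characters, which this simple state machine would misread.

```csharp
using System;
using System.IO;

// Hypothetical helper: copies characters from reader to writer, dropping
// '(' and ')' only inside tags and outside quoted attribute values.
static void StripParensInTags(TextReader reader, TextWriter writer)
{
    bool inTag = false;
    char quote = '\0';   // active attribute quote character, '\0' when none
    int next;
    while ((next = reader.Read()) != -1)
    {
        char c = (char)next;
        if (!inTag)
        {
            if (c == '<') inTag = true;          // entering a tag
        }
        else if (quote != '\0')
        {
            if (c == quote) quote = '\0';        // closing an attribute value
        }
        else if (c == '"' || c == '\'')
        {
            quote = c;                           // opening an attribute value
        }
        else if (c == '>')
        {
            inTag = false;                       // leaving the tag
        }
        else if (c == '(' || c == ')')
        {
            continue;                            // drop parens in the tag itself
        }
        writer.Write(c);
    }
}

// Made-up sample: parens in the name are removed, parens in the
// attribute value and in the text are kept.
var input = new StringReader("<A_(B) x=\"(1)\">text (kept)</A_(B)>");
var output = new StringWriter();
StripParensInTags(input, output);
Console.WriteLine(output.ToString());
```

Because it works on a TextReader and TextWriter, the same function can be wrapped around a StreamReader and StreamWriter so a large file never has to be held in memory as one string.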


Source: https://habr.com/ru/post/1711665/

