LINQ to XML ignores line breaks in attributes

Following up on this question:

Are line breaks allowed in XML attribute values?

Line breaks in XML attributes are absolutely valid (although perhaps not recommended):

    <xmltag1>
        <xmltag2 attrib="line 1
    line 2
    line 3">
        </xmltag2>
    </xmltag1>

When I parse such XML with LINQ to XML (System.Xml.Linq), these line breaks are silently converted to space (' ') characters.

Is there any way to tell the XDocument.Load() parser to preserve these line breaks?

PS: The XML I'm parsing is written by third-party software, so I cannot change the way the line breaks are written.
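A minimal repro of the behavior described above might look like the following sketch. It assumes a plain .NET console project; the class name `DefaultParseDemo` and its `ReadAttrib` helper are hypothetical. It parses the sample markup with the default `XDocument.Parse` settings and prints the UTF-16 code of every character in the attribute value, so you can check whether the line feeds (10) came through or were replaced by spaces (32).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class: reproduces the question with default parser settings.
public static class DefaultParseDemo
{
    // Parses with the default XDocument.Parse settings and returns the
    // value of the "attrib" attribute on <xmltag2>.
    public static string ReadAttrib(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root.Element("xmltag2").Attribute("attrib").Value;
    }

    public static void Main()
    {
        // "\n" places real line feeds inside the attribute, like the third-party file.
        string xml = "<xmltag1><xmltag2 attrib=\"line 1\nline 2\nline 3\"></xmltag2></xmltag1>";
        string value = ReadAttrib(xml);

        Console.WriteLine("|{0}|", value);
        // Print UTF-16 codes: 32 is a space, 10 is a line feed.
        Console.WriteLine(string.Join(" ", value.Select(c => (int)c)));
    }
}
```

If the parser applies the attribute-value normalization described in the question, every former line feed shows up as 32 in the second line of output.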

3 answers

If you want to keep line breaks in attribute values, you need to write them as character references, for example

 <foo bar="Line 1.&#10;Line 2.&#10;Line3."/> 

since otherwise the XML parser will normalize them to spaces, as required by the XML specification: http://www.w3.org/TR/xml/#AVNormalize
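The sketch below illustrates both directions, assuming the default LINQ to XML parser and writer; the class `CharRefDemo` and its helpers `ReadBar` / `WriteFoo` are hypothetical names. Reading: a line feed written as `&#10;` survives parsing, because normalization only replaces literal whitespace characters. Writing: when LINQ to XML serializes an attribute that contains a line feed, the writer escapes it as a character reference instead of emitting a raw line break, so the value round-trips.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class for character references in attribute values.
public static class CharRefDemo
{
    // Parses with default settings and returns the "bar" attribute value.
    public static string ReadBar(string xml) =>
        XDocument.Parse(xml).Root.Attribute("bar").Value;

    // Serializes a <foo> element whose "bar" attribute holds the given value.
    public static string WriteFoo(string barValue) =>
        new XElement("foo", new XAttribute("bar", barValue)).ToString();

    public static void Main()
    {
        // &#10; is a character reference for LF; attribute normalization leaves it alone.
        string value = ReadBar("<foo bar=\"Line 1.&#10;Line 2.&#10;Line3.\"/>");
        Console.WriteLine(value.Contains("\n"));

        // The writer escapes the line feed, so reading the output back
        // yields the original value.
        string xml = WriteFoo("Line 1.\nLine 2.");
        Console.WriteLine(xml);
        Console.WriteLine(ReadBar(xml) == "Line 1.\nLine 2.");
    }
}
```

This only helps if you control the producer of the XML; for third-party files the reader-side workaround is what matters.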

[edit] If you want to avoid attribute-value normalization, loading the XML through the older XmlTextReader with Normalization turned off helps:

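    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Linq;

    // The verbatim string contains real line breaks inside the attribute value.
    string testXml = @"<foo bar=""Line 1.
    Line 2.
    Line 3.""/>";

    XDocument test;
    using (XmlTextReader xtr = new XmlTextReader(new StringReader(testXml)))
    {
        // Turn off attribute-value normalization so the line breaks are kept.
        xtr.Normalization = false;
        test = XDocument.Load(xtr);
    }
    Console.WriteLine("|{0}|", test.Root.Attribute("bar").Value);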

It outputs

    |Line 1.
    Line 2.
    Line 3.|

Line breaks are not turned into spaces (ASCII code 32) during parsing. If you step through the value character by character, you will see that the "space" is actually ASCII code 10 = LF (line feed)! So the line breaks are still there. If you need them displayed, try converting them to CR LF (ASCII 13 followed by 10) in your code, because text boxes in Windows Forms don't display a bare LF as a line break.
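To check which case applies in your own setup, and to prepare a value for a Windows Forms text box, something like the following may help. The class `LineBreakHelper` and its methods are hypothetical, and the sample value is hard-coded rather than read from a real document.

```csharp
using System;
using System.Linq;

// Hypothetical helper class for inspecting and converting line breaks.
public static class LineBreakHelper
{
    // Lists the UTF-16 code of each character so you can see whether the
    // parser left LF (10) in the value or replaced it with a space (32).
    public static string DescribeCodes(string value) =>
        string.Join(" ", value.Select(c => ((int)c).ToString()));

    // Windows Forms text boxes expect CR LF, so turn every bare LF into "\r\n".
    // Existing "\r\n" pairs are collapsed first so they are not doubled.
    public static string ToCrLf(string value) =>
        value.Replace("\r\n", "\n").Replace("\n", "\r\n");

    public static void Main()
    {
        // Hypothetical attribute value; in practice this would come from XAttribute.Value.
        string value = "line 1\nline 2";
        Console.WriteLine(DescribeCodes(value));

        // textBox.Text = ToCrLf(value);  // hypothetical Windows Forms TextBox
        Console.WriteLine(ToCrLf(value) == "line 1\r\nline 2");
    }
}
```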


According to MSDN:

Although XML processors preserve all white space in element content, they frequently normalize it in attribute values. Tabs, carriage returns, and spaces are reported as single spaces. In certain types of attributes, they trim white space that comes before or after the main body of the value and reduce white space within the value to single spaces. (If a DTD is available, this trimming will be performed on all attributes that are not of type CDATA.)

For example, an XML document may contain the following:

    <whiteSpaceLoss note1="this is a note." note2="this is
    a note.">

The XML parser reports both attribute values as "this is a note.", converting the line breaks to single spaces.

I can't find anything about preserving white space in attributes, but judging by this explanation it may not be possible.
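The rule quoted above (XML 1.0, section 3.3.3) can be sketched as two small functions. This is a simplified illustration, not how the .NET parser is implemented: it assumes the literal contains no entity or character references (the spec treats those differently, which is why `&#10;` survives), and the names `AttributeNormalization`, `NormalizeCData` and `NormalizeNonCData` are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch of XML 1.0 attribute-value normalization.
public static class AttributeNormalization
{
    // Simplified: assumes the literal value contains no entity or character
    // references (the spec handles those separately).
    public static string NormalizeCData(string literal)
    {
        // End-of-line handling (section 2.11) runs first: CR LF and lone CR become LF.
        string text = literal.Replace("\r\n", "\n").Replace('\r', '\n');

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Each whitespace character (LF, tab, space) is appended as a single space.
            sb.Append(c == '\n' || c == '\t' ? ' ' : c);
        }
        return sb.ToString();
    }

    // For attributes declared in a DTD with a type other than CDATA, the spec also
    // discards leading/trailing spaces and collapses runs of spaces into one.
    public static string NormalizeNonCData(string literal)
    {
        string cdata = NormalizeCData(literal);
        return Regex.Replace(cdata, " +", " ").Trim(' ');
    }

    public static void Main()
    {
        Console.WriteLine("|" + NormalizeCData("this is\na note.") + "|");
        Console.WriteLine("|" + NormalizeNonCData("  this   is\n a note. ") + "|");
    }
}
```

Note that the CDATA rule maps each line break to exactly one space, which matches the MSDN example: the information that a line break was there is lost, so it cannot be recovered after a normalizing parser has run.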


Source: https://habr.com/ru/post/920395/

