How to handle the HTML nbsp entity in XSLT without changing the input file

I am trying to convert an HTML file to an XML file using XSLT (using Oxygen 9.0 for conversion).

When I set up and run the XSLT transformation on an HTML file, Oxygen reports:

The entity 'nbsp' was referenced, but not declared.

My html input file:

 <div><span>&nbsp;some text</span></div> 

Note: I want to know how to handle this entity using XSLT only; I do not want to make any changes to the input file.

+4
2 answers

You can use XML entities: create a wrapper XML file that declares the nbsp entity and pulls in the (not well-formed) XML fragment as an external entity.

For example, suppose your fragment is saved as a file named "invalid.xml":

 <div><span>&nbsp;some text</span></div> 

Create the XML file as follows:

 <!DOCTYPE wrapper [
   <!ENTITY nbsp "&#160;">
   <!ENTITY invalid-xml-document SYSTEM "./invalid.xml">
 ]>
 <wrapper>
   &invalid-xml-document;
 </wrapper>

When this file is parsed, the parser declares the nbsp entity, includes the contents of "invalid.xml", and resolves the &nbsp; reference to a non-breaking space (&#160;). The result is equivalent to the following:

 <wrapper> <div> <span> some text</span> </div> </wrapper> 

Then simply adjust your XSLT to account for the new document element (in this example, the <wrapper> element), and run the transformation on the wrapper file instead of on "invalid.xml".
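
A minimal sketch of a stylesheet that accounts for the extra document element, assuming XSLT 1.0 and assuming the goal is just to pass the original content through without the wrapper. The identity template is a placeholder for illustration only; your own templates for div, span, and so on would go in its place.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Skip the artificial <wrapper> element and process only its
       element children (here, the <div> pulled in from invalid.xml). -->
  <xsl:template match="/wrapper">
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!-- Placeholder identity template: copies every other node and
       attribute unchanged. Replace it with the real conversion rules. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the parser resolves &nbsp; before the stylesheet runs, the templates see only an ordinary non-breaking space character inside the text node. This also relies on the XML parser being allowed to load external entities, which some configurations disable.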

+7

As far as I know, you will need to make changes to the input file.

Either replace &nbsp; with &#160;, or declare a custom doctype that does the conversion for you:

 <!DOCTYPE doctypeName [ <!ENTITY nbsp "&#160;"> ]> 

This is because &nbsp; is not one of the predefined XML entities (XML predefines only &lt;, &gt;, &amp;, &apos;, and &quot;).

+7

Source: https://habr.com/ru/post/1394585/

