Keep &nbsp; and other special characters in XSLT output using templates

I am using XSLT to extract some HTML content containing special characters (e.g. &nbsp;) from an XML file. The content is stored in <content> nodes. I declared most special characters as entities, for example <!ENTITY nbsp "&#160;">, so this expression works fine:

 <xsl:copy-of select="content" disable-output-escaping="yes"/> 

Now I want to add target="_blank" to every link found in this content. This is the solution I came across:

 <xsl:template match="a" mode="html">
   <a>
     <xsl:attribute name="href"><xsl:value-of select="@*"/></xsl:attribute>
     <xsl:attribute name="target">_blank</xsl:attribute>
     <xsl:apply-templates select="text()|* "/>
   </a>
 </xsl:template>

And instead of the copy-of element, I use this:

 <xsl:apply-templates select="content" mode="html"/> 

Now all of these special characters (including &nbsp;) have disappeared from the output. How can I preserve them? It seems that disable-output-escaping="yes" doesn't help here.

I'm using the XSLTProcessor class in PHP. The disable-output-escaping attribute did not actually cause an error, but when I removed it, the output still contained every &nbsp;, so the attribute made no difference.


UPD: With the XSL template shown above, here is my input example:

 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE page SYSTEM "html-entities.xsl">
 <content>There is a&nbsp;non-breaking <a href="http://localhost">space</a> inside.</content>

html-entities.xsl:

 <?xml version="1.0" encoding="UTF-8"?>
 <!ENTITY nbsp "&#160;">

PHP code:

 $xp = new XSLTProcessor();
 $xsl = new DOMDocument();
 $xsl->load($xsl_filename);
 $xp->importStylesheet($xsl);
 $xml_doc = new DOMDocument();
 $xml_doc->resolveExternals = true;
 $xml_doc->load($xml_filename);
 $html = $xp->transformToXML($xml_doc);

My current output is:

There is anon-breaking <a href="http://localhost" target="_blank">space</a> inside.

My desired result:

There is a&nbsp;non-breaking <a href="http://localhost" target="_blank">space</a> inside.

1 answer

Basically, whether the source of the input XML document contains a character reference such as &#160;, an entity reference such as &nbsp;, or the character itself literally does not matter to XSLT, neither for how the input is processed nor for how the output looks: XSLT operates on a tree whose text nodes hold Unicode characters. At least that is the theory. Your PHP code seems to work with a DOM tree model that can store entity reference nodes, but even then it should not matter to XSLT. The input tree should contain text nodes with Unicode characters (one of which may be the non-breaking space, Unicode code point 160), and if you copy such a text node to the result, the result tree will contain a text node with the same Unicode character.
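The point about entity reference nodes in a DOM tree can be explored with a small PHP sketch. It is not part of the original answer and it assumes, without having been tested against the asker's setup, that unexpanded entity references are what gets lost. It uses the libxml-specific `substituteEntities` property of `DOMDocument`; the helper `countEntityRefs` is hypothetical, and the entity is declared in an internal DTD subset only so that the snippet does not depend on an external file.

```php
<?php
// Same content as the question's input, but with the entity declared in an
// internal DTD subset (an assumption made to keep the snippet self-contained).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE content [
  <!ENTITY nbsp "&#160;">
]>
<content>There is a&nbsp;non-breaking <a href="http://localhost">space</a> inside.</content>
XML;

// Hypothetical helper: counts entity reference nodes among the direct
// children of the document element.
function countEntityRefs(DOMDocument $doc): int
{
    $count = 0;
    foreach ($doc->documentElement->childNodes as $child) {
        if ($child->nodeType === XML_ENTITY_REF_NODE) {
            $count++;
        }
    }
    return $count;
}

// Parsed with the default settings.
$kept = new DOMDocument();
$kept->loadXML($xml);

// Parsed with entity substitution switched on; the property has to be set
// before the document is loaded.
$expanded = new DOMDocument();
$expanded->substituteEntities = true;
$expanded->loadXML($xml);

printf("entity reference nodes with default settings: %d\n", countEntityRefs($kept));
printf("entity reference nodes with substitution:     %d\n", countEntityRefs($expanded));
```

With substitution enabled, the second count should be 0, because the references have been replaced by ordinary text containing U+00A0. In the question's code the equivalent change would be to set `$xml_doc->substituteEntities = true;` next to `resolveExternals`, before calling `load()`.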

For the html output method, some XSLT processors (e.g. Saxon 6.5.5) may do you the favor of serializing characters that HTML defines as entities with the corresponding entity reference. Even if they do not, the serialized result tree should be a file containing the corresponding Unicode characters, encoded as specified by the encoding attribute of the xsl:output element.
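The role of the output method and encoding can be sketched in PHP as follows. This is an illustration under stated assumptions, not the asker's stylesheet: the identity-style templates, the wrapping `<div>`, and the variable names are made up for the example, and `encoding="US-ASCII"` is chosen only because that encoding cannot represent U+00A0 directly, so the serializer has to fall back to some kind of reference. Whether that is a numeric character reference or a named entity reference depends on the processor.

```php
<?php
$xslSource = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- US-ASCII cannot hold U+00A0, so it must be written as a reference -->
  <xsl:output method="html" encoding="US-ASCII"/>

  <!-- Hypothetical wrapper element around the processed content -->
  <xsl:template match="/">
    <div>
      <xsl:apply-templates select="content/node()" mode="html"/>
    </div>
  </xsl:template>

  <!-- By default, copy attributes, elements and text nodes unchanged -->
  <xsl:template match="@*|node()" mode="html">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="html"/>
    </xsl:copy>
  </xsl:template>

  <!-- Links are copied too, with target="_blank" added -->
  <xsl:template match="a" mode="html">
    <a target="_blank">
      <xsl:apply-templates select="@*|node()" mode="html"/>
    </a>
  </xsl:template>
</xsl:stylesheet>
XSL;

// The input uses a character reference, so no DTD is needed here.
$xmlSource = '<content>There is a&#160;non-breaking '
           . '<a href="http://localhost">space</a> inside.</content>';

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$xml = new DOMDocument();
$xml->loadXML($xmlSource);

$xp = new XSLTProcessor();
$xp->importStylesheet($xsl);

// Serializes the result tree according to xsl:output.
$html = $xp->transformToXML($xml);
echo $html;
```

Copying text nodes with an identity-style template keeps the character in the result tree; how it finally appears in the output is then purely a serialization matter controlled by `xsl:output`.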

Your current result, which drops the character completely (e.g. There is anon-breaking), does not make sense to me.


Source: https://habr.com/ru/post/1436102/
