You can avoid a "lookup table" entirely, like this:
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:my" exclude-result-prefixes="my xs">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Identity rule: copy every node unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Text nodes that contain at least one \x escape with lowercase hex digits -->
 <xsl:template match="text()[matches(., '\\x(\d|[a-f])+')]">
  <xsl:analyze-string select="." regex="\\x(\d|[a-f])+" >
   <xsl:matching-substring>
    <!-- Drop the leading "\x" and convert the remaining hex digits -->
    <xsl:value-of select=
     "codepoints-to-string(my:hex2dec(substring(.,3), 0))"/>
   </xsl:matching-substring>
   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:template>

 <!-- Recursive hex-to-decimal conversion: $pAccum holds the value of the digits consumed so far -->
 <xsl:function name="my:hex2dec" as="xs:integer">
  <xsl:param name="pStr" as="xs:string"/>
  <xsl:param name="pAccum" as="xs:integer"/>

  <xsl:sequence select=
   "if(not($pStr))
      then $pAccum
      else
        for $char in substring($pStr, 1, 1),
            $code in
               if($char ge '0' and $char le '9')
                 then xs:integer($char)
                 else
                   string-to-codepoints($char)
                   - string-to-codepoints('a') +10
          return
             my:hex2dec(substring($pStr,2), 16*$pAccum + $code)
   "/>
 </xsl:function>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<t>
 <p>this is some sentence with some hex code\x27 s , and we need that fixed.</p>
 <p>this is some sentence with some hex code\x28 s , and we need that fixed.</p>
 <p>this is some sentence with some hex code\x29 s , and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2a s , and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2b s , and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2c s , and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2d s , and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2e s , and we need that fixed.</p>
 <p>this is some sentence with some hex code\x2f s , and we need that fixed.</p>
</t>
the wanted, correct result is produced:
<t>
 <p>this is some sentence with some hex code' s , and we need that fixed.</p>
 <p>this is some sentence with some hex code( s , and we need that fixed.</p>
 <p>this is some sentence with some hex code) s , and we need that fixed.</p>
 <p>this is some sentence with some hex code* s , and we need that fixed.</p>
 <p>this is some sentence with some hex code+ s , and we need that fixed.</p>
 <p>this is some sentence with some hex code, s , and we need that fixed.</p>
 <p>this is some sentence with some hex code- s , and we need that fixed.</p>
 <p>this is some sentence with some hex code. s , and we need that fixed.</p>
 <p>this is some sentence with some hex code/ s , and we need that fixed.</p>
</t>
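Do note:
This transformation is generic: it handles hexadecimal codes of any length, not only the two-digit codes above, provided that the digits a-f are written in lowercase and the resulting code point is a valid XML character.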
For example, if the same transformation is applied to this XML document:
<t>
 <p>this is some sentence with some hex code\x0428\x0438\x0448 s , and we need that fixed.</p>
</t>
the correct result is produced (containing the Bulgarian word for "grill" in Cyrillic):
<t> <p>this is some sentence with some hex code s , and we need that fixed.</p> </t>
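To check the hex-to-decimal step in isolation, a minimal sketch such as the one below may help. It repeats my:hex2dec unchanged and calls it from a named template; the template name main and the three sample inputs are assumptions chosen only for illustration. With an XSLT 2.0 processor that supports an initial named template (with Saxon this would presumably be the -it:main option), it should print 39 47 1064, since for example 0428 accumulates as 0, 4, 66, 1064 (each step is 16 * previous + digit).

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:my" exclude-result-prefixes="my xs">
 <xsl:output method="text"/>

 <!-- Hypothetical entry point, used only for this check -->
 <xsl:template name="main">
  <!-- Sample inputs: the hex digits of \x27, \x2f and \x0428 without the "\x" prefix -->
  <xsl:value-of separator=" " select=
   "my:hex2dec('27', 0), my:hex2dec('2f', 0), my:hex2dec('0428', 0)"/>
 </xsl:template>

 <!-- Same function as in the answer: consumes one lowercase hex digit per call -->
 <xsl:function name="my:hex2dec" as="xs:integer">
  <xsl:param name="pStr" as="xs:string"/>
  <xsl:param name="pAccum" as="xs:integer"/>

  <xsl:sequence select=
   "if(not($pStr))
      then $pAccum
      else
        for $char in substring($pStr, 1, 1),
            $code in
               if($char ge '0' and $char le '9')
                 then xs:integer($char)
                 else
                   string-to-codepoints($char)
                   - string-to-codepoints('a') +10
          return
             my:hex2dec(substring($pStr,2), 16*$pAccum + $code)
   "/>
 </xsl:function>
</xsl:stylesheet>
```

The initial call always passes 0 as the accumulator; the function returns the accumulated value once the digit string is empty.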