Replace substrings using a lookup table with XSLT

I have several lines containing a variant of hexadecimal escape codes (the source is FrameMaker, in case you're interested). So a line might look like

this is some sentence with some hex code\x27s, and we need that fixed.

and it will need to be changed to

this is some sentence with some hex code's, and we need that fixed.

In fact, there may be several of them on the same line, so I'm looking for the best way to walk through the text, capture all the hexadecimal codes (they look like \x##) and replace each code with the correct character. I have created an XML list / lookup table containing all the characters, as follows:

    <xsl:param name="reflist">
      <Code Value="\x27">'</Code>
      <Code Value="\x28">(</Code>
      <Code Value="\x29">)</Code>
      <Code Value="\x2a">*</Code>
      <Code Value="\x2b">+</Code>
      <!-- much more like these... -->
    </xsl:param>

I am currently using a simple replace() call, but there are too many characters for that approach to be practical.

What is the best way to do this?

2 answers

Use xsl:analyze-string, as in

    <xsl:template match="text()">
      <xsl:analyze-string select="." regex="\\x[0-9a-f]{{2}}" flags="i">
        <xsl:matching-substring>
          <!-- current() is the matched substring; a bare "." inside the
               predicate would refer to the Code element instead -->
          <xsl:value-of select="$reflist/Code[@Value = current()]"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:template>

I would also suggest using a key, for example:

    <xsl:param name="reflist" as="document-node()">
      <xsl:document>
        <Root>
          <Code Value="\x27">'</Code>
          <Code Value="\x28">(</Code>
          <Code Value="\x29">)</Code>
          <Code Value="\x2a">*</Code>
          <Code Value="\x2b">+</Code>
          <!-- much more like these... -->
        </Root>
      </xsl:document>
    </xsl:param>

    <xsl:key name="code-by-value" match="Code" use="@Value"/>

then the lookup can be improved to

    <xsl:template match="text/text()">
      <xsl:analyze-string select="." regex="\\x[0-9a-f]{{2}}" flags="i">
        <xsl:matching-substring>
          <xsl:value-of select="key('code-by-value', ., $reflist)"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:template>

I found some time to turn the suggestions above into working code. Given the input

    <root>
      <text>this is some sentence with some hex code\x27 s , and we need that \x28and this\x29 fixed.</text>
    </root>

and the complete stylesheet

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs">

      <xsl:param name="reflist" as="document-node()">
        <xsl:document>
          <Root>
            <Code Value="\x27">'</Code>
            <Code Value="\x28">(</Code>
            <Code Value="\x29">)</Code>
            <Code Value="\x2a">*</Code>
            <Code Value="\x2b">+</Code>
            <!-- much more like these... -->
          </Root>
        </xsl:document>
      </xsl:param>

      <xsl:key name="code-by-value" match="Code" use="@Value"/>

      <!-- identity template: copy everything else unchanged -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- replace each \x## escape in the text with its table entry -->
      <xsl:template match="text/text()">
        <xsl:analyze-string select="." regex="\\x[0-9a-f]{{2}}" flags="i">
          <xsl:matching-substring>
            <xsl:value-of select="key('code-by-value', ., $reflist)"/>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:template>

    </xsl:stylesheet>

Saxon 9.4 converts the input as follows:

    <root>
      <text>this is some sentence with some hex code' s , and we need that (and this) fixed.</text>
    </root>
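One case the stylesheet above does not cover is an escape with no entry in the lookup table: key() then returns an empty sequence and the escape silently disappears from the output. The regex is also case-insensitive, while the table only holds lowercase values. A minimal sketch of one way to handle both, assuming you would rather keep unknown escapes unchanged (the variable name `hit`, the shortened table, and the explicit `priority` are my additions, not part of the answer above):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Lookup table as in the answer, shortened for this sketch -->
  <xsl:param name="reflist" as="document-node()">
    <xsl:document>
      <Root>
        <Code Value="\x27">'</Code>
        <Code Value="\x2a">*</Code>
      </Root>
    </xsl:document>
  </xsl:param>

  <xsl:key name="code-by-value" match="Code" use="@Value"/>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* , node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- priority="1" avoids a conflict with the identity template,
       because text() and node() have the same default priority -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="\\x[0-9a-f]{{2}}" flags="i">
      <xsl:matching-substring>
        <!-- Normalise case so that \x2A finds the \x2a entry -->
        <xsl:variable name="hit"
            select="key('code-by-value', lower-case(.), $reflist)"/>
        <!-- Assumption: an unknown escape is emitted as it was found -->
        <xsl:value-of select="if ($hit) then $hit else ."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this shortened table, a text node such as `a\x2Ab\x7fc` should come out as `a*b\x7fc`: the uppercase escape is resolved and the unlisted one is left alone. Whether keeping, dropping, or reporting unknown escapes is the right behaviour depends on your data.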

You can completely avoid the use of any "lookup table" - like this:

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:my="my:my"
        exclude-result-prefixes="my xs">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>

      <!-- identity template: copy everything else unchanged -->
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="text()[matches(., '\\x(\d|[a-f])+')]">
        <xsl:analyze-string select="." regex="\\x(\d|[a-f])+">
          <xsl:matching-substring>
            <!-- substring(., 3) drops the leading "\x" -->
            <xsl:value-of select=
              "codepoints-to-string(my:hex2dec(substring(.,3), 0))"/>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:template>

      <!-- recursive hex-to-integer conversion, one digit per call -->
      <xsl:function name="my:hex2dec" as="xs:integer">
        <xsl:param name="pStr" as="xs:string"/>
        <xsl:param name="pAccum" as="xs:integer"/>

        <xsl:sequence select=
         "if(not($pStr))
            then $pAccum
            else
              for $char in substring($pStr, 1, 1),
                  $code in
                    if($char ge '0' and $char le '9')
                      then xs:integer($char)
                      else string-to-codepoints($char)
                           - string-to-codepoints('a') + 10
              return
                my:hex2dec(substring($pStr,2), 16*$pAccum + $code)
         "/>
      </xsl:function>
    </xsl:stylesheet>

When this transformation is applied to the following XML document:

    <t>
      <p>this is some sentence with some hex code\x27 s , and we need that fixed.</p>
      <p>this is some sentence with some hex code\x28 s , and we need that fixed.</p>
      <p>this is some sentence with some hex code\x29 s , and we need that fixed.</p>
      <p>this is some sentence with some hex code\x2a s , and we need that fixed.</p>
      <p>this is some sentence with some hex code\x2b s , and we need that fixed.</p>
      <p>this is some sentence with some hex code\x2c s , and we need that fixed.</p>
      <p>this is some sentence with some hex code\x2d s , and we need that fixed.</p>
      <p>this is some sentence with some hex code\x2e s , and we need that fixed.</p>
      <p>this is some sentence with some hex code\x2f s , and we need that fixed.</p>
    </t>

the wanted, correct result is produced:

    <t>
      <p>this is some sentence with some hex code' s , and we need that fixed.</p>
      <p>this is some sentence with some hex code( s , and we need that fixed.</p>
      <p>this is some sentence with some hex code) s , and we need that fixed.</p>
      <p>this is some sentence with some hex code* s , and we need that fixed.</p>
      <p>this is some sentence with some hex code+ s , and we need that fixed.</p>
      <p>this is some sentence with some hex code, s , and we need that fixed.</p>
      <p>this is some sentence with some hex code- s , and we need that fixed.</p>
      <p>this is some sentence with some hex code. s , and we need that fixed.</p>
      <p>this is some sentence with some hex code/ s , and we need that fixed.</p>
    </t>

Please note:

This transformation is general and can correctly handle any Unicode code point written in lowercase hexadecimal.

For example, if the same transformation is applied to this XML document:

    <t>
      <p>this is some sentence with some hex code\x0428\x0438\x0448 s , and we need that fixed.</p>
    </t>

the correct result is produced (containing the Bulgarian word for "grill" in Cyrillic):

 <t> <p>this is some sentence with some hex code s , and we need that fixed.</p> </t> 
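One thing to watch with the regex above: the `+` quantifier is greedy, so an escape directly followed by a letter from a to f (as in `\x28and` from the first answer's input) would swallow that letter as another hex digit. If your codes are always exactly two digits, as the `\x##` form in the question suggests, a fixed-width variant avoids this. The following is only a sketch under that assumption; the function name `my:hex2int` and the explicit `priority` are mine, and it will not handle four-digit codes such as `\x0428`:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="my:my"
    exclude-result-prefixes="my xs">

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: every escape is "\x" plus exactly two hex digits.
       priority="1" avoids a conflict with the identity template. -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="\\x([0-9a-fA-F]{{2}})">
      <xsl:matching-substring>
        <!-- regex-group(1) holds just the two hex digits -->
        <xsl:value-of
            select="codepoints-to-string(my:hex2int(regex-group(1)))"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Hypothetical helper: hex string to integer, either letter case.
       The value of a digit is its offset in '0123456789abcdef'. -->
  <xsl:function name="my:hex2int" as="xs:integer">
    <xsl:param name="hex" as="xs:string"/>
    <xsl:variable name="len" select="string-length($hex)"/>
    <xsl:sequence select="
      if ($len = 0) then 0
      else 16 * my:hex2int(substring($hex, 1, $len - 1))
           + string-length(
               substring-before('0123456789abcdef',
                                lower-case(substring($hex, $len))))"/>
  </xsl:function>

</xsl:stylesheet>
```

Applied to text like `that \x28and this\x29 fixed`, this should give `that (and this) fixed`. Note that codepoints-to-string() raises an error for code points that are not legal XML characters (for example `\x00`), so input containing such escapes would need separate handling.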

Source: https://habr.com/ru/post/1445073/