XSLT to remove non-ASCII

I need to modify an XML document using XSLT. I would like to replace all non-ASCII characters with a space.

Input Example:

<input>azerty12€_étè</input>

Only these characters are allowed:

!"#$%&'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Expected Result:

<input>azerty12 _ t </input>
2 answers

Assuming you're limited to XSLT 1.0, you could try:

<xsl:variable name="ascii">!"#$%&amp;'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~</xsl:variable>
<xsl:variable name="spaces" select="'                                                                                             '" />

<xsl:template match="input">
    <xsl:copy>
        <xsl:value-of select="translate(., translate(., $ascii, ''), $spaces)"/>
    </xsl:copy>
</xsl:template>

This is a bit of a hack: it only works as long as the $spaces variable contains enough spaces to accommodate all the non-ASCII characters found in the input.
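To see why the double translate() works, and what "enough spaces" means, here is a small Python model of XPath 1.0 translate(). It is an illustration, not an XSLT processor. The names `xpath_translate` and `replace_non_ascii` are hypothetical, and "allowed" is assumed to be printable ASCII (space through `~`), which is slightly wider than the list above.

```python
# Minimal model of XPath 1.0 translate() (illustration only, not an XSLT processor).
def xpath_translate(s: str, src: str, dst: str) -> str:
    mapping = {}
    for i, ch in enumerate(src):
        # only the first occurrence of a character in src counts
        if ch not in mapping:
            # no partner in dst (dst too short): None means "delete"
            mapping[ch] = dst[i] if i < len(dst) else None
    out = []
    for ch in s:
        if ch not in mapping:
            out.append(ch)            # untouched
        elif mapping[ch] is not None:
            out.append(mapping[ch])   # replaced
        # else: deleted
    return "".join(out)

# Assumption: "allowed" = printable ASCII, space (0x20) through '~' (0x7E).
ASCII = "".join(chr(c) for c in range(0x20, 0x7F))

def replace_non_ascii(text: str, spaces: str) -> str:
    # inner translate(., $ascii, ''): delete every allowed char, leaving only the offenders
    offenders = xpath_translate(text, ASCII, "")
    # outer translate(., offenders, $spaces): map each offender to a space
    return xpath_translate(text, offenders, spaces)

print(repr(replace_non_ascii("azerty12€_étè", " " * 93)))  # 'azerty12 _ t '
print(repr(replace_non_ascii("azerty12€_étè", " " * 2)))   # 'azerty12 _ t' (è silently dropped)
```

When $spaces runs out, translate() does not fail. It deletes the surplus characters instead of replacing them. Because only the first occurrence of a repeated character in the second argument counts, repeats such as "ééé" are harmless; what matters is that each offending character's first appearance falls within the length of $spaces.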

If you do not want to rely on such an assumption, you will have to use a recursive template to replace them one by one:

<xsl:template match="input">
    <xsl:copy>
        <xsl:call-template name="replace-non-ascii">
            <xsl:with-param name="text" select="."/>
        </xsl:call-template>
    </xsl:copy>
</xsl:template>

<xsl:template name="replace-non-ascii">
    <xsl:param name="text"/>
    <xsl:variable name="ascii"> !"#$%&amp;'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~</xsl:variable>
    <xsl:variable name="non-ascii" select="translate($text, $ascii, '')" />
    <xsl:choose>
        <xsl:when test="$non-ascii">
            <xsl:variable name="char" select="substring($non-ascii, 1, 1)" />
            <!-- recursive call -->
            <xsl:call-template name="replace-non-ascii">
                <xsl:with-param name="text" select="translate($text, $char, ' ')"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>   
</xsl:template>
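The recursion can be modelled the same way in Python. This is a sketch of the control flow, not the answer's actual code. `replace_non_ascii` is a hypothetical name, and the allowed set is again assumed to be printable ASCII. With a one-character second and third argument, translate($text, $char, ' ') simply replaces every occurrence of that character, so each call eliminates one distinct offending character.

```python
# Assumption: "allowed" = printable ASCII, space (0x20) through '~' (0x7E).
ASCII = "".join(chr(c) for c in range(0x20, 0x7F))

def replace_non_ascii(text: str) -> str:
    # translate($text, $ascii, ''): what survives is the disallowed characters
    non_ascii = "".join(ch for ch in text if ch not in ASCII)
    if not non_ascii:                      # xsl:otherwise: nothing left to replace
        return text
    char = non_ascii[0]                    # substring($non-ascii, 1, 1)
    # translate($text, $char, ' ') with one-character arguments replaces
    # every occurrence of that character, so each call removes one distinct offender
    return replace_non_ascii(text.replace(char, " "))

print(repr(replace_non_ascii("azerty12€_étè")))  # 'azerty12 _ t '
```

The recursion depth equals the number of distinct non-ASCII characters, not the length of the input. Termination depends on the space itself being in the allowed list. This is why $ascii inside the named template starts with a space: without it, the freshly inserted space would count as "non-ASCII" and the template would keep recursing forever.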

XSLT 2.0 Solution

This XML input

<input>azerty12€_étè</input>

processed by this XSLT

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version='2.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="input">
    <xsl:copy>
      <xsl:value-of select="replace(., '\P{IsBasicLatin}', ' ')"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

produces this XML:

<input>azerty12 _ t </input>
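For comparison, here is the same substitution in Python. It is a sketch with a hypothetical function name. \P{IsBasicLatin} matches any character outside the Basic Latin block (U+0000 to U+007F). Python's standard re module has no \p{...} block escapes, so the block is written out as a negated range.

```python
import re

# \P{IsBasicLatin} in XPath 2.0 regexes matches any character outside the
# Basic Latin block, U+0000..U+007F. Python's re has no \p{...}/\P{...}
# escapes, so the block is spelled out as a negated range instead.
NON_BASIC_LATIN = re.compile(r"[^\x00-\x7F]")

def replace_non_ascii(text: str) -> str:
    return NON_BASIC_LATIN.sub(" ", text)

print(repr(replace_non_ascii("azerty12€_étè")))  # 'azerty12 _ t '
print(repr(replace_non_ascii("a<b\\c\td")))       # 'a<b\\c\td' (Basic Latin kept)
```

The regex version keeps every Basic Latin character, including `<`, the backslash, tabs and newlines. The XSLT 1.0 variants keep only what is listed in $ascii, so the two approaches can differ on those characters.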


Source: https://habr.com/ru/post/1625394/

