How to remove character accents in XSL?

Question

I keep looking, but I can't find an XSL function that is the equivalent of "normalize-space" for characters. That is, my content has accented Unicode characters, which is great, but from this content I create a file name, and there I do not want the accents.

So, is there something I am missing, or am I looking in the wrong place for an easy way to handle these characters?

In the XML data:

<filename>gri_gonéwiththèw00mitc</filename>

In the XSLT stylesheet:

 <xsl:variable name="file"> <xsl:value-of select="filename"/> </xsl:variable> <xsl:value-of select="$file"/>

leads to "gri_gonéwiththèw00mitc"

Where

 <xsl:value-of select='replace( normalize-unicode( "$file", "NFKD" ), "[^\\p{ASCII}]", "" )'/>

gives nothing.

What I'm after is gri_gonewiththew00mitc (without accents).

Am I using the syntax incorrectly?

+4

xml xslt unicode character-encoding

LOlliffe Mar 22 '11 at 21:35

3 answers

So, contrary to my comment, you can try the following:

 replace( normalize-unicode( "öt hűtőházból kértünk színhúst", "NFKD" ), "[^\\p{ASCII}]", "" )

Be warned, though, that any character that cannot be decomposed and is not basic ASCII (for example, Norwegian ø or Icelandic Þ) will be removed from the string entirely. That probably suits your requirements.
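If dropping letters such as ø is not acceptable, one possible workaround is to map them by hand before normalizing. The following is a minimal XSLT 2.0 sketch of that idea, not part of the original answer: the function name my:to-ascii, its namespace URI, and the sample string are all made up for illustration, and the translate() table covers only ø and Ø.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:strip-accents"
    exclude-result-prefixes="xs my">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- my:to-ascii is a hypothetical helper, not a built-in function. -->
  <xsl:function name="my:to-ascii" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <!-- Step 1: hand-map letters that have no Unicode decomposition. -->
    <xsl:variable name="mapped" select="translate($text, 'øØ', 'oO')"/>
    <!-- Step 2: decompose (NFKD), then drop everything outside Basic Latin. -->
    <xsl:sequence
        select="replace(normalize-unicode($mapped, 'NFKD'), '\P{IsBasicLatin}', '')"/>
  </xsl:function>

  <!-- Illustrative call on a made-up string; any input document will do. -->
  <xsl:template match="/">
    <xsl:value-of select="my:to-ascii('smørbrød é')"/>
  </xsl:template>

</xsl:stylesheet>
```

Because translate() maps one character to one character, a replacement such as Þ to "Th" would need an extra replace() call instead.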

+3

biziclop Mar 22 '11 at 10:55

The expressions proposed earlier rely on a character class named "ASCII", which is not a recognized class name. In my experience, XPath 2.0 recognizes the BasicLatin block, written IsBasicLatin, which serves the same purpose as ASCII.

 replace(normalize-unicode('Lliç d''Am Oükl Úkřeč', 'NFKD'), '\P{IsBasicLatin}', '')

+1

Yuri Feb 25 '15 at 14:14

user357812 · Accepted Answer · 2011-03-22T21:52:44+0000

In XSLT / XPath 1.0, if you want to replace these accented characters with their unaccented counterparts, you can use the translate() function.

But this assumes that your "accented UNICODE characters" are single precomposed characters rather than a base letter followed by combining marks. If they are the latter, you will need the XPath 2.0 function normalize-unicode().

And if the real goal is to have a valid URI, you should use encode-for-uri().

Update : Examples

 translate('gri_gonéwiththèw00mitc','áàâäéèêëíìîïóòôöúùûü','aaaaeeeeiiiioooouuuu')

Result: gri_gonewiththew00mitc

 encode-for-uri('gri_gonéwiththèw00mitc')

Result: gri_gon%C3%A9withth%C3%A8w00mitc

The correct expression is given by @biziclop's suggestion:

 replace(normalize-unicode('gri_gonéwiththèw00mitc','NFKD'),'\P{ASCII}','')

Result: gri_gonewiththew00mitc

Note: In XPath 2.0, the correct negation of a character class is \P (capital P).
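Applied to the stylesheet in the question, the expression might look like the sketch below. It assumes an XSLT 2.0 processor (replace() and normalize-unicode() do not exist in XSLT 1.0) and that <filename> is reachable through the default template rules. It uses \P{IsBasicLatin} because that block name is defined by the XML Schema regex syntax that XPath 2.0 follows. Two details differ from the question's attempt: $file is written without quotes, since "$file" in quotes is the literal text rather than the variable, and the backslash is not doubled, because XPath string literals do not treat it as an escape.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Matches the <filename> element from the question's sample data. -->
  <xsl:template match="filename">
    <!-- select= stores a plain string rather than a temporary tree. -->
    <xsl:variable name="file" select="string(.)"/>
    <!-- Decompose to NFKD, then remove every character outside Basic Latin. -->
    <xsl:value-of
        select="replace(normalize-unicode($file, 'NFKD'), '\P{IsBasicLatin}', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

With the question's input, <filename>gri_gonéwiththèw00mitc</filename>, this should output gri_gonewiththew00mitc.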
