How to remove character accents in XSL?

I keep looking, but can't find an XSL function that is the equivalent of "normalize-space" for characters. That is, my content has accented Unicode characters, which is great, but I create a file name from this content, and I do not want the accents there.

So, is there something I am overlooking, or not searching for properly, that would let me process characters easily?

In the XML data:

<filename>gri_gonéwiththèw00mitc</filename> 

In the XSLT stylesheet:

 <xsl:variable name="file">
   <xsl:value-of select="filename"/>
 </xsl:variable>
 <xsl:value-of select="$file"/>

leads to "gri_gonéwiththèw00mitc"

Whereas

 <xsl:value-of select='replace( normalize-unicode( "$file", "NFKD" ), "[^\\p{ASCII}]", "" )'/> 

gives nothing.

What I'm looking for is gri_gonewiththew00mitc (without accents).

Am I using the syntax incorrectly?

3 answers

In XSLT / XPath 1.0, if you want to replace these accented characters with their unaccented counterparts, you can use the translate() function.

But this assumes that your "accented UNICODE characters" are precomposed single characters rather than base letters followed by combining marks. If they are decomposed, you will need the XPath 2.0 function normalize-unicode().

And if the real goal is to have a valid URI, you should use encode-for-uri(), which is also an XPath 2.0 function.

Update: Examples

 translate('gri_gonéwiththèw00mitc','áàâäéèêëíìîïóòôöúùûü','aaaaeeeeiiiioooouuuu') 

Result: gri_gonewiththew00mitc
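
To show how the translate() call above might sit in a complete stylesheet, here is a minimal XSLT 1.0 sketch. It assumes the input document's root element is the `<filename>` element from the question and that the accented letters are precomposed; the variable names `accented` and `unaccented` are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Accented letters and their replacements, matched position by position.
       A character in the first string with no counterpart at the same
       position in the second string would be deleted by translate(). -->
  <xsl:variable name="accented"   select="'áàâäéèêëíìîïóòôöúùûü'"/>
  <xsl:variable name="unaccented" select="'aaaaeeeeiiiioooouuuu'"/>

  <xsl:template match="/">
    <!-- Assumes the document element is <filename>. -->
    <xsl:value-of select="translate(filename, $accented, $unaccented)"/>
  </xsl:template>
</xsl:stylesheet>
```

The mapping covers only the lowercase letters listed; uppercase or other accented letters would need to be appended to both strings.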

 encode-for-uri('gri_gonéwiththèw00mitc') 

Result: gri_gon%C3%A9withth%C3%A8w00mitc

The correct expression is given by @biziclop's suggestion:

 replace(normalize-unicode('gri_gonéwiththèw00mitc','NFKD'),'\P{ASCII}','') 

Result: gri_gonewiththew00mitc

Note: In XPath 2.0 regular expressions, a character class escape is negated by writing it with an uppercase \P.
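
Applied to the stylesheet from the question, the expression might look like the following XSLT 2.0 sketch. It differs from the asker's attempt in two ways: `$file` is referenced without quotes, and the regex uses a single backslash. It also swaps in the block escape `\P{IsBasicLatin}`, because `ASCII` is not a class name defined by XML Schema regular expressions and a processor may reject it. The sketch assumes an XSLT 2.0 processor and `<filename>` as the document element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Bind the string value of <filename> (assumed document element). -->
    <xsl:variable name="file" select="string(filename)"/>

    <!-- $file is a variable reference only when it is NOT quoted;
         "$file" in quotes is just the five-character literal text.
         XPath string literals have no backslash escaping, so the regex
         takes a single backslash.
         NFKD splits each accented letter into a base letter plus
         combining marks; the replace() then drops everything outside
         the Basic Latin block, including those marks. -->
    <xsl:value-of
        select="replace(normalize-unicode($file, 'NFKD'), '\P{IsBasicLatin}', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this should produce gri_gonewiththew00mitc.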


So, contrary to my comment, you can try the following:

 replace( normalize-unicode( "öt hűtőházból kértünk színhúst", "NFKD" ), "[^\p{ASCII}]", "" ) 

Be warned, though, that any characters that cannot be decomposed and are not basic ASCII (for example, Norwegian ø or Icelandic Þ) will be removed from the string entirely, although this is probably acceptable for your requirements.
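
If dropping such letters is not acceptable, one possible workaround is to map them by hand before normalizing. The XSLT 2.0 sketch below is only an illustration: the function name `my:ascii-fold`, its namespace URI, and the chosen substitutions (ø to o, Þ to Th) are assumptions, not part of any standard library, and `\P{IsBasicLatin}` is used in place of the `ASCII` class.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:ascii-fold"
    exclude-result-prefixes="xs my">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical helper: folds a string down to Basic Latin characters. -->
  <xsl:function name="my:ascii-fold" as="xs:string">
    <xsl:param name="text" as="xs:string"/>

    <!-- Step 1: one-to-one substitutions for letters that NFKD
         cannot decompose (illustrative choice: ø and Ø). -->
    <xsl:variable name="mapped" select="translate($text, 'øØ', 'oO')"/>

    <!-- Step 2: one-to-many substitutions, which translate() cannot do. -->
    <xsl:variable name="expanded"
        select="replace(replace($mapped, 'Þ', 'Th'), 'þ', 'th')"/>

    <!-- Step 3: decompose the remaining accented letters and drop
         everything outside the Basic Latin block. -->
    <xsl:sequence
        select="replace(normalize-unicode($expanded, 'NFKD'), '\P{IsBasicLatin}', '')"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Assumes the document element is <filename>. -->
    <xsl:value-of select="my:ascii-fold(string(filename))"/>
  </xsl:template>
</xsl:stylesheet>
```

Any other non-decomposable letter that matters for your data would need its own entry in step 1 or step 2; everything else is still removed in step 3.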


The expressions proposed above rely on a character class named "ASCII", which was reported as unknown when I tried them. In my experience, XPath 2.0 recognizes the block escape IsBasicLatin, which serves the same purpose as ASCII. Note that an apostrophe inside a single-quoted XPath string has to be doubled.

 replace(normalize-unicode('Lliç d''Am Oükl Úkřeč', 'NFKD'), '\P{IsBasicLatin}', '') 

Source: https://habr.com/ru/post/1344841/