Remove all special characters from a string while keeping non-Latin characters

I want to remove all special characters from a string, keeping only digits and ordinary a-z letters.

I do it like this:

text = text.replaceAll("[^a-zA-Z0-9 ]+", "");

The problem with this approach is that it also deletes every letter outside the basic Latin alphabet, such as è, é, ê, ë and many others.

By non-special characters (the ones I want to keep) I mean all digits and all alphabetic characters in all languages, or at least as many as possible.

How can I remove only the special characters?

2 answers

You can try \p{L} for all letters and \p{N} for all numbers:

text = text.replaceAll("[^\\p{L}\\p{N} ]+", "");
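To see what this pattern keeps and drops, here is a minimal self-contained sketch. The class and method names (StripSpecialChars, stripSpecial) and the sample string are made up for illustration; only the regular expression comes from the answer above.

```java
final class StripSpecialChars {
    // Removes every character that is not a Unicode letter (\p{L}),
    // a Unicode number (\p{N}) or a plain space.
    static String stripSpecial(String text) {
        return text.replaceAll("[^\\p{L}\\p{N} ]+", "");
    }

    public static void main(String[] args) {
        // Punctuation and the underscore are removed; accented Latin
        // and Cyrillic letters, digits and spaces are kept.
        System.out.println(stripSpecial("Héllo, wörld! Привет_123?"));
        // prints: Héllo wörld Привет123
    }
}
```

One caveat: if the input stores accents as separate combining marks (decomposed form), those marks belong to category \p{M} and would be stripped by this pattern. Adding \\p{M} to the character class should keep them.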

Alternatively, without regex, using Guava:

CharMatcher.JAVA_LETTER_OR_DIGIT.retainFrom("èêAAAGRt123")
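If adding Guava is not an option, roughly the same filtering can be sketched with the JDK alone. This assumes the matcher above follows Character.isLetterOrDigit; the class and method names here are hypothetical. Note that, unlike the regex answer, this approach also drops spaces.

```java
final class RetainLettersAndDigits {
    // Keeps only the code points that Character.isLetterOrDigit accepts;
    // everything else (punctuation, symbols, whitespace) is dropped.
    static String retainLettersAndDigits(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(Character::isLetterOrDigit)
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(retainLettersAndDigits("èêAAAGRt123 +-*/"));
        // prints: èêAAAGRt123
    }
}
```

Iterating over code points rather than chars is a design choice of this sketch, so that letters outside the Basic Multilingual Plane are tested as whole characters.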

Source: https://habr.com/ru/post/1536857/

