Replace any non-ASCII character in a string in Java

How do I convert -lrb-300-rrb- 922-6590 to -lrb-300-rrb- 922-6590 in Java?

I have tried the following:

 t.lemma = lemma.replaceAll("\\p{C}", " ");
 t.lemma = lemma.replaceAll("[\u0000-\u001f]", " ");

Perhaps I am missing something conceptual. I would appreciate any pointers to a solution.

Thanks.

+4
2 answers

Try the following:

str = str.replaceAll("[^\\p{ASCII}]", " ");

By the way, \p{ASCII} matches any ASCII character: [\x00-\x7F].
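To see why the two attempts in the question leave the string unchanged while the negated ASCII class does not, here is a minimal sketch. It assumes the offending character is a no-break space (U+00A0), which is only a guess since the question does not say which character it is. The class name AsciiDemo and the helper toAscii are made up for illustration.

```java
class AsciiDemo {
    // Replaces every character outside the ASCII range [\x00-\x7F] with a space.
    static String toAscii(String s) {
        return s.replaceAll("[^\\p{ASCII}]", " ");
    }

    public static void main(String[] args) {
        // Assumption: a no-break space (U+00A0) sits between the two number groups.
        String input = "-lrb-300-rrb-\u00A0922-6590";

        // U+00A0 is in category Zs (space separator), not in \p{C} (control/format/etc.),
        // so this replacement changes nothing.
        System.out.println(input.replaceAll("\\p{C}", " ").equals(input));            // true

        // U+00A0 is also outside the ASCII control range, so nothing changes here either.
        System.out.println(input.replaceAll("[\\u0000-\\u001f]", " ").equals(input)); // true

        // The negated ASCII class does match it and swaps in a plain space.
        System.out.println(toAscii(input)); // -lrb-300-rrb- 922-6590
    }
}
```

If the real character turns out to be something else outside ASCII, the last call behaves the same way; only the first two checks depend on the assumed character.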

On another note, you should keep the compiled Pattern in a constant so that the expression is not recompiled on every call.

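 import java.util.regex.Pattern;

 public class Main {
     // Compiled once and reused for every call.
     private static final Pattern REGEX_PATTERN = Pattern.compile("[^\\p{ASCII}]");

     public static void main(String[] args) {
         String input = "-lrb-300-rrb- 922-6590";
         // Replace every non-ASCII character with a space.
         System.out.println(
             REGEX_PATTERN.matcher(input).replaceAll(" ")
         ); // prints "-lrb-300-rrb- 922-6590"
     }
 }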


+10

Assuming you want to keep the characters a-zA-Z0-9 and punctuation, you can do:

 t.lemma = lemma.replaceAll("[^\\p{Punct}\\w]", " ");
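
As a rough illustration of what this pattern keeps and drops: in Java's default mode (without the UNICODE_CHARACTER_CLASS flag), \w and \p{Punct} cover ASCII only, so accented letters and other non-ASCII characters are replaced as well. The class name PunctWordDemo, the helper keepPunctAndWord and the sample input are made up for this sketch.

```java
class PunctWordDemo {
    // Keeps ASCII letters, digits, underscore and ASCII punctuation;
    // everything else (including whitespace) becomes a plain space.
    static String keepPunctAndWord(String s) {
        return s.replaceAll("[^\\p{Punct}\\w]", " ");
    }

    public static void main(String[] args) {
        // Hypothetical input with an accented letter (U+00E9) and a no-break space (U+00A0).
        String input = "caf\u00E9 (300)\u00A0922-6590";
        // The accented letter and the no-break space are each replaced by a space;
        // the ordinary space is "replaced" by itself, so two spaces follow "caf".
        System.out.println(keepPunctAndWord(input)); // caf  (300) 922-6590
    }
}
```

Note that tabs and newlines are also turned into spaces by this pattern, which may or may not be what you want.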
0

Source: https://habr.com/ru/post/1500614/
