How to replace accented Latin Unicode characters with [a-z]

I am trying to convert all accented Latin Unicode characters to their plain [a-z] equivalents, for example:

 ó → o, í → i

I can easily replace them one at a time, for example:

 myString = myString.replaceAll("ó","o"); 

But since there are so many variants, this approach is impractical.

Is there a better way to do this in Java, e.g. a regular expression or a utility library?

Use case:

1- Converting city names from other languages into English, for example:

Espírito Santo → Espirito Santo

2 answers

This answer requires Java 1.6 or higher, since that is the version that added java.text.Normalizer.

    String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
    String accentRemoved = normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
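To see why these two lines work, it can help to look at what NFD does to a single accented letter. The sketch below is illustrative only: the class name NfdDemo and the helper toNfd are hypothetical, and it assumes Java 8+ (for String.codePoints()). It decomposes ó and prints each resulting code point with its Unicode name:

```java
import java.text.Normalizer;

public class NfdDemo {
    // Decompose a string into base letters plus separate combining marks (NFD).
    public static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String input = "\u00F3"; // precomposed "o with acute", a single code point
        String nfd = toNfd(input);

        System.out.println("Before: " + input.length() + " char(s)");
        System.out.println("After:  " + nfd.length() + " char(s)");

        // List each code point of the decomposed form by its Unicode name.
        nfd.codePoints().forEach(cp ->
                System.out.printf("U+%04X %s%n", cp, Character.getName(cp)));
    }
}
```

This should report 1 char before and 2 after, followed by U+006F LATIN SMALL LETTER O and U+0301 COMBINING ACUTE ACCENT. U+0301 lies in the Combining Diacritical Marks block (U+0300 to U+036F), so the replaceAll call only has to delete that second code point to leave a plain "o".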

Example:

    import java.text.Normalizer;

    public class Main {
        public static void main(String[] args) {
            String input = "Árvíztűrő tükörfúrógép";
            System.out.println("Input: " + input);
            // NFD splits each accented letter into a base letter plus combining marks
            String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
            System.out.println("Normalized: " + normalized);
            // Remove the combining marks, leaving only the base letters
            String accentRemoved = normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
            System.out.println("Result: " + accentRemoved);
        }
    }

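Output (the Normalized line looks unchanged, but each accent is now a separate combining character):

    Input: Árvíztűrő tükörfúrógép
    Normalized: Árvíztűrő tükörfúrógép
    Result: Arvizturo tukorfurogep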
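One caveat: NFD only helps with letters that have a Unicode decomposition. Letters such as ø, ł, đ, ß and æ are single code points with no decomposition, so the code above leaves them untouched. Below is a minimal sketch that adds a small fallback table, assuming Java 9+ (for Map.of). The class name LatinFolder, the method fold and the table contents are hypothetical and deliberately incomplete. It also swaps in \p{M} (any combining mark), which is a superset of the \p{InCombiningDiacriticalMarks} block used above:

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Pattern;

public class LatinFolder {
    // Compiled once; String.replaceAll would recompile the regex on every call.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Hypothetical, incomplete table for letters with no Unicode decomposition,
    // which NFD therefore leaves as-is.
    private static final Map<Character, String> NO_DECOMPOSITION = Map.of(
            '\u00F8', "o",  '\u00D8', "O",   // o with stroke
            '\u0142', "l",  '\u0141', "L",   // l with stroke
            '\u0111', "d",  '\u0110', "D",   // d with stroke
            '\u00DF', "ss",                  // sharp s
            '\u00E6', "ae", '\u00C6', "AE"); // ae ligature

    public static String fold(String input) {
        // Step 1: split accented letters into base letter + combining marks.
        String nfd = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Step 2: drop all combining marks.
        String stripped = MARKS.matcher(nfd).replaceAll("");
        // Step 3: replace letters that NFD could not decompose.
        StringBuilder out = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            String replacement = NO_DECOMPOSITION.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Esp\u00EDrito Santo")); // Espirito Santo
        System.out.println(fold("\u0141\u00F3d\u017A"));   // Lodz
        System.out.println(fold("K\u00F8benhavn"));        // Kobenhavn
    }
}
```

The table is where the remaining manual work lives, so it would need extending for the names you actually process. Note that the result is still not strictly [a-z]: spaces, digits, punctuation, uppercase letters and any letter missing from the table pass through unchanged.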

Is there a similar module available in Python?


Source: https://habr.com/ru/post/1232029/

