How to translate strings using Java?

I need a translation program that allows me to translate any character to any other character or character set efficiently. The obvious way, apparently, is to use the character value from the input string as an index into a translation array with 256 elements.

In the source array, each entry is initially set to its own value, so that, for example, hex 37 appears in the 56th entry (counting 00 as the first). The user can then replace whichever entries need translating.

Example 1: I want to translate a string into the letter "A" for alphabetic characters, "N" for numeric characters, "B" for spaces and "X" for anything else. Thus "SL5 3QW" becomes "AANBNAA".

Example 2: I want to translate some characters, such as "œ" (x'9D') to "oe" (x'6F65'), "ß" to "ss", "å" to "a", etc.

How to get a numeric value from a character in an input line to use it as an index in a translate array?

It is easy with the CODE function in Excel, and just as easy in IBM assembler, but I cannot find the equivalent method in Java.

+3
5 answers

It's a little off topic, but if you want to do a comprehensive character translation job, you can't just use String.charAt(int). Unicode code points greater than 65535 are represented in Java strings as two consecutive char values.

To handle those correctly, use String.codePointAt(int) and String.offsetByCodePoints(int, int).
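As a minimal sketch of that advice (the class and method names here are made up for illustration), a loop can read one code point at a time and let offsetByCodePoints decide whether to step over one char or two:

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointWalk {
    // Collects every Unicode code point of the input, treating a surrogate
    // pair (two char values) as a single character.
    static List<Integer> codePoints(final String input) {
        final List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            result.add(input.codePointAt(i));
            // Advance by one code point, which may be one or two chars.
            i = input.offsetByCodePoints(i, 1);
        }
        return result;
    }

    public static void main(final String[] args) {
        // U+1D11E (musical G clef) needs two chars in a Java string.
        final String s = "A\uD834\uDD1E";
        System.out.println(s.length());           // number of char values
        System.out.println(codePoints(s).size()); // number of code points
    }
}
```

The numeric value the question asks about is simply the int returned by codePointAt; for characters below 65536 it is the same number that charAt gives when the char is widened to int.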

+5

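Unicode defines more than 107000 characters, so an array with 256 entries is not enough.

To get the numeric value of a character in the input, use String.codePointAt(int index).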

To classify characters, see Character.isWhitespace(int codePoint), Character.isDigit(int codePoint) and the related methods.
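A rough sketch of how those int-based Character methods could drive the "A/N/B/X" example from the question. The class name is hypothetical, and the order of the checks is an assumption, not something this answer prescribes:

```java
final class CodePointClassifier {
    // Maps every code point to 'A' (letter), 'N' (digit), 'B' (whitespace)
    // or 'X' (anything else), following the question's first example.
    static String classify(final String input) {
        final StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            final int cp = input.codePointAt(i);
            if (Character.isLetter(cp)) {
                out.append('A');
            } else if (Character.isDigit(cp)) {
                out.append('N');
            } else if (Character.isWhitespace(cp)) {
                out.append('B');
            } else {
                out.append('X');
            }
            // Step over one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Because the loop advances by Character.charCount, a character outside the 16-bit range produces one output letter instead of two.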

For details, see http://download.oracle.com/javase/6/docs/api/java/lang/String.html and http://download.oracle.com/javase/6/docs/api/java/lang/Character.html

+3

Use a HashMap<String, String> as the translation table.

+2

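As already mentioned, Unicode does not fit into a 256-element array.

One option is a HashMap<Character,String>, looked up with the result of String.charAt(). For character classes, Character has isDigit() and isLetter(); a character that should be dropped can map to "" (the empty string).

With a HashMap, only the characters that need changing have to be listed. If a character has no entry (the HashMap returns null), it can be passed through unchanged.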
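A minimal sketch of the HashMap<Character,String> idea. The table entries are taken from the question's second example; the class name and the decision to copy unmapped characters through unchanged are assumptions made for illustration:

```java
import java.util.HashMap;
import java.util.Map;

final class TableTranslator {
    // Illustrative table; extend it with whatever substitutions are needed.
    static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0153', "oe"); // œ
        TABLE.put('\u00DF', "ss"); // ß
        TABLE.put('\u00E5', "a");  // å
    }

    static String translate(final String input) {
        final StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            final char c = input.charAt(i);
            final String replacement = TABLE.get(c);
            // Assumption: a character with no entry (null) is kept as it is.
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Since the values are strings, one input character can expand to several output characters, which a char-to-char array cannot express.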

+1

Here is one way to handle both of your examples:


Example 1:

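I want to translate a string into the letter "A" for alphabetic characters, "N" for numeric characters, "B" for spaces and "X" for anything else. Thus "SL5 3QW" becomes "AANBNAA".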

Code:

public static String map(final String input){
    final char[] out = new char[input.length()];
    for(int i = 0; i < input.length(); i++){
        final char c = input.charAt(i);
        final char t;
        if(Character.isDigit(c)){
            t = 'N';
        } else if(Character.isWhitespace(c)){
            t = 'B';
        } else if(Character.isLetter(c)){
            t = 'A';
        } else{
            t = 'X';
        }
        out[i] = t;
    }
    return new String(out);
}

Usage:

public static void main(final String[] args){
    System.out.println(map("SL5 3QW"));
}

Output:

AANBNAA


Example 2:

I want to translate some characters, such as "œ" (x'9D') to "oe" (x'6F65'), "ß" to "ss", "å" to "a", etc.

Answer:

For this, take a look at the Normalizer API (java.text.Normalizer).
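One common way to use Normalizer for this kind of job is to decompose the text and then delete the combining marks. The sketch below is only an illustration with a made-up class name. It handles "å", but "ß" and "œ" have no Unicode decomposition, so they pass through unchanged and would still need an explicit substitution table.

```java
import java.text.Normalizer;

final class AccentStripper {
    static String stripAccents(final String input) {
        // NFD splits "å" into "a" followed by a combining ring (U+030A).
        final String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches combining marks; removing them leaves the base letters.
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```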


The if/else chain above apparently drew downvotes. A more flexible, extensible alternative is to put each rule behind an interface:

public interface CharTransformer{
    boolean supports(char input);
    char transform(char input);
}

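Each Transformer states which characters it supports and how it transforms them. The method below tries the transformers in order, applies the first one that supports the current character, and throws an IllegalArgumentException if none does.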

public static String mapWithTransformers(final String input,
    final Collection<? extends CharTransformer> transformers){
    final char[] out = new char[input.length()];
    for(int i = 0; i < input.length(); i++){
        final char c = input.charAt(i);
        char t = 0;
        boolean matched = false;
        for(final CharTransformer tr : transformers){
            if(tr.supports(c)){
                matched = true;
                t = tr.transform(c);
                break;
            }
        }
        if(!matched){
            throw new IllegalArgumentException("Found no Transformer for char: "
                + c);
        }
        out[i] = t;
    }
    return new String(out);
}
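To show how such transformers might be supplied, here is a self-contained sketch. The enum ClassTransformer and the method apply are hypothetical names; the interface and the first-match loop are repeated in condensed form only so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

final class TransformerDemo {
    // Same shape as the CharTransformer interface above.
    interface CharTransformer {
        boolean supports(char input);
        char transform(char input);
    }

    // Hypothetical transformers for the A/N/B/X example. Order matters:
    // OTHER accepts everything, so it has to come last.
    enum ClassTransformer implements CharTransformer {
        DIGIT('N') {
            @Override public boolean supports(final char c) { return Character.isDigit(c); }
        },
        SPACE('B') {
            @Override public boolean supports(final char c) { return Character.isWhitespace(c); }
        },
        LETTER('A') {
            @Override public boolean supports(final char c) { return Character.isLetter(c); }
        },
        OTHER('X') {
            @Override public boolean supports(final char c) { return true; }
        };

        private final char replacement;

        ClassTransformer(final char replacement) { this.replacement = replacement; }

        @Override public char transform(final char input) { return replacement; }
    }

    // Condensed first-match-wins loop, equivalent in spirit to the method above.
    static String apply(final String input,
            final List<? extends CharTransformer> transformers) {
        final char[] out = new char[input.length()];
        for (int i = 0; i < input.length(); i++) {
            final char c = input.charAt(i);
            CharTransformer match = null;
            for (final CharTransformer tr : transformers) {
                if (tr.supports(c)) { match = tr; break; }
            }
            if (match == null) {
                throw new IllegalArgumentException("Found no Transformer for char: " + c);
            }
            out[i] = match.transform(c);
        }
        return new String(out);
    }

    public static void main(final String[] args) {
        System.out.println(apply("SL5 3QW", Arrays.asList(ClassTransformer.values())));
    }
}
```

With the catch-all OTHER transformer in the list, the exception path is never reached; leaving it out makes unsupported characters fail loudly instead.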


Note: others suggested using a map. Although I don't think a standard map is a good fit for this task, you can use Guava's MapMaker.makeComputingMap(Function) to compute the substitutions on demand (and cache them automatically). This way you have a lazily initialized caching map.
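The same lazy-caching idea can be sketched without Guava, using Map.computeIfAbsent from the standard library (this assumes Java 8 or later, and it is a substitute technique, not the Guava call named in the note). The class name and the computations counter are hypothetical and exist only to make the caching visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class LazySubstitutions {
    private final Map<Character, String> cache = new HashMap<>();
    private final Function<Character, String> rule;
    int computations = 0; // counts how often the rule actually ran

    // The rule must not return null: computeIfAbsent does not cache nulls.
    LazySubstitutions(final Function<Character, String> rule) {
        this.rule = rule;
    }

    String substitute(final char c) {
        // The rule runs only on the first request for each character;
        // later requests are answered from the cache.
        return cache.computeIfAbsent(c, key -> {
            computations++;
            return rule.apply(key);
        });
    }
}
```

This plain HashMap version is not thread-safe; it is meant only to illustrate computing substitutions on first use.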

+1

Source: https://habr.com/ru/post/1767887/