Unicode supplementary planes in JavaFX

I'm having problems with Unicode characters from the supplementary ("astral") planes in JavaFX. In particular, I cannot paste such characters into a TextInputDialog (I get strange characters instead, for example ð), and I cannot use them in a WebView (they appear as ).

The same characters work fine if I enter them through JOptionPane.showInputDialog and print them to the console. They are even displayed in a JavaFX Alert, although it adds some junk at the end.

Is there a way to fix these problems?

I am using Oracle JDK version 1.8.0_51 on Linux.
Examples of supplementary-plane characters: 😀 🂡 🙭 𫞂
If you do not see them, you may need to install additional fonts, such as Symbola or Noto.

Here is an example program (using a Label, not a WebView):

    import javax.swing.JOptionPane;

    import javafx.application.Application;
    import javafx.scene.Scene;
    import javafx.scene.control.Alert;
    import javafx.scene.control.Alert.AlertType;
    import javafx.scene.control.Label;
    import javafx.scene.control.TextInputDialog;
    import javafx.scene.layout.StackPane;
    import javafx.stage.Stage;

    public class UniTest extends Application {

        @Override
        public void start(final Stage stage) throws Exception {
            // Code points U+1F0A1, U+2B782, U+0CA0, U+1F600, U+00F1
            final String s = new String(new int[]{127137, 178050, 3232, 128512, 241}, 0, 5);
            System.out.println("The string: " + s);
            System.out.println("Characters: " + s.length());              // UTF-16 code units
            System.out.println("Code points: " + s.codePoints().count()); // Unicode code points

            // Swing dialog: displays the string correctly
            JOptionPane.showMessageDialog(null, s, "JOptionPane", JOptionPane.INFORMATION_MESSAGE);

            // JavaFX Alert: displays the string, but with junk appended
            final Alert al = new Alert(AlertType.INFORMATION);
            al.setTitle("Alert");
            al.setContentText(s);
            al.showAndWait();

            // JavaFX TextInputDialog: pasting the string produces wrong characters
            final TextInputDialog dlg = new TextInputDialog();
            dlg.setTitle("TextInputDialog");
            dlg.setContentText("Try to paste the string in here");
            dlg.showAndWait().ifPresent(x -> System.out.println("Your input: " + x));

            final StackPane root = new StackPane();
            root.getChildren().add(new Label(s));
            stage.setScene(new Scene(root, 400, 300));
            stage.setTitle("Stage");
            stage.show();
        }

        public static void main(final String... args) {
            launch(args);
        }
    }

And here are the results that I get:

[screenshots]

Note: not all characters in the example belong to the supplementary planes, and one of them is displayed only on the console.

Answer

TL;DR: Apparently, JavaFX is broken.

Here is the text you are using:

 🂡𫞂ಠ😀ñ

Decimal code points:

 127137 178050 3232 128512 241 

Hex representation:

 0x1F0A1 0x2B782 0xCA0 0x1F600 0xF1 
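For reference, here is a small sketch that rebuilds the test string the same way UniTest does and prints its code points in both notations. The class and method names (CodePointDump, decimal, hex) are illustrative only and do not come from the question.

```java
import java.util.stream.Collectors;

public class CodePointDump {

    // Decimal value of every code point, separated by spaces.
    static String decimal(String s) {
        return s.codePoints()
                .mapToObj(cp -> Integer.toString(cp))
                .collect(Collectors.joining(" "));
    }

    // Hexadecimal value of every code point, with a 0x prefix.
    static String hex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("0x%X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Same construction as in UniTest: five code points, offset 0, count 5.
        String s = new String(new int[]{127137, 178050, 3232, 128512, 241}, 0, 5);
        System.out.println(decimal(s)); // 127137 178050 3232 128512 241
        System.out.println(hex(s));     // 0x1F0A1 0x2B782 0xCA0 0x1F600 0xF1
    }
}
```

Iterating with codePoints() rather than charAt() matters here: it yields one value per character even when a character is stored as two chars.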

Display error

Java uses UTF-16 internally, so consider the UTF-16 representation:

UTF-16 representation:

 D83C DCA1 D86D DF82 0CA0 D83D DE00 00F1 
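These UTF-16 code units can be checked directly, because a Java String exposes them one by one through charAt. A minimal sketch; the names Utf16Units and utf16Hex are hypothetical:

```java
public class Utf16Units {

    // Formats every UTF-16 code unit (Java char) of s as four hex digits.
    static String utf16Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(new int[]{127137, 178050, 3232, 128512, 241}, 0, 5);
        // Each supplementary character shows up as a high surrogate (D800-DBFF)
        // followed by a low surrogate (DC00-DFFF).
        System.out.println(utf16Hex(s)); // D83C DCA1 D86D DF82 0CA0 D83D DE00 00F1
    }
}
```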

As you can see, the display shows the five characters you expect, followed by three garbage characters.

So it is clearly trying to display 8 glyphs where there are only five. This is almost certainly because the display code counts 8 chars: three of the characters are encoded in UTF-16 as surrogate pairs, so each of them takes two 16-bit code units. In other words, it uses the wrong value for the string length when surrogate pairs are present.
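The two competing lengths are easy to compare from Java itself. This sketch does not reproduce any JavaFX code; it only shows the count a renderer would get from String.length() versus the number of code points it should draw. The class and method names are made up for illustration.

```java
public class LengthMismatch {

    // Number of UTF-16 code units: what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points: one per character that should be drawn.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Supplementary characters are the ones stored as surrogate pairs.
    static long surrogatePairs(String s) {
        return s.codePoints().filter(Character::isSupplementaryCodePoint).count();
    }

    public static void main(String[] args) {
        String s = new String(new int[]{127137, 178050, 3232, 128512, 241}, 0, 5);
        System.out.println("length():         " + codeUnits(s));      // 8
        System.out.println("codePointCount(): " + codePoints(s));     // 5
        System.out.println("surrogate pairs:  " + surrogatePairs(s)); // 3
        // 8 - 5 == 3: the same number as the extra glyphs seen in the Alert.
    }
}
```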

Pasted text error

UTF-8 representation of the test data:

 F0 9F 82 A1 F0 AB 9E 82 E0 B2 A0 F0 9F 98 80 C3 B1 
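The UTF-8 bytes can be reproduced with String.getBytes. A short sketch (the names Utf8Bytes and utf8Hex are illustrative):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {

    // Encodes s as UTF-8 and formats every byte as two hex digits.
    static String utf8Hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255, since Java bytes are signed.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(new int[]{127137, 178050, 3232, 128512, 241}, 0, 5);
        System.out.println(utf8Hex(s));
        // F0 9F 82 A1 F0 AB 9E 82 E0 B2 A0 F0 9F 98 80 C3 B1
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length + " bytes"); // 17 bytes
    }
}
```

The four supplementary-plane characters take 4 bytes each, U+0CA0 takes 3 and ñ takes 2, which gives the 17 bytes.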

What is seen:

 00F0  ð  LATIN SMALL LETTER ETH
 009F     <control> = APC = APPLICATION PROGRAM COMMAND
 0082     <control> = BPH = BREAK PERMITTED HERE
 00A1  ¡  INVERTED EXCLAMATION MARK
 00F0  ð  LATIN SMALL LETTER ETH

(In some fonts, the two control characters have glyphs showing either their abbreviations or their hexadecimal codes. These glyphs are visible in your example.)

Latin-1 representation in hex:

 F0 9F 82 A1 F0 

Note that these five bytes match the first five bytes of the UTF-8 representation of the intended text.

Conclusion: the pasted data arrived as 5 code points occupying 17 bytes in UTF-8, but only 5 bytes were read, and they were interpreted as 5 Latin-1 characters. Again, the wrong property was used for the length (the number of code points instead of the number of bytes).
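This conclusion can be checked with a hypothetical reconstruction of the bug. It is a guess at the mechanism based on the observed output, not the actual JavaFX clipboard code: encode the text as UTF-8, keep only as many bytes as there are code points, and decode those bytes as Latin-1. The class and method names are invented for this sketch.

```java
import java.nio.charset.StandardCharsets;

public class PasteBugSimulation {

    // Hypothetical model of the suspected bug: the byte count is taken from the
    // code point count, and the bytes are decoded as Latin-1 instead of UTF-8.
    static String simulateBrokenPaste(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);          // 17 bytes
        int wrongLength = original.codePointCount(0, original.length());  // 5
        return new String(utf8, 0, wrongLength, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = new String(new int[]{127137, 178050, 3232, 128512, 241}, 0, 5);
        String seen = simulateBrokenPaste(s);
        // Prints 00F0, 009F, 0082, 00A1, 00F0 on separate lines,
        // the same five characters listed under "What is seen".
        seen.codePoints().forEach(cp -> System.out.printf("%04X%n", cp));
    }
}
```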


Source: https://habr.com/ru/post/1233633/

