Which system component is responsible for binding Unicode ligatures to a Java application?

This is a "meta-question" that I came across while trying to find the best specification for another question of mine ( Rendering of Devanagari (Unicode) ligatures in Java Swing JComponent on Mac OS X ).

What I do not quite understand yet is which "component" (for lack of a better word) of this system is responsible for displaying Unicode text in Java and, more specifically, ligatures.

As far as I understand, the following components affect the process:

  • The system character encoding (for example, UTF-8 on Mac OS X 10.6, UTF-16 on Windows 7, according to akira's comment on this superuser.com post).
  • The default Java Charset (MacRoman on Mac OS X 10.6, cp1252 on Windows 7).
  • The font that is used to render the text, and the font's encoding information (as Donal Fellows suggested on my other question, fonts "include information about which encoding they use").

  • And, obviously, the characters to be rendered, i.e. the corresponding Unicode code points.

So, if a Unicode character string is displayed incorrectly (as seen in my other question), where is the problem most likely to be? I.e., which "component" (what would be the best word?) is responsible for "binding" the ligature, i.e. for composing it?

Thank you very much in advance and please let me know if you need more information.

+6
4 answers

This system component is called a font renderer or font rasterizer. It is responsible for converting a sequence of character codes into pixels, based on the glyphs defined by the font. As other answers have pointed out, the various character-encoding settings that you can get and set in Java don't matter here. When the JVM hands the font renderer a sequence of character codes, it tells it which encoding is used (probably UTF-16, but this is transparent to the Java programmer). The font renderer uses the encoding information stored in the font file to map those codes to the corresponding glyphs.

Current versions of Windows and Mac OS X come with excellent font renderers.
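To watch the shaping step in isolation, you can ask a Font for its glyph vectors directly. A rough sketch follows; "Devanagari MT" is an assumed font name, so substitute any Devanagari-capable font installed on your machine (otherwise Java falls back to a default font and no conjunct is formed):

 import java.awt.Font;
 import java.awt.font.FontRenderContext;
 import java.awt.font.GlyphVector;

 public class GlyphCount {
     public static void main(String[] args) {
         // KA + VIRAMA + SSA, which should shape into the single conjunct क्ष
         String text = "\u0915\u094D\u0937";
         Font font = new Font("Devanagari MT", Font.PLAIN, 36); // assumed font name
         FontRenderContext frc = new FontRenderContext(null, true, true);

         // One glyph per char, no complex shaping
         GlyphVector unshaped = font.createGlyphVector(frc, text);
         // Full layout path, including ligatures required by the script
         GlyphVector shaped = font.layoutGlyphVector(frc, text.toCharArray(),
                 0, text.length(), Font.LAYOUT_LEFT_TO_RIGHT);

         System.out.println("unshaped glyphs: " + unshaped.getNumGlyphs());
         System.out.println("shaped glyphs:   " + shaped.getNumGlyphs());
     }
 }

If the shaped count comes out smaller than the unshaped one, the renderer did form the conjunct; equal counts usually mean no shaping took place with that font.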

The first point of confusion is that the JRE ships with its own font renderer as part of the Java2D platform, and that is what Swing uses. It should be possible to control whether Java uses its own renderer or the system one.

EDIT: As McDowell mentioned in a comment, on OS X you can enable system rendering by setting the Java property apple.awt.graphics.UseQuartz=true.
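For example (this only has an effect on Apple's Java for OS X, and it generally has to be set before AWT/Java2D starts up; passing it on the command line as -Dapple.awt.graphics.UseQuartz=true is the usual way):

 // Programmatic equivalent; must run before any AWT/Java2D classes are initialized
 System.setProperty("apple.awt.graphics.UseQuartz", "true");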

The second point of confusion is that ligatures are optional in English. A desktop publishing application will substitute the "ffl" ligature (a single glyph in a font) when it sees a word like "shuffle", but most other applications don't bother. Based on what you said about Devanagari (and what I just read on Wikipedia), I gather that ligatures are not optional in that script.

By default, the Java2D font renderer does not form ligatures. However, the JavaDoc for java.awt.font.TextAttribute.LIGATURES says that ligatures are always enabled for writing systems that require them. If that is not your experience, you may have found a bug in the Java2D font renderer. In the meantime, try using the Font constructor that takes a map of attributes, including TextAttribute.LIGATURES.
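Something along these lines (a sketch; the font family and the target component are placeholders, so use whatever Devanagari-capable font and Swing component you actually have):

 import java.awt.Font;
 import java.awt.font.TextAttribute;
 import java.util.HashMap;
 import java.util.Map;
 import javax.swing.JComponent;

 final class LigatureFonts {
     static void applyLigatureFont(JComponent target) {
         Map<TextAttribute, Object> attrs = new HashMap<TextAttribute, Object>();
         attrs.put(TextAttribute.FAMILY, "Devanagari MT"); // placeholder family name
         attrs.put(TextAttribute.SIZE, 24f);
         attrs.put(TextAttribute.LIGATURES, TextAttribute.LIGATURES_ON);
         target.setFont(new Font(attrs));
     }
 }

Call it on the JComponent that draws your Devanagari text before it is painted.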

+3

I am not an expert, but hopefully these tips will point you in the right direction ...

The encoding of the source data has little effect on font rendering. All character data in Java is UTF-16, so as long as you correctly transcode the information from the source into chars/Strings, data integrity should be preserved.

However, note:

  • The AWT system may use the default system encoding to display fonts
  • This is unlikely to apply to Devanagari (I don't know of any legacy encoding that supports it)

AWT maps fonts via a fontconfig file. On my Windows system, this maps to the Mangal font:

 allfonts.devanagari=Mangal 

Mac OS no doubt uses a different font.
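If you want to check which installed fonts can display your text at all (the mapping only helps if such a font exists), a quick sketch:

 import java.awt.Font;
 import java.awt.GraphicsEnvironment;

 public class DevanagariFonts {
     public static void main(String[] args) {
         String sample = "\u0915\u094D\u0937"; // the conjunct क्ष
         for (Font f : GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts()) {
             if (f.canDisplayUpTo(sample) == -1) { // -1 means every char is covered
                 System.out.println(f.getFontName());
             }
         }
     }
 }

Note that canDisplayUpTo only checks code-point coverage; it says nothing about whether the font carries the shaping tables needed to actually form the conjuncts.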

Native text rendering was introduced sometime during the life of Java 6; I don't know whether it has anything to do with font support or only affects rendering speed, anti-aliasing, etc.

+3

If you are referring strictly to visual rendering, then "encoding" and related topics are no longer relevant: rendering goes from a String to the visual display. A String has a specific (and immutable) encoding, namely UTF-16. So all questions such as "did I read this binary stream with the correct encoding?" must be resolved first.

The actual rendering of the text is performed by the graphics subsystem. That will be AWT/Swing for "normal" Java, or SWT, or whatever alternative toolkit you use.

The first step (which is not strictly part of "rendering") is to convert some binary data into a String. This is where the platform default encoding can come into play, if and only if the code does not explicitly specify an encoding; it is the step at which encodings generally enter the picture. After that, we are in happy-clean-Unicode-land.
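For instance, when you already hold the raw bytes, spelling out the charset is the whole difference between the two lines below (a sketch; bytes is assumed to be a byte[] containing UTF-8 data read from wherever your text comes from):

 String risky = new String(bytes);          // decoded with the platform default charset
 String safe  = new String(bytes, "UTF-8"); // explicit; may throw UnsupportedEncodingException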

+2

As Joachim said, what is the source of the data? If you are reading from a file or stream, I definitely would not trust the default system encoding. You should explicitly specify the encoding when reading the data, e.g.

 BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));

Or whatever encoding your stream actually uses.

See:

http://download.oracle.com/javase/1.4.2/docs/api/java/io/InputStreamReader.html#InputStreamReader(java.io.InputStream,%20java.lang.String )

+1
