Korean Character Display - iOS App

Question

I am trying to display Korean text in my iPhone application. The application appends Unicode letters one at a time to an NSMutableString and redraws the string on the screen after each letter is added.

I understand that there are rules for combining the letters (jamo) into syllables.

Is there a function that applies all of these rules to a string of letters automatically, or do I need to write the code myself (for example, turning a consonant into a tail consonant when there is a vowel in front of it)?

+4

ios objective-c unicode internationalization cjk

FCA Oct 22 '12 at 5:52

3 answers

Check out these system-level text input tools. I have never used them, but they look promising.

Since iOS does not support custom system-wide keyboards, everyone uses the default input method. Hangul composition is also handled differently on every operating system and platform (MS, Apple, Samsung, LG, and others). So the best approach is to use a system component such as UITextField, which keeps the behavior consistent for users. Otherwise you have to simulate exactly how your platform does it. You can of course do it your own way, but users will not like it.

I am not an expert on Hangul composition, but I do not think there is a simple algorithm that avoids a table lookup. In any case, if you really want to implement it yourself, these are the main problems you have to deal with.

Mapping the characters you receive onto the consonants and vowels defined in Unicode.
Deciding whether each consonant is initial or final from the position of the vowels.

That is not so hard, but either way you need to be able to change characters you have already emitted. You cannot implement Korean input as a purely one-way stream unless you have separate keys for the initial and final consonants that look the same.

Unicode defines every valid jamo component. There are usually too many of them to present on a device, and doing so would be inefficient. Most Korean input systems therefore decompose these jamo further and compose them again before building the final syllable. You can also identify and decompose them visually, as Korean speakers do.

Once you have the initial / final consonants and vowels as defined in the Unicode standard, a Unicode normalization method (for example, -[NSString precomposedStringWithCompatibilityMapping]) will do the rest.
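To illustrate that last step, here is a minimal sketch in Python using the standard unicodedata module. NFC and NFKC are the normalization forms behind NSString's precomposedStringWithCanonicalMapping and precomposedStringWithCompatibilityMapping. The variable names are only for illustration. The second half shows why the initial / final decision has to be made before normalizing: compatibility jamo carry no position, and NFKC maps each consonant to its initial form.

```python
import unicodedata

# Conjoining jamo: the code point itself says initial, vowel, or final.
INITIAL_B = "\u1107"  # HANGUL CHOSEONG PIEUP (initial ㅂ)
VOWEL_A = "\u1161"    # HANGUL JUNGSEONG A (ㅏ)
FINAL_NG = "\u11bc"   # HANGUL JONGSEONG IEUNG (final ㅇ)

# NFC folds initial + vowel + final into one precomposed syllable.
bang = unicodedata.normalize("NFC", INITIAL_B + VOWEL_A + FINAL_NG)
print(bang, len(bang))

# Compatibility jamo (U+3131..U+3163) have no initial/final distinction.
# NFKC turns every consonant into its *initial* form, so the trailing
# consonant here stays a separate character instead of becoming a final.
sob = unicodedata.normalize("NFKC", "\u3145\u3157\u3142")  # ㅅ ㅗ ㅂ
print([f"U+{ord(c):04X}" for c in sob])
```

So normalization only finishes the job once each consonant has already been classified as initial or final.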

+2

Eonil Oct 22 '12 at 19:40

libhangul (code.google.com/p/libhangul) does the conversion! It has several functions for handling different kinds of keyboards (for example, keyboards with different layouts) and for converting key presses to Unicode jamo. It also has several functions that combine jamo into syllables (they essentially implement the tables that Eonil mentioned in his answer).

libhangul stores jamo in its buffer as they arrive (it does not output them). Once it has received enough jamo and has successfully converted them into a syllable, it outputs the syllable. Unfortunately, this is rather confusing for the user. One way around it is to display the contents of the buffer on the screen. After a new jamo arrives, what was shown before must be erased. If a syllable has been formed, the syllable is displayed; otherwise, the contents of the buffer are displayed again. Note that you cannot simply draw the new jamo on the screen. You must erase what you showed earlier, read the previous jamo and the new one back from the buffer, and draw them again. The reason is that libhangul may change the code of a jamo already stored in the buffer so that it can combine with the new one. This way you always show the updated jamo.

Also note that if the user moves the cursor, the buffer must be emptied. In addition, if the user presses backspace, the last jamo shown on the screen must be erased and also removed from the buffer. libhangul also has some typo-correction features. For example, if you type ᅡ and ᄉ, it converts them to 사.

Thanks to JongAm Park and Eonil for your help and thoughtful comments! Since my reputation is currently below 15, I cannot upvote your answers, but I will do so when I can.

+2

FCA Nov 23 '12 at 3:01

JongAm Park · Accepted Answer · 2012-10-23T16:20:56+0000

FCA, you are the one who emailed me, right? Since the more detailed question is here, I will try to answer here (as best I can) instead of replying to your email.

Having read everything that you and the others here wrote, I understand that you are building Korean handwriting recognition software. So you will not have the luxury of the Korean input method provided by Apple.

There are two things I can tell you. Let me take them one at a time. (I believe you already know about one of the two.)

How text is written in Hangul.
From your question, this is not about Unicode encoding or decomposed Korean strings (that is, plain sequences of Ja (consonants) and Mo (vowels)). The question is how to determine whether a consonant the user writes is the final consonant of the current syllable (your term was "tail consonant", right?) or the initial consonant of the next one. Learning Korean would be the best way, but let me explain it briefly.

Say you write 소방차 (fire engine). You write: ㅅ ㅗ ㅂ ㅏ ㅇ ㅊ ㅏ. (Again, I am not talking about the Unicode decomposed form, but about how people write Korean text.)

When you type ㅗ (the second character), the input system displays 소, attaching ㅗ to the preceding ㅅ. It then looks the result up in a table. (Although Hangul is assembled in the JoHap (조합형) style, that is, the combining style, Korean standards define tables of allowed syllables in what is called the Wansung (완성형) style. So you should check the assembled syllable against the table to see whether such a syllable exists.) It finds "소" in the table, so you see "소".

Now the next character, "ㅂ", is written. This is where it gets a little trickier. Since the syllable "솝" is in the table, the system first attaches ㅂ to the previous syllable and displays "솝". However, that is not yet final. The user writes the next character, "ㅏ". The system knows that no syllable exists without an initial consonant (Ja): it searches the table but cannot find a syllable "ㅏ".

So it concludes that the ㅂ it attached to the previous syllable actually belongs to the second syllable, and it displays "소바". Now ㅇ is typed. The system tries to attach ㅇ to the second syllable, so it displays 소방. (At this point it can also look up 방 in the table, and it is found.)

Now "ㅊ" is typed. The system may check whether a syllable exists in which both ㅇ and ㅊ sit under 바, the way ㄹ and ㄱ sit together in 밝. (I cannot type it, because no such syllable exists.) Since there is none, it immediately decides that ㅊ belongs to the next syllable.

Then "ㅏ" is typed. The system combines ㅊ and ㅏ to make 차. When you press the space bar, the return key, or any other whitespace key, composition of the Hangul is finalized.
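The walkthrough above can be sketched as a small state machine. This is a minimal illustration in Python, not how any particular input method does it: the names INITIALS, VOWELS, FINALS, render and assemble are made up for the example, and instead of the Wansung table lookup it uses the Unicode arithmetic for precomposed syllables (0xAC00 + (initial * 21 + vowel) * 28 + final). It deliberately ignores compound vowels and compound finals such as the one in 밝.

```python
# Compatibility jamo, listed in the order Unicode uses for syllable arithmetic.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"      # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 vowels
FINALS = " ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # index 0 = no final


def render(parts):
    """Render [], [initial], [initial, vowel] or [initial, vowel, final]."""
    if len(parts) < 2:
        return "".join(parts)  # nothing yet, or a lone consonant shown as-is
    final = FINALS.index(parts[2]) if len(parts) == 3 else 0
    index = (INITIALS.index(parts[0]) * 21 + VOWELS.index(parts[1])) * 28
    return chr(0xAC00 + index + final)


def assemble(jamo):
    """Compose a stream of compatibility jamo into syllables."""
    done, cur = [], []  # finished text, jamo of the syllable in progress
    for ch in jamo:
        if ch in VOWELS:
            if len(cur) == 1:            # initial + vowel, e.g. ㅅ + ㅗ
                cur.append(ch)
            elif len(cur) == 3:          # 솝 + ㅏ: the final was really an initial
                moved = cur.pop()
                done.append(render(cur))
                cur = [moved, ch]
            else:                        # no consonant to attach to
                done.append(render(cur))
                done.append(ch)
                cur = []
        elif ch in INITIALS:
            if len(cur) == 2 and ch in FINALS:
                cur.append(ch)           # tentatively a final, e.g. 소 + ㅂ
            else:
                done.append(render(cur))
                cur = [ch]               # starts the next syllable
        else:                            # space, punctuation, anything else
            done.append(render(cur))
            done.append(ch)
            cur = []
    done.append(render(cur))
    return "".join(done)


keys = "ㅅㅗㅂㅏㅇㅊㅏ"
# Re-render the whole buffer after every keystroke, as the display must.
steps = [assemble(keys[:i]) for i in range(1, len(keys) + 1)]
print(steps)
```

The printed steps follow the walkthrough: ㅅ, 소, 솝, 소바, 소방, 소방ㅊ, 소방차. Note that the whole string is rebuilt after every keystroke, because a later jamo can change a syllable that was already shown.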

This is a simple case. Korean has more complex syllables, such as 빨, 꼭, 헗, etc. For initial double consonants (복자음, BokJaUm) such as the ㅃ and ㄲ in 빨 and 꼭, people type ㅂ and ㄱ while holding the shift key, and ㅃ and ㄲ are displayed. So choosing the right consonant and deciding which syllable (the previous or the next) it belongs to can be easy when the user types on a keyboard. (However, there are some good Korean input methods for Windows and xterm that let you type ㅂ twice to make ㅃ. This is a kind of intelligent feature. But text like 빱빠 라빱, 을 을 can be tricky, because you end up testing whether a run of 3 or 4 consonants should be grouped as {1,3}, {2,2}, or {3,1}.)

The bad news is that, because you are writing handwriting recognition, you may have to handle such tricky cases if you feed the recognized Hangul characters one by one into a Korean input engine. However, if you write your own input method inside your application, you can keep your own state machine, which might be easier. As you can see, it is a trade-off between that and relying on an existing input engine and feeding each character into it. (Hmm ... wait ... maybe the input engine can also handle these complex cases.)

FYI, I would like to introduce two open source projects. One of them is the Korean Finder input method for Mac, and the second is an input engine with which you can build a Korean input method. In addition, there is a Korean input method for X Window, hosted here. If you prefer to look at a Windows project, here is one.

The last two were hosted on KLDP.net, a Korean open source site, but they were moved to Google Code. As far as I remember, SaeNaRu and Nabi (butterfly) support typing the same consonant twice to make a double consonant.

For more information, you can look up libhangul and nabi. (I remember that part of the input method code used to be almost the same in libhangul and nabi, but by then they had been separated and were expected to evolve independently, so I think they now differ.)

OK. That covers the first thing.

Now let's move on to the second one. (This is the part I said you probably already know. But to complete my explanation, let me cover it as well.)

It is about which characters to choose as input for your Korean input engine, or for an engine like libhangul. There are basically two representations of Hangul syllables (as they appear composed on the display): composed and decomposed. The composed form contains fully composed characters. For example, in 사랑 합니다, each syllable, 사, 랑, 합, 니, 다, is stored as such. They are not stored as ㅅ, ㅏ, ㄹ, ㅏ, ㅇ, ㅎ, ㅏ, ㅂ, ㄴ, ㅣ, ㄷ, ㅏ. This is the composed form in Unicode, and it is what text editors and the like commonly use. The other is the decomposed form in Unicode. It looks like ㅅ, ㅏ, ㄹ, ㅏ, ㅇ, ㅎ, ㅏ, ㅂ, ㄴ, ㅣ, ㄷ, ㅏ.

This form is commonly used by file systems. For example, if you give a file a Hangul name on Windows and open the folder containing it from a Mac, it will appear as ㅅㅏㄹㅏㅇㅎㅏㅂㄴㅣㄷㅏ, although it appears as 사랑 합니다 on Windows.
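The two forms can be inspected with Unicode normalization. The following is a small sketch in Python with the standard unicodedata module (NFC is the composed form, NFD the decomposed one); on iOS, NSString's precomposedStringWithCanonicalMapping and decomposedStringWithCanonicalMapping do the corresponding conversions. The variable names are only for illustration.

```python
import unicodedata

text = "사랑합니다"                                # composed: one code point per syllable
decomposed = unicodedata.normalize("NFD", text)   # conjoining jamo sequence

print(len(text), len(decomposed))                 # 5 syllables vs 12 jamo
print([f"U+{ord(c):04X}" for c in decomposed[:5]])  # the jamo of 사 and 랑

# The two strings render the same but do not compare equal as raw strings,
# so normalize both sides before comparing (e.g. file names).
same_raw = (text == decomposed)
same_nfc = (unicodedata.normalize("NFC", decomposed) == text)
print(same_raw, same_nfc)
```

Normalizing to one form before comparing or storing avoids the mismatch between names that look identical on screen.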

However, if memory serves, there is another set of characters, which is just a list of the Hangul consonants and vowels. Although they may look the same as or similar to the decomposed jamo, they differ in that they are drawn in the middle of the character cell. Their purpose is to show Hangul letters in tables of the Korean alphabet and similar material for educational (or any other) purposes.
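To see the difference between these sets concretely, the sketch below (Python, standard unicodedata module) prints the Unicode names of three characters that all look like ㅅ: the standalone letter from the list of consonants and vowels (the Hangul Compatibility Jamo block), the initial form, and the final form. It also shows that canonical composition only works on the positional forms.

```python
import unicodedata

LETTER = "\u3145"   # standalone ㅅ from the compatibility jamo block
INITIAL = "\u1109"  # ㅅ as an initial consonant
FINAL = "\u11ba"    # ㅅ as a final consonant

for ch in (LETTER, INITIAL, FINAL):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Canonical composition (NFC) leaves the standalone letters untouched...
letters = unicodedata.normalize("NFC", "\u3145\u314f")   # ㅅ ㅏ as letters
# ...but composes the positional (conjoining) forms into a syllable.
syllable = unicodedata.normalize("NFC", "\u1109\u1161")  # initial ㅅ + vowel ㅏ
print(len(letters), len(syllable))
```

Whichever engine you use, checking which of these ranges it expects as input settles the question raised here.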

So, I'm not sure which characters (that is, the decomposed ones or the characters from the list of Hangul consonants and vowels) should be fed into the input state machine or input engine that you choose or implement. If you implement it yourself, it is your choice, but if you use an external library for the engine, you need to find this out.

In addition, as I mentioned in my blog post, for each of the composed and decomposed forms there are two variants, all defined in the Unicode standard. So, well .. yes .. I agree. This is quite a bit of work.

As for me, I tried to write an input method for Mac (when Apple announced that they would get rid of the Finder plugin architecture for security reasons), but at that time libhangul (yes ... I tried to use it) was changing a lot. So I decided to hold off until it stabilized. But I became very busy with work and was tired when I got home, so I never finished my input method. I believe the libhangul project is now in much better shape than before, so at least take a look at it.

Also, if you don't have Windows, it would be worth trying hanterm or any xterm derivative that supports Hangul input natively. The source code should be available on their hosting websites.

Good luck with your project, and if you still have questions, please ask.
