String.sub problem with non-English characters

Question

I need to get the first character of a text variable. I can do this in either of the following simple ways:

string.sub(someText,1,1)

or

someText:sub(1,1)

If I do the following, I expect to get 'ñ' as the first letter. However, both forms of sub return 'Ã'.

local someText = 'ñññññññ'
print('Test whole: '..someText) 
print('first char: '..someText:sub(1,1))
print('first char with .sub: '..string.sub(someText,1,1))

Here are the results from the console:

2014-03-02 09:08:47.959 Corona Simulator[1701:507] Test whole: ñññññññ
2014-03-02 09:08:47.960 Corona Simulator[1701:507] first char: Ã
2014-03-02 09:08:47.960 Corona Simulator[1701:507] first char with .sub: Ã

It seems that the function string.sub() encodes the return value in UTF-8. Just for kicks, I tried using the function utf8_decode() provided by the Corona SDK. This failed: the simulator reported that the function was expecting a number but received nil instead.

I also searched the Internet to find out if anyone else had this problem. I found out that there is a lot of discussion about Lua, Corona, Unicode and UTF-8, but I have not found anything that could solve this problem.

+4

string lua unicode utf-8 lua-patterns

C. Ulker 02 Mar '14 15:29

2

: , , , . :

ASCII- ( < 128 ASCII, ASCII )
- (, )
0-

, , , .

UTF-8: , UTF-8. Glyph CodePoint? AFAIK unicode . ?

0

Deduplicator 02 . '14 18:29

Yu Hao · Accepted Answer · 2014-03-02T15:41:31+0000

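Lua strings are 8-bit clean, that is, Lua treats a string as a plain sequence of bytes. In UTF-8, ñ is encoded as more than one byte, so someText:sub(1,1) returns only the first byte.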
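
To make the byte-level behaviour visible, here is a minimal sketch. It assumes the text is UTF-8 and writes "ñ" with explicit byte escapes (\195\177) so the result does not depend on how the source file is saved; the variable names are only illustrative.

```lua
-- "ñ" is U+00F1, which UTF-8 encodes as the two bytes 195 and 177.
local someText = "\195\177\195\177"   -- "ññ" in UTF-8

print(#someText)                 -- 4: the length operator counts bytes
print(someText:byte(1, 2))       -- 195 and 177, the two bytes of the first "ñ"

-- sub(1,1) cuts the character in half and keeps only byte 195,
-- which a Latin-1 console displays as "Ã".
local firstByte = someText:sub(1, 1)
print(firstByte:byte())          -- 195

-- Taking both bytes yields the complete character.
local firstChar = someText:sub(1, 2)
print(firstChar == "\195\177")   -- true
```

This works only because the width of "ñ" is known in advance; it is not a general solution.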

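UTF-8 is compatible with ASCII: an ASCII character is encoded as a single byte with a value below 128. Every other character starts with a lead byte in the range 194-244, followed by one or more continuation bytes in the range 128-191.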
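
As an illustration of these byte ranges, the following sketch counts code points by counting every byte that is not a continuation byte. The function name utf8Length is hypothetical, and the code assumes its input is valid UTF-8; it does no validation.

```lua
-- Count UTF-8 code points in s.
-- Bytes 128-191 are continuation bytes and never start a character,
-- so every other byte marks the start of exactly one code point.
-- Assumes s is valid UTF-8.
local function utf8Length(s)
    local count = 0
    for i = 1, #s do
        local b = s:byte(i)
        if b < 128 or b > 191 then
            count = count + 1
        end
    end
    return count
end

print(utf8Length("a\195\177b"))        -- 3 ("a", "ñ", "b")
print(utf8Length("\195\177\195\177"))  -- 2
```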

Given this byte layout, the pattern ".[\128-\191]*" matches a single UTF-8 code point (not a grapheme):

for c in "ñññññññ":gmatch(".[\128-\191]*") do -- pretend the first string is in NFC
    print(c)
end

Output:

ñ
ñ
ñ
ñ
ñ
ñ
ñ
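
Applied to the original question, the same pattern can back a character-aware replacement for string.sub. The helper below is a sketch, not part of Lua or the Corona SDK: the name utf8Sub is made up, it assumes valid UTF-8 input, and unlike string.sub it does not support negative indices.

```lua
local UTF8_CHAR = ".[\128-\191]*"   -- one code point: any byte plus its continuation bytes

-- Return code points i..j (1-based, inclusive) of s.
-- j defaults to the last code point; out-of-range indices are clamped.
local function utf8Sub(s, i, j)
    local chars = {}
    for c in s:gmatch(UTF8_CHAR) do
        chars[#chars + 1] = c
    end
    i = math.max(i or 1, 1)
    j = math.min(j or #chars, #chars)
    return table.concat(chars, "", i, j)   -- empty string when i > j
end

local someText = "\195\177\195\177\195\177"          -- "ñññ" in UTF-8
print(utf8Sub(someText, 1, 1) == "\195\177")         -- true: the first "ñ"
print(utf8Sub("a\195\177b", 2, 3) == "\195\177b")    -- true: "ñb"
```

Collecting every code point into a table keeps the sketch short; for long strings, stopping the iteration once index j is reached would avoid the extra work.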
