Regular expressions for Japanese in Lua

I want to process a Japanese dictionary in Lua (specifically, for LuaTeX). The dictionary is stored in a text file that has to be read. While reading the file, the words on each line need to be matched with a regular expression. Lines are written like this: | がくせい | student |. Here is what I have:

function readFile(fn)
   local file = assert(io.open(fn, "r"))
   local contents = file:read("*a")
   file:close()
   return contents
end

function processTest(contents)
   for line in contents:gmatch("%a+") do
      print(line)
   end
end

a = readFile("vocabulary.org")
processTest(a)

Now the problem is that only English words are printed:

student

I should mention that I am new to Lua and LuaTeX, so if there is a better approach to this, I would be happy to know.

Anyway, is it possible to get Japanese words?

+3
2 answers

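%a matches only what the current locale considers a letter. (By default that is, roughly, ASCII or Latin-1.)

If the file is UTF-8, each Japanese character is stored as several bytes, and you have to match those bytes.

For example, Hiragana in UTF-8 looks like this: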

(\227\129[\129-\191])
(\227\130[\128-\160])

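For other scripts, look up their Unicode ranges (code points) and convert them to UTF-8 byte patterns in the same way.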
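As a rough illustration of the byte-level matching described in this answer, the sketch below walks a UTF-8 string one character at a time and collects runs of Hiragana. The helper names (hiragana_words, is_hiragana, UTF8_CHAR) are made up for this example, and the code assumes the input is valid UTF-8 and that the script itself is saved as UTF-8. Lua patterns have no alternation, which is why the two byte patterns are tested separately per character rather than combined into one expression.

```lua
-- Sketch only: collect runs of Hiragana using plain Lua byte patterns.
-- hiragana_words, is_hiragana and UTF8_CHAR are hypothetical names.

-- One UTF-8 character: a lead byte followed by its continuation bytes.
-- (NUL bytes are skipped; a text file should not contain any.)
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- The two byte patterns quoted above, anchored to a single character.
local function is_hiragana(ch)
   return ch:find("^\227\129[\129-\191]$") ~= nil
       or ch:find("^\227\130[\128-\160]$") ~= nil
end

local function hiragana_words(s)
   local words, current = {}, {}
   for ch in s:gmatch(UTF8_CHAR) do
      if is_hiragana(ch) then
         current[#current + 1] = ch
      elseif #current > 0 then
         -- A non-Hiragana character ends the current run.
         words[#words + 1] = table.concat(current)
         current = {}
      end
   end
   if #current > 0 then
      words[#words + 1] = table.concat(current)
   end
   return words
end

for _, w in ipairs(hiragana_words("| がくせい | student |")) do
   print(w)   -- prints がくせい
end
```

Applied to the question's code, the loop in processTest could call hiragana_words(contents) instead of contents:gmatch("%a+"). Katakana and Kanji would need their own byte ranges.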

+4

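Lua strings are, in general, 8-bit clean, so they can hold text in any encoding. But Lua treats Unicode text as "binary" data and has no notion of multi-byte characters. Matching therefore happens byte by byte, not character by character. In particular, gmatch() will not recognize Japanese letters as letters.

You may want to look for an i18n library. That is probably the more robust approach.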
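To make the bytes-versus-characters point concrete, here is a small sketch. It assumes the script is saved as UTF-8, so that each Hiragana character in the literal occupies three bytes. The variable names are only for illustration.

```lua
-- Sketch only: Lua's string functions count bytes, not characters.
local word = "がくせい"   -- 4 characters, 3 bytes each in UTF-8

print(#word)   -- 12: the length operator counts bytes

-- "." matches a single byte, so each character is split into three pieces.
local bytes = 0
for _ in word:gmatch(".") do
   bytes = bytes + 1
end
print(bytes)   -- 12

-- Counting every byte that is not a continuation byte (128..191)
-- gives the number of characters in valid UTF-8.
local _, chars = word:gsub("[^\128-\191]", "")
print(chars)   -- 4
```

This is also why %a+ finds nothing in the Japanese column: none of those bytes is an ASCII letter.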

+1

Source: https://habr.com/ru/post/1786712/

