Is there a module for translating a Chinese character into Japanese (kanji) or Korean (hanja) in Python 3?

I would like to convert between CJK character variants in Python 3.3. That is, I need to get 價 (Korean) from 价 (Chinese), and 価 (Japanese) from 價. Is there an external module for this?


Unihan Information

The Unihan page for 價 lists its simplified variant (as opposed to the traditional form), but does not seem to list Japanese or Korean variants. So...

Cjklib

I would recommend looking at cjklib, which has a section on character variants, including:

Z-variant forms that differ only in font

[update] Z-variant

Your example character 價 (U+50F9) does not have a Z-variant. However, 価 (U+4FA1) has a kZVariant pointing to U+50F9 價. That seems weird.
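To make the asymmetry above concrete, here is a minimal sketch that reads variant records in the tab-separated layout used by the Unihan database's Unihan_Variants.txt file (codepoint, field name, value). The SAMPLE lines are hand-written for illustration and are not copied from the real file; the helper names to_char and parse_variants are hypothetical.

```python
from collections import defaultdict

# Illustrative records in the Unihan_Variants.txt layout (assumption: sample
# data written for this sketch, not an excerpt of the real file).
SAMPLE = (
    "# comment lines start with '#'\n"
    "U+4EF7\tkTraditionalVariant\tU+50F9\n"
    "U+4FA1\tkZVariant\tU+50F9\n"
    "U+50F9\tkSimplifiedVariant\tU+4EF7\n"
)


def to_char(token):
    """Turn 'U+50F9' (optionally followed by '<source' notes) into a character."""
    code = token.split("<", 1)[0]
    return chr(int(code[2:], 16))


def parse_variants(text):
    """Return {char: {field: [variant chars]}} from Unihan-style lines."""
    table = defaultdict(dict)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        codepoint, field, value = line.split("\t", 2)
        table[to_char(codepoint)][field] = [to_char(t) for t in value.split()]
    return dict(table)


variants = parse_variants(SAMPLE)
print(variants["\u4fa1"].get("kZVariant"))  # 価 points to 價
print(variants["\u50f9"].get("kZVariant"))  # 價 has no Z-variant in this sample
```

Because the link only goes one way in this data, a lookup from 價 to 価 would need an inverted index built from the same records.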



Here is a relatively complete conversion table. You can dump it to JSON for later use:

    import json

    import requests
    from bs4 import BeautifulSoup as BS


    def gen(soup):
        # Yield (hanzi, kanji) pairs from table rows that have six 'tdR4' cells.
        for tr in soup.select('tr'):
            tds = tr.select('td.tdR4')
            if len(tds) == 6:
                yield tds[2].string, tds[3].string


    uri = 'http://www.kishugiken.co.jp/cn/code10d.html'
    soup = BS(requests.get(uri).content, 'html5lib')

    # Map each hanzi to the list of kanji it is paired with.
    d = {}
    for hanzi, kanji in gen(soup):
        a = d.get(hanzi, [])
        a.append(kanji)
        d[hanzi] = a

    print(json.dumps(d, indent=4))

The code and its output are in this gist.
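Once the table has been dumped, using it is a plain dictionary lookup. The following is a rough sketch, assuming the JSON has the shape the script above prints (each hanzi mapped to a list of candidate kanji). SAMPLE_JSON is a hypothetical one-entry excerpt, not the actual scraped output, and convert is an illustrative helper name.

```python
import json

# Hypothetical excerpt in the shape printed by the script above
# (assumption: the real table's contents have not been verified here).
SAMPLE_JSON = '{"价": ["価"]}'


def convert(text, table):
    """Replace each character with its first listed candidate, if any."""
    out = []
    for ch in text:
        # Cells scraped as None become JSON null, so filter them out.
        candidates = [k for k in table.get(ch, []) if k]
        out.append(candidates[0] if candidates else ch)
    return "".join(out)


table = json.loads(SAMPLE_JSON)
print(convert("价格", table))  # characters missing from the table are kept as-is
```

Taking the first candidate is a simplification: where a hanzi maps to several kanji, the right choice depends on the word, so real text may need manual review.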


Source: https://habr.com/ru/post/1479305/

