Convert Unicode-escaped text to readable text in Python

I have Unicode-escaped text like this:

(S (NP (N \u0db6\u0dbd\u0dbd\u0dcf)) (VP (V \u0db6\u0dbb\u0dc0\u0dcf)))

How can I convert this to a readable form, replacing the \uXXXX escape codes with the characters they stand for? I am using Python 2.7.

I got this output by running the following code in NLTK 3.0, where each tree is an nltk.tree.Tree:

for tree in treelist1:
    print unicode(str(tree))

I need something like print(TreePrettyPrinter(tree).text()), which gives the readable Unicode output I want, but in a tree layout that I don't need. Is there a way in NLTK to get readable text in the plain bracketed form instead?


The same problem occurs with grammar rules:

for rule in grammar1.productions():
    print(rule.unicode_repr())

where grammar1 is an nltk.grammar.CFG.

The output is as follows:

VP -> V
VP -> NP V
N -> '\u0db6\u0dbd\u0dca\u0dbd\u0dcf'
N -> '\u0db8\u0dd2\u0db1\u0dd2\u0dc3\u0dcf'
N -> '\u0db8\u0dda\u0dc3\u0dba'


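Answer: this is really a Python 2.7 question rather than an NLTK one. The strings contain literal \uXXXX escape sequences, so decode them with the 'unicode_escape' codec:

print(str(tree).decode('unicode_escape'))

and, for the grammar rules:

print(rule.unicode_repr().decode('unicode_escape'))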
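A minimal sketch that also runs on Python 3 (an assumption beyond the question, which targets 2.7): str.decode does not exist on Python 3 strings, but codecs.decode with the same 'unicode_escape' codec works on both versions. The helper name unescape is hypothetical.

```python
import codecs


def unescape(text):
    """Replace literal \\uXXXX escape sequences with the characters they encode.

    Hypothetical helper. codecs.decode(..., 'unicode_escape') works the same
    way on Python 2.7 and Python 3. It assumes the input is plain ASCII plus
    backslash escapes; text that already contains non-ASCII characters would
    be mis-decoded.
    """
    return codecs.decode(text, 'unicode_escape')


# The escaped tree from the question, written with doubled backslashes so the
# string holds the literal six-character sequences such as \u0db6.
escaped_tree = ('(S (NP (N \\u0db6\\u0dbd\\u0dbd\\u0dcf)) '
                '(VP (V \\u0db6\\u0dbb\\u0dc0\\u0dcf)))')
readable_tree = unescape(escaped_tree)
```

print(readable_tree) then shows the Sinhala characters, provided the console encoding can display them.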

Alternatively, nltk.tree.Tree has a pformat() method that returns the tree as readable text:

print(tree.pformat())
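Both answers work because the escaped output is, in effect, the text passed through the 'unicode_escape' codec, and decoding with that codec reverses it, while pformat() hands back the characters themselves. The round trip below illustrates this with one word from the question; it assumes, rather than confirms from the NLTK source, that this codec is what produced the escapes.

```python
import codecs

# One word from the question, as real characters.
word = u'\u0db6\u0dbd\u0dbd\u0dcf'

# Encoding with 'unicode_escape' yields ASCII bytes holding \uXXXX sequences,
# i.e. the same kind of text seen in the question's output.
escaped = codecs.encode(word, 'unicode_escape')

# Decoding with the same codec restores the original characters.
restored = codecs.decode(escaped, 'unicode_escape')
```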

Source: https://habr.com/ru/post/1609363/

