Convert Unicoded text to readable text in Python

Question


I have Unicode text as follows:

(S (NP (N \u0db6\u0dbd\u0dbd\u0dcf)) (VP (V \u0db6\u0dbb\u0dc0\u0dcf)))

How can I change this to a readable format by converting the '\u0___' codes to the corresponding readable characters? I am using Python 2.7.

I got this output by executing the following code segment with NLTK (3.0), where each tree is an nltk.tree.Tree:

for tree in treelist1:
    print unicode(str(tree))

I need something like print(TreePrettyPrinter(tree).text()), which gives the Unicode-compatible output I want, but with a tree layout that I don't need. Is there a way in NLTK to get readable text output like that too?

The same problem occurs with the following code:

for rule in grammar1.productions():
    print(rule.unicode_repr())

where grammar1 is an nltk.grammar.CFG.

The output is as follows:

VP -> V
VP -> NP V
N -> '\u0db6\u0dbd\u0dca\u0dbd\u0dcf'
N -> '\u0db8\u0dd2\u0db1\u0dd2\u0dc3\u0dcf'
N -> '\u0db8\u0dda\u0dc3\u0dba'


python-2.7 unicode nltk

Upekha Vandebona 28 . '15 20:03


Upekha Vandebona · Accepted Answer · 2015-09-28T21:16:59+0000

I found a solution to the question, for Python 2.7 and NLTK: decode the string with the 'unicode_escape' codec.

print(str(tree).decode('unicode_escape'))

print(rule.unicode_repr().decode('unicode_escape'))
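As a self-contained illustration that does not need NLTK, the same decoding step can be sketched on the string from the question. The helper name unescape is hypothetical, and the sketch assumes the input is pure ASCII containing literal \uXXXX sequences. The extra encode('ascii') call is only there so that the snippet also runs on Python 3, where str has no decode() method.

```python
from __future__ import print_function


def unescape(text):
    """Turn literal \\uXXXX sequences in an ASCII-only string into characters.

    Hypothetical helper for illustration; not part of NLTK.
    """
    # encode() yields a byte string, which the 'unicode_escape' codec
    # then decodes into real Unicode characters.
    return text.encode('ascii').decode('unicode_escape')


# Raw string: the backslashes are kept literally, as in the output of str(tree).
escaped = r"(S (NP (N \u0db6\u0dbd\u0dbd\u0dcf)) (VP (V \u0db6\u0dbb\u0dc0\u0dcf)))"
readable = unescape(escaped)
print(readable)
```

Note that 'unicode_escape' also interprets other backslash sequences such as \n and \t, so this is only appropriate when the string holds nothing but such escapes and plain ASCII.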

In NLTK, nltk.tree.Tree also has a pformat() method that can be used instead:

print(tree.pformat())
