I have some Unicode text that prints as follows:
(S (NP (N \u0db6\u0dbd\u0dbd\u0dcf)) (VP (V \u0db6\u0dbb\u0dc0\u0dcf)))
How can I change this to a readable format by converting the '\u0___' escape codes to the corresponding characters? I am using Python 2.7.
I got this output by running the following code segment in NLTK (3.0), where each tree is an nltk.tree.Tree:
for tree in treelist1:
    print unicode(str(tree))
I need something like print(TreePrettyPrinter(tree).text()), which gives the readable Unicode output I want, but with a tree layout that I don't need. Is there a way in NLTK to get the plain text output in readable form too?
The same problem has a solution for grammar productions:
for rule in grammar1.productions():
    print(rule.unicode_repr())
where grammar1 is an nltk.grammar.CFG.
The output is as follows:
VP -> V
VP -> NP V
N -> '\u0db6\u0dbd\u0dca\u0dbd\u0dcf'
N -> '\u0db8\u0dd2\u0db1\u0dd2\u0dc3\u0dcf'
N -> '\u0db8\u0dda\u0dc3\u0dba'
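For reference, one workaround that does not depend on NLTK is to post-process the printed string and replace each literal \uXXXX sequence with the character it names. The following is only a minimal sketch: the helper name unescape_unicode is hypothetical (it is not part of NLTK), and it assumes the string contains literal backslash-u escapes with exactly four hex digits, as in the tree output above.

```python
import re

try:
    unichr            # exists on Python 2
except NameError:
    unichr = chr      # Python 3: chr already returns a Unicode character

# Matches a literal backslash, the letter u, and exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_unicode(text):
    """Hypothetical helper: replace each literal backslash-u escape
    (four hex digits) in text with the character it names."""
    return _ESCAPE.sub(lambda m: unichr(int(m.group(1), 16)), text)


# The raw string keeps the backslashes literal, like the printed tree.
escaped = r'(S (NP (N \u0db6\u0dbd\u0dbd\u0dcf)) (VP (V \u0db6\u0dbb\u0dc0\u0dcf)))'
readable = unescape_unicode(escaped)
print(readable)
```

Whether the result displays correctly still depends on the terminal encoding, especially on Python 2.7. A regular expression is used here rather than the unicode_escape codec so that any characters in the string that are already non-ASCII are left untouched.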