Reading Arabic text from a JSON file

I want to read a JSON file containing Arabic text in Python, but the Arabic text comes out like this:

ط§ظ„ط³ظژط¹ظژط§ط¯ظژط©ظگ ظ„ظژظٹظگط³ظژطھظŒ ط§ظ„طظژطµظŒظˆظژظ„ظژ ط¹ظژظ„ظ‰ظژ ظ…ط§ظژ ظ„ط§ظ†ظژظ…ظ„ظگظƒظژ ط¨ظژظ„ ظ‡ظگظٹظژ ط£ظ†ظژ ظ†ظژظپظ‡ظŒظ…ظژ ظˆظژظ†ظگط¯ط±ظژظƒظژ ظ‚ظژظٹظگظ…ط©ظڈ ظ…ظژط§ظ†ظژظ…ظ„ظƒ 

How can I read the correct Arabic letters?

    import sys
    non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
    print(x.translate(non_bmp_map))

x is a variable holding the Arabic value from the JSON file. I expected to get this sentence: السعادة ليست الحصول على ما لانملك بل هي أن نفهم وندرك قيمة مانملك, but I get ط§ظ "ط³ظژط¹ظژططططظژط ظگ ظ طظژط ط طظ ظط طظ ‰ ظژ ظ ... ط§ظژ ظ "ط§ظ † ظژظ ... ظ" ظگظ ƒ ظژ ط ط ظژظ ظ ظ ظگظٹظژ ط ظ ظ ظ ظ ظ ظژظپظ ظژظپظ ظژظپظ Œ Œ ظ ظ ظ ظ ظ ظ ظ ط ط ط ط ط ط ... ط © ظڈ ظ ... ظژط§ظ † ظژظ ... ظ "ظ ƒ

1 answer

You did not mention whether you are using Python 2 or 3. In Python 3, strings are Unicode by default, but the file still has to be decoded with the right encoding when it is opened.
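For Python 3, a minimal sketch, assuming the file is saved as UTF-8: when no encoding is given, open() falls back to the platform's locale encoding (for example cp1256 on a Windows machine set up for Arabic), and that can produce exactly this kind of garbled text. The file name data.json, the key "quote" and the helper load_json_utf8 are made up for illustration.

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Read a JSON file, decoding it explicitly as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Demo: write a small UTF-8 JSON file (hypothetical name and key), then read it back.
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"quote": "السعادة"}, f, ensure_ascii=False)

data = load_json_utf8(path)
print(data["quote"])
```

If the console itself cannot display Arabic, the printed value may still look wrong even though the string in memory is correct.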

If you are using Python 2, use the codecs module:

    import codecs
    f = codecs.open('unicode.rst', encoding='utf-8')
    for line in f:
        print repr(line)

Reference: Unicode HOWTO


It is possible, however, that your input itself is incorrectly encoded. In that case, you can try the ftfy package.

ftfy implements several heuristics to repair incorrectly decoded Unicode text. From the docs:

    >>> from ftfy import fix_encoding
    >>> print(fix_encoding("(à¸‡'âŒ£')à¸‡"))
    (ง'⌣')ง
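The garbled text in the question looks like UTF-8 bytes that were decoded as Windows-1256 (cp1256), so undoing that one step by hand may also work. This is only a sketch under that assumption; the helper name repair_cp1256_mojibake is hypothetical, and characters that were already lost or altered (for example by copy-pasting the garbled text) cannot be recovered this way.

```python
def repair_cp1256_mojibake(text):
    """Undo a wrong cp1256 decode of UTF-8 bytes (the assumed cause of the garbling)."""
    try:
        return text.encode("cp1256").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake, or already damaged: return the input unchanged.
        return text


original = "السعادة ليست"
# Simulate the bug: UTF-8 bytes read with the wrong codec.
garbled = original.encode("utf-8").decode("cp1256")
assert garbled != original
print(repair_cp1256_mojibake(garbled) == original)  # True
```

Fixing the encoding where the file is read is still preferable; a repair like this is for text that has already been stored in garbled form.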

Source: https://habr.com/ru/post/1261175/

