You can decode these Unicode escape sequences with .decode('unicode-escape'). However, .decode is a bytes method, so if the sequences are text rather than bytes, you first need to encode them to bytes. Alternatively, you can open the CSV file in binary mode, so the sequences are read as bytes in the first place rather than as text strings.
Just for fun, I also use unicodedata to look up the names of these emoji.
```python
import unicodedata as ud

emojis = [
    '\\U0001F600',
    '\\U0001F601',
    '\\U0001F602',
    '\\U0001F923',
]

for u in emojis:
    # Encode the text to bytes, then decode the escape sequence
    s = u.encode('ASCII').decode('unicode-escape')
    print(u, ud.name(s), s)
```
Output
```
\U0001F600 GRINNING FACE 😀
\U0001F601 GRINNING FACE WITH SMILING EYES 😁
\U0001F602 FACE WITH TEARS OF JOY 😂
\U0001F923 ROLLING ON THE FLOOR LAUGHING 🤣
```
This should be much faster than using ast.literal_eval. And if you read the data in binary mode it will be faster still, since that avoids both the initial decoding step when reading the file and the call to .encode('ASCII').
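To illustrate the binary-mode approach, here is a minimal sketch; the in-memory BytesIO stands in for the real CSV file opened with 'rb', and the sample data is hypothetical:

```python
import io

# Stand-in for open('emojis.csv', 'rb'); the sample bytes are hypothetical.
raw = io.BytesIO(b'\\U0001F600\n\\U0001F923\n')

for line in raw:
    # Each line is already bytes, so no .encode('ASCII') round-trip is needed.
    s = line.strip().decode('unicode-escape')
    print(s)  # prints 😀, then 🤣
```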
You can make decoding a little more reliable using
u.encode('Latin1').decode('unicode-escape')
but this is not necessary for your emoji data. And, as I said earlier, it would be better still to open the file in binary mode and avoid the encoding step entirely.
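To show why Latin1 is the safer intermediate encoding, here is a small sketch; the sample string with a non-ASCII character is hypothetical:

```python
# 'é' is outside ASCII, so the ASCII round-trip fails on it.
s = 'café \\U0001F923'

try:
    s.encode('ASCII')
except UnicodeEncodeError as e:
    print('ASCII fails:', e)

# Latin1 maps code points 0-255 one-to-one onto bytes, so any character
# up to U+00FF survives the round-trip without error.
print(s.encode('Latin1').decode('unicode-escape'))  # prints: café 🤣
```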