How to remove unicode in a list

Question

I want to remove the Unicode marker (the u prefix) from the strings in a list such as airports = [u'KATL', u'KCID'].

Expected output:

[KATL, KCID]

I followed a linked answer and tried one of its solutions:

 >>> my_list = ['this\n', 'is\n', 'a\n', 'list\n', 'of\n', 'words\n']
 >>> map(str.strip, my_list)
 ['this', 'is', 'a', 'list', 'of', 'words']

With my own list, I received the following error:

TypeError: descriptor 'strip' requires object 'str', but received 'unicode'

+5

python unicode

Hariom singh Jul 27 '17 at 14:02

3 answers

The simplest option is a list comprehension:

 [s.strip() for s in my_list]

If you want to use map, I would use a lambda so that each object's own strip method is called, rather than requiring the strip that belongs to one particular type. str.strip accepts only str objects, which is why it raised the TypeError on unicode strings.

 map(lambda s: s.strip(), my_list)
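
For readers on Python 3, the same two approaches can be sketched as follows. This is a minimal illustration, not part of the original answer: the airports list is sample data modelled on the question, and in Python 3 every str is already a Unicode string, so the original TypeError does not occur there.

```python
# Python 3 sketch. The sample list is an assumption modelled on the question.
airports = ['KATL\n', 'KCID\n']

# strip() is looked up on each element, so the element's type does not matter.
stripped = [s.strip() for s in airports]

# In Python 3, map() returns a lazy iterator, so wrap it in list().
stripped_via_map = list(map(lambda s: s.strip(), airports))

print(stripped)
print(stripped_via_map)
```

Both forms give the same list; the comprehension is usually the easier one to read.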

+1

Jon kiparsky Jul 27 '17 at 14:23

The only reliable way to convert a unicode string to a byte string is to encode it with a suitable encoding (the most common are ASCII, Latin-1, and UTF-8). By definition, UTF-8 can encode any Unicode character, but non-ASCII characters take more than one byte, so the length in bytes is no longer the number of (Unicode) characters. Latin-1 can represent most characters of Western European languages at one byte per character, and ASCII is the common subset that all three encodings represent identically.

If you need to process strings containing characters that cannot be represented in the chosen encoding, you can pass errors='ignore' to simply drop them, or errors='replace' to substitute a replacement character, usually ?.

So, if I understand your requirement correctly, you can convert the list of Unicode strings to a list of byte strings with:

 [x.encode('ascii', errors='replace') for x in my_list]
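
The effect of the encoding choice and of the errors parameter is easier to see on a string that actually contains a non-ASCII character. The sketch below uses Python 3 syntax and a hypothetical sample string ('café'), neither of which comes from the original answer; in Python 3 the result of encode() is a bytes object.

```python
text = 'café'  # hypothetical sample: four characters, one of them non-ASCII

utf8_bytes = text.encode('utf-8')      # 'é' takes two bytes in UTF-8
latin1_bytes = text.encode('latin-1')  # 'é' takes one byte in Latin-1

replaced = text.encode('ascii', errors='replace')  # 'é' becomes '?'
ignored = text.encode('ascii', errors='ignore')    # 'é' is dropped

# With the default errors='strict', unencodable characters raise an error.
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(len(text), len(utf8_bytes), len(latin1_bytes))
print(replaced, ignored, strict_failed)
```

For pure ASCII data such as the airport codes in the question, all three encodings produce the same bytes and no replacement takes place.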

+1

Serge Ballesta Jul 27 '17 at 14:29

randomir · Accepted Answer · 2017-07-27T14:08:22+0000

First, I highly recommend you switch to Python 3, which treats Unicode strings as first-class citizens (all strings are Unicode strings, but they are called str).

But if you need to make it work in Python 2, you can strip unicode strings with unicode.strip (if your strings are true Unicode strings):

 >>> lst = [u'KATL\n', u'KCID\n']
 >>> map(unicode.strip, lst)
 [u'KATL', u'KCID']

If your unicode strings contain only ASCII characters, you can convert them to str with:

 >>> lst = [u'KATL', u'KCID']
 >>> map(str, lst)
 ['KATL', 'KCID']

Note that this conversion will fail for non-ASCII strings. To encode Unicode code points as str (a string of bytes), you need to choose an encoding (usually UTF-8) and call the .encode() method on your strings:

 >>> lst = [u'KATL', u'KCID']
 >>> map(lambda x: x.encode('utf-8'), lst)
 ['KATL', 'KCID']
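
The u prefix is only part of how Python 2 displays a unicode string inside a list; it is not stored in the string itself. If the goal is literally the question's expected output, [KATL, KCID] with no quotes at all, one possible approach is to build the text by hand. This is a Python 3 sketch under that assumption, and the variable names are illustrative.

```python
# Sample data modelled on the question.
airports = ['KATL\n', 'KCID\n']

# Strip the trailing newlines first.
cleaned = [code.strip() for code in airports]

# Printing a list shows each element's repr, which includes quotes.
print(cleaned)

# Joining the elements produces the unquoted form from the question.
formatted = '[' + ', '.join(cleaned) + ']'
print(formatted)
```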
