Python - How can I find a Unicode character in a string when the character is stored in a variable?

This works:

 s = 'jiā'
 s.find(u'\u0101')

How do I do something like this:

 s = 'jiā'
 zzz = '\u0101'
 s.find(zzz)

Since I am using a variable now, how do I indicate that the string held by this variable is Unicode?

3 answers

Since I am using a variable now, how do I indicate that the string held by this variable is Unicode?

By defining it as a Unicode string in the first place:

 zzz = u"foo" 

Or, if you already have a string in some other encoding, by converting it to Unicode (the source encoding should be specified if the string is not ASCII).

 zzz = unicode(zzz, encoding="latin1") 

Or by using Python 3, where all strings are Unicode.
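A minimal sketch of the first and last options, assuming Python 3.3 or later (where the u prefix is accepted but optional). The non-ASCII character is written as an escape so the snippet does not depend on the source file's encoding.

```python
# Both the string being searched and the search term are Unicode strings.
s = u'ji\u0101'      # 'jiā'
zzz = u'\u0101'      # the search term, held in a variable

index = s.find(zzz)
print(index)         # 2
```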


zzz, as defined in your post, is a plain str object, not a unicode object, so there is no way to mark it as something it is not. You can, however, convert the str object to a unicode object by specifying the encoding:

 s.find(zzz.decode("utf-8")) 

Replace utf-8 with whatever encoding the string is actually in.
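As a rough illustration, suppose zzz arrives as UTF-8 encoded bytes, for example read from a file. The byte values below are an assumed input chosen for this sketch, not something from the question.

```python
s = u'ji\u0101'                   # Unicode string to search in, 'jiā'
zzz = b'\xc4\x81'                 # UTF-8 bytes for U+0101 (assumed input)

# Decode the bytes to a Unicode string before searching.
index = s.find(zzz.decode('utf-8'))
print(index)                      # 2
```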

Please note that in your example

 zzz = '\u0101' 

zzz is a plain string of length 6 in Python 2 (a backslash, the letter u, and four digits). Once that has happened, there is no simple way to fix the wrong string literal, apart from hacks along the lines of

 import ast
 ast.literal_eval("u'" + zzz + "'")
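To make the length-6 point concrete, here is a small sketch. A raw string is used so that it behaves the same on Python 2 and Python 3 (in Python 3 a plain '\u0101' literal is already a single character). The name fixed is illustrative only.

```python
import ast

zzz = r'\u0101'                 # six characters: backslash, 'u', '0', '1', '0', '1'
print(len(zzz))                 # 6

# The hack from the answer: re-parse the text as a Unicode literal.
# This breaks if zzz itself contains a quote character.
fixed = ast.literal_eval("u'" + zzz + "'")
print(len(fixed))               # 1

s = u'ji\u0101'
index = s.find(fixed)
print(index)                    # 2
```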

In some cases (I don't know exactly which), you will also have to decode the string you are searching in:

 s.decode("utf-8").find(u"\u0101") 
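One case where this is needed, sketched under the assumption that s itself holds UTF-8 encoded bytes (a Python 2 str read from a file, or Python 3 bytes): decoding first makes find work on characters rather than bytes.

```python
s = b'ji\xc4\x81'                   # UTF-8 bytes for 'jiā' (assumed input)
print(len(s))                       # 4: the last character takes two bytes

text = s.decode('utf-8')            # now a Unicode string
print(len(text))                    # 3

index = text.find(u'\u0101')
print(index)                        # 2
```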

Source: https://habr.com/ru/post/901308/

