Python - replace non-ASCII character in string (»)

I need to replace the "»" character in the string with a space, but I keep getting an error. This is the code I'm using:

    # -*- coding: utf-8 -*-
    from bs4 import BeautifulSoup

    # other code

    soup = BeautifulSoup(data, 'lxml')
    mystring = soup.find('a').text.replace(' »','')

UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 13: ordinal not in range(128)

But if I check it with this other script:

    # -*- coding: utf-8 -*-
    a = "hi »"
    b = a.replace('»','')

It works. Why is this?

2 answers

In Python 2, to replace part of a byte string with the str.replace() method, you first need to decode the string, then do the replacement, and then encode the result back to the original encoding:

    >>> a = "hi »"
    >>> a.decode('utf-8').replace("»".decode('utf-8'), "").encode('utf-8')
    'hi '
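In Python 3, str is already Unicode text, so this decode/replace/encode round trip is only needed when you start from bytes. A minimal sketch, assuming the input arrives as UTF-8 encoded bytes (the variable names are illustrative):

```python
# Assumption: the raw input is UTF-8 encoded bytes, e.g. read from a file opened in binary mode.
raw = "hi »".encode("utf-8")  # b'hi \xc2\xbb'

# Decode to text, do the replacement on text, then encode back to UTF-8 bytes.
cleaned = raw.decode("utf-8").replace("»", "").encode("utf-8")
print(cleaned)  # b'hi '
```

Doing the replacement on decoded text means "»" is matched as one character rather than as its two UTF-8 bytes.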

You can also use the following regular expression to remove all non-ASCII characters from a string:

    >>> import re
    >>> re.sub(r'[^\x00-\x7f]',r'', 'hi »')
    'hi '
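Since the question asks for a space rather than an empty string, the same pattern can take the replacement as a parameter. A sketch for Python 3 str input; replace_non_ascii and NON_ASCII are hypothetical names, not part of any library:

```python
import re

# Hypothetical helper: matches any character outside 7-bit ASCII (code points 0x00-0x7F).
NON_ASCII = re.compile(r"[^\x00-\x7f]")

def replace_non_ascii(text: str, replacement: str = "") -> str:
    """Replace every non-ASCII character in text with `replacement`."""
    return NON_ASCII.sub(replacement, text)

print(repr(replace_non_ascii("hi »")))           # 'hi '
print(repr(replace_non_ascii("hi»there", " ")))  # 'hi there'
```

Note that on a Python 2 UTF-8 byte string each byte of "»" matches separately, so a space replacement would insert two spaces; decoding to text first avoids that.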

@Moinuddin Quadri's answer is better suited to your use case, but in general a simple way to remove non-ASCII characters from a given string is to do the following:

    # the characters '¡' and '¢' are non-ASCII
    string = "hello, my name is ¢arl... ¡Hola!"
    all_ascii = ''.join(char for char in string if ord(char) < 128)

This produces:

    >>> print(all_ascii)
    hello, my name is arl... Hola!
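The same generator expression can substitute a character instead of dropping it, which covers the "replace with a space" case from the question. A minimal sketch; ascii_only is a hypothetical helper name:

```python
def ascii_only(text: str, replacement: str = "") -> str:
    # Keep code points below 128; substitute `replacement` for everything else.
    return "".join(ch if ord(ch) < 128 else replacement for ch in text)

s = "hello, my name is ¢arl... ¡Hola!"
print(ascii_only(s))       # hello, my name is arl... Hola!
print(ascii_only(s, "?"))  # hello, my name is ?arl... ?Hola!
```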

You can also do this:

    ''.join(filter(lambda c: ord(c) < 128, string))

However, this is about 30% slower than the generator-expression approach (char for char in ...).
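Relative speed depends on the interpreter version and the input, so it is worth measuring on your own data. A sketch using the standard timeit module; the sample input (the example string repeated 100 times) and the function names are assumptions for illustration:

```python
import timeit

# Assumed sample input: the example string repeated so timings are measurable.
s = "hello, my name is ¢arl... ¡Hola!" * 100

def with_genexpr() -> str:
    return "".join(char for char in s if ord(char) < 128)

def with_filter() -> str:
    return "".join(filter(lambda c: ord(c) < 128, s))

# Both variants must produce the same result before comparing speed.
assert with_genexpr() == with_filter()

for fn in (with_genexpr, with_filter):
    seconds = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {seconds:.3f} s for 200 runs")
```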


Source: https://habr.com/ru/post/1012739/