UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)

I am using Python 2.7 and MySQLdb 1.2.3. I have tried everything I could find on Stack Overflow and other forums for handling the encoding errors my script throws. The script reads data from every table in the source MySQL database, writes it to a Python StringIO.StringIO object, and then loads it from that object into a Postgres database using the copy_from method of the psycopg2 library. The Postgres database is apparently UTF-8 encoded (I found this under Properties - Database Definition in pgAdmin).

I found out that some tables in my source MySQL database use the latin1_swedish_ci collation while others use utf8 (according to TABLE_COLLATION in information_schema.tables).

I wrote all this code at the top of my Python script based on my research on the Internet.

 db_conn = MySQLdb.connect(host=host, user=user, passwd=passwd, db=db,
                           charset="utf8", init_command='SET NAMES UTF8',
                           use_unicode=True)
 db_conn.set_character_set('utf8')
 db_conn_cursor = db_conn.cursor()
 db_conn_cursor.execute('SET NAMES utf8;')
 db_conn_cursor.execute('SET CHARACTER SET utf8;')
 db_conn_cursor.execute('SET character_set_connection=utf8;')

I still get the UnicodeEncodeError below, raised by this line: cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value

 UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128) 

I wrote the following line of code to clean the cells of each table in the source MySQL database while writing them to the StringIO object.

 cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value 

Please help.

1 answer

str(cell) is trying to convert cell to ASCII. ASCII only supports characters with ordinals less than 128. What type is cell?
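
To illustrate the failure, here is a minimal sketch. In Python 2, calling str() on a unicode object implicitly encodes it with the ASCII codec; the snippet performs that encode explicitly so it behaves the same on Python 2 and 3. The sample value is made up, not taken from the asker's data.

```python
# -*- coding: utf-8 -*-
# A made-up cell value containing U+2019 (right single quotation mark).
cell = u"it\u2019s"

try:
    # Python 2's str(cell) performs the equivalent of this implicit step.
    cell.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# Encoding explicitly as UTF-8 succeeds and yields a byte string.
encoded = cell.encode("utf-8")
print(repr(encoded))
```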

If cell is a unicode string, just do cell.encode("utf8"); that returns a byte string encoded as UTF-8.

... or really, IIRC, if you pass MySQL a unicode string, the database will automatically convert it to utf8 ...

You can also try

 cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") 
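
Putting the suggestions above together, something like the following might work. It is only a sketch: clean_cell, text_type and source_encoding are hypothetical names, and it assumes that any byte strings coming back from MySQLdb are already UTF-8 unless you say otherwise. It keeps the cell as text while cleaning and encodes to UTF-8 only at the end.

```python
import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str


def clean_cell(cell, source_encoding="utf-8"):
    """Return the cell as a UTF-8 byte string with unwanted characters removed.

    Hypothetical helper; `source_encoding` is an assumption about how any
    byte-string cells were encoded.
    """
    if isinstance(cell, text_type):
        text = cell
    elif isinstance(cell, bytes):
        # Byte strings are decoded explicitly instead of relying on ASCII.
        text = cell.decode(source_encoding)
    else:
        # Numbers, dates and so on: convert to text, never via str() + ASCII.
        text = text_type(cell)
    # Same replacements as the original line, done on text.
    text = (text.replace(u"\r", u" ")
                .replace(u"\n", u" ")
                .replace(u"\t", u"")
                .replace(u'"', u""))
    return text.encode("utf-8")
```

The returned byte string can be written to the StringIO buffer as before. For tables that really hold latin1 bytes you could pass source_encoding="latin-1", though with use_unicode=True the driver should normally hand you unicode objects already.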

or just use a third-party library. There is a good one that will fix the text for you.


Source: https://habr.com/ru/post/977442/

