I have this piece of code in a function that replaces incorrectly encoded foreign characters in a string:
s = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s)
I need a real string, not bytes. But when I try to decode it, I get an exception:
s = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s.decode("utf-8"))  # AttributeError: 'str' object has no attribute 'decode'
- Do you know why s looks like bytes here?
- Why can't I decode it to a real string?
- Do you know how to do this in a clean way? (Currently I return s[2:][:-1]. It works, but it is very ugly, and I would like to understand this behavior.)
Thanks in advance!
EDIT:
pypyodbc on Python 3 uses Unicode for everything by default, which confused me. When connecting, you can tell it to use ANSI.
con_odbc = pypyodbc.connect("DSN=GP", False, False, 0, False)
Then I can decode the returned values from cp850, which is the original code page of the database.
str(odbc_str, "cp850", "replace")
There is no need to manually replace each special character. Thanks a lot, pepr!
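For anyone who hits the same confusion, here is a minimal sketch of what seems to have been going on. The sample bytes in `raw` are made up for illustration and only stand in for a value returned by the driver. Calling `str()` on bytes without an encoding gives the printable representation, `b'...'` wrapper included, which would explain both the `[2:][:-1]` workaround and the escaped `\\x82` replacements. Passing the code page decodes the bytes into a real string.

```python
# Hypothetical sample: raw bytes as they might come back from the database.
raw = b"caf\x82 cr\x8ame"

# str() on bytes WITHOUT an encoding returns the repr, b'...' wrapper included.
as_repr = str(raw)

# str() WITH an encoding decodes the bytes into a real str.
decoded = str(raw, "cp850", "replace")

# bytes.decode() is the more common spelling of the same operation.
same = raw.decode("cp850", errors="replace")

print(as_repr)   # the repr, still showing escaped bytes
print(decoded)   # the decoded text
print(decoded == same)
```

The result is already a `str`, so there is nothing left to decode afterwards, which is why `s.decode(...)` raises `AttributeError`.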
Romu