Python 3.4: AttributeError: 'str' object has no attribute 'decode'

I have this piece of code, taken from a function, that replaces incorrectly encoded foreign characters in a string:

    s = "String from an old database with weird mixed encodings"
    s = str(bytes(odbc_str.strip(), 'cp1252'))
    s = s.replace('\\x82', 'é')
    s = s.replace('\\x8a', 'è')
    (...)
    print(s)
    # b"String from an old database with weird mixed encodings"

I need a real string, not bytes. But when I try to decode it, I get an exception:

    s = "String from an old database with weird mixed encodings"
    s = str(bytes(odbc_str.strip(), 'cp1252'))
    s = s.replace('\\x82', 'é')
    s = s.replace('\\x8a', 'è')
    (...)
    print(s.decode("utf-8"))
    # AttributeError: 'str' object has no attribute 'decode'

  • Do you know why s looks like bytes here?
  • Why can't I decode it to a real string?
  • Do you know how to do this in a clean way? (Today I return s[2:][:-1]. It works, but it is very ugly, and I would like to understand this behavior.)

Thanks in advance!

EDIT:

pypyodbc in Python 3 uses Unicode for everything by default, which confused me. When connecting, you can tell it to use ANSI instead.

 con_odbc = pypyodbc.connect("DSN=GP", False, False, 0, False) 

Then I can decode the returned values from cp850, which is the original code page of the database.

 str(odbc_str, "cp850", "replace") 
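As a small illustration of that call: when str() is given an encoding, it decodes a bytes object instead of returning its repr. The sample bytes below are made up for the demo and are not real data from the database; the only assumption is that the driver hands back raw cp850 bytes.

```python
# Hypothetical raw bytes, as an ANSI-mode ODBC connection might return them.
# In cp850, byte 0x82 is 'é' and byte 0x8a is 'è'.
raw = b"caf\x82 cr\x8ame"

# str(bytes, encoding, errors) decodes the bytes;
# it is equivalent to bytes.decode(encoding, errors).
text = str(raw, "cp850", "replace")
same = raw.decode("cp850", "replace")

print(type(text).__name__)                   # str
print(text == "caf\u00e9 cr\u00e8me")        # True
print(text == same)                          # True
```

The "replace" error handler substitutes U+FFFD for any byte that cannot be decoded, so a stray byte does not raise an exception.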

There is no need to manually replace each special character. Thanks a lot, pepr.

1 answer

The printed b"String from an old database with weird mixed encodings" is not a representation of the string's contents. It is the string's contents, because you did not pass the encoding argument to str() ... (see the documentation: https://docs.python.org/3.4/library/stdtypes.html#str )

If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).

This is what happened in your case. The leading b and the quote after it are actually two characters that are part of the string's contents. You can also try:

    s1 = 'String from an old database with weird mixed encodings'
    print(type(s1), repr(s1))
    by = bytes(s1, 'cp1252')
    print(type(by), repr(by))
    s2 = str(by)
    print(type(s2), repr(s2))

and it prints:

    <class 'str'> 'String from an old database with weird mixed encodings'
    <class 'bytes'> b'String from an old database with weird mixed encodings'
    <class 'str'> "b'String from an old database with weird mixed encodings'"

This is why s[2:][:-1] works for you.
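For comparison, here is a minimal sketch of the slicing workaround next to a real decode. It assumes the bytes really are cp1252, as in the question; the variable names are only for illustration.

```python
s1 = "String from an old database"
by = s1.encode("cp1252")             # a bytes object

wrapped = str(by)                    # no encoding given: the repr, "b'...'"
decoded = str(by, "cp1252")          # encoding given: the bytes are decoded
also_decoded = by.decode("cp1252")   # the same thing, spelled as a method

print(repr(wrapped))                 # "b'String from an old database'"
print(wrapped[2:-1] == s1)           # True: the slice strips b' and the final '
print(decoded == s1)                 # True
print(hasattr(wrapped, "decode"))    # False: str has no decode() in Python 3
```

The slice only looks right for pure-ASCII data. Any non-ASCII byte shows up in the repr as an escape such as \x82, which is why the code in the question had to replace '\\x82' by hand.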

If you think more about this, then (in my opinion) there are two cases. Either you want to get bytes or a bytearray from the database (if possible) and fix the bytes (see bytes.translate https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#bytes.translate ), or you successfully get a string (you were lucky that no exception was raised when the string was built) and you want to replace the wrong characters with the correct ones (see also str.translate() https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#str.translate ).
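Both options can be sketched roughly as follows. The sample bytes and the two-character mapping are assumptions made up for the demo (cp850 bytes 0x82 and 0x8a that ended up in otherwise cp1252 data); a real table would have to be built from the actual damaged data.

```python
# Hypothetical input: mostly cp1252 text with two stray cp850 bytes.
raw = b"caf\x82 cr\x8ame"

# Option 1: fix the bytes first, then decode.
# bytes.maketrans builds a 256-byte table mapping 0x82 -> 0xe9 and 0x8a -> 0xe8,
# which are the cp1252 codes for 'é' and 'è'.
byte_table = bytes.maketrans(b"\x82\x8a", b"\xe9\xe8")
fixed_from_bytes = raw.translate(byte_table).decode("cp1252")

# Option 2: the string was already built with the wrong code page.
# In cp1252, 0x82 decodes to U+201A and 0x8a decodes to U+0160.
wrong = raw.decode("cp1252")
char_table = str.maketrans({"\u201a": "\u00e9", "\u0160": "\u00e8"})
fixed_from_str = wrong.translate(char_table)

print(fixed_from_bytes == "caf\u00e9 cr\u00e8me")   # True
print(fixed_from_str == fixed_from_bytes)           # True
```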

ODBC may have used the wrong encoding internally. (That is, the contents of the database may be correct, but ODBC misinterpreted them, and you cannot tell ODBC what the correct encoding is.) In that case you want to encode the string back to bytes using that wrong encoding, and then decode the bytes using the correct encoding.
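A minimal sketch of that round trip, assuming the wrong encoding was cp1252 and the correct one is cp850 (the code page mentioned in the question's edit). The sample bytes are invented for the demo.

```python
# Hypothetical bytes as stored in the database, encoded in cp850.
stored = b"caf\x82 cr\x8ame"

# What you get if the driver wrongly assumes cp1252.
misread = stored.decode("cp1252")

# Undo the wrong decoding, then decode with the correct code page.
repaired = misread.encode("cp1252").decode("cp850")

print(misread == "caf\u201a cr\u0160me")      # True: wrong characters
print(repaired == "caf\u00e9 cr\u00e8me")     # True: text recovered
```

This only works when the wrong decoding lost no information. Python's cp1252 codec, for example, leaves a few bytes such as 0x81 undefined, so data containing them would already fail or be altered at the first step.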


Source: https://habr.com/ru/post/1203290/

