String Encoding / Decoding Error - Missing Character from the End

Question

I have a column of type NVARCHAR in my database, and I cannot convert its contents to a regular string in my code. (I am using pyodbc to connect to the database.)

    # This unicode string is returned by the database
    >>> my_string = u'\u4157\u4347\u6e65\u6574\u2d72\u3430\u3931\u3530\u3731\u3539\u3533\u3631\u3630\u3530\u3330\u322d\u3130\u3036\u3036\u3135\u3432\u3538\u2d37\u3134\u3039\u352d'

    # prints something in Chinese
    >>> print my_string
    䅗䍇湥整⵲㐰㤱㔰㜱㔹㔳㘱㘰㔰㌰㈭㄰〶〶ㄵ㐲㔸ⴷㄴ〹㔭

The closest I have come is encoding it to utf-16:

    >>> my_string.encode('utf-16')
    '\xff\xfeWAGCenter-04190517953516060503-20160605124857-4190-5'
    >>> print my_string.encode('utf-16')
      WAGCenter-04190517953516060503-20160605124857-4190-5

But the value I actually need, as stored in the database, is:

    WAGCenter-04190517953516060503-20160605124857-4190-51

I tried encoding it with utf-8, utf-16, ascii, and utf-32, but nothing worked.

Does anyone have an idea of what I am missing, and how to get the desired result from my_string?

Edit: When converting it to utf-16-le I can get rid of the unwanted characters at the beginning, but one character is still missing from the end:

    >>> print t.encode('utf-16-le')
    WAGCenter-04190517953516060503-20160605124857-4190-5

With some other columns it works. What could be causing this intermittent problem?

python python-2.7 encode pyodbc netezza

user7001260 Oct 11 '16 at 13:29

2 answers

The problem was that I was using UTF-16 in my odbcinst.ini file, where I should have been using the UTF-8 character encoding.

Earlier I had been changing it through an OPTION parameter when connecting with pyodbc. Later, changing it in the odbcinst.ini file fixed the problem.

user7001260 Nov 23 '16 at 23:46

Serge Ballesta · Accepted Answer · 2016-10-11T15:37:19+0000

You have a serious problem either in the definition of your database, in the way you store values in it, or in the way you read values from it. I can only explain what you see, but neither why it happens nor how to fix it without knowing:

the type of database
the way you enter values into it
the way you extract values to get this pseudo-Unicode string
the actual content as seen through direct (native) access to the database

What you get is an ASCII string whose 8-bit characters have been grouped in pairs to build 16-bit Unicode characters in little-endian order. Since the expected string has an odd number of characters, the last (odd) character was lost in translation: the string you receive ends with u'\u352d', where 0x2d is the ASCII code for '-' and 0x35 the code for '5'. Demonstration:

    def cvt(ustring):
        l = []
        for uc in ustring:
            l.append(chr(ord(uc) & 0xFF))         # low order byte
            l.append(chr((ord(uc) >> 8) & 0xFF))  # high order byte
        return ''.join(l)

    >>> cvt(my_string)
    'WAGCenter-04190517953516060503-20160605124857-4190-5'
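The mechanism can also be reproduced in the other direction. The following is only a minimal sketch, assuming that the odd trailing byte is simply dropped somewhere between the database and pyodbc (the thread does not confirm what the driver really does). It starts from the expected value quoted in the question, pairs its bytes into little-endian 16-bit characters, and then undoes the pairing with the utf-16-le codec, which gives the same result as cvt. It is written to run on Python 2.7 and Python 3.3+.

```python
# Expected value, as quoted in the question (53 ASCII characters).
expected = u'WAGCenter-04190517953516060503-20160605124857-4190-51'

raw = expected.encode('ascii')        # 53 bytes: an odd count

# Assumption for illustration: the unpaired trailing byte is discarded.
usable = raw[:len(raw) // 2 * 2]      # 52 bytes

# Each byte pair is read as one little-endian 16-bit character.
garbled = usable.decode('utf-16-le')  # 26 characters, starts with u'\u4157'

# Reversing the pairing restores the text, minus the lost last character.
recovered = garbled.encode('utf-16-le').decode('ascii')

print(recovered)
```

This would also explain why only some columns are affected: under the same assumption, a value with an even number of characters survives the round trip intact, so nothing is lost.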
