Python encoded utf-8 string \ xc4 \ x91 in Java

How to get the correct Java string from the Python string 'Oslobo \ xc4 \ x91enja'? How to decode it? I tried, I think, everything, everywhere, everywhere, I was stuck for 2 days with this problem. Please, help!

Here is a Python web service method that returns the JSON from which its Java client with Google Gson parses.

def list_of_suggestions(entry): input = entry.encode('utf-8') """Returns list of suggestions from auto-complete search""" json_result = { 'suggestions': [] } resp = urllib2.urlopen('https://maps.googleapis.com/maps/api/place/autocomplete/json?input=' + urllib2.quote(input) + '&location=45.268605,19.852924&radius=3000&components=country:rs&sensor=false&key=blahblahblahblah') # make json object from response json_resp = json.loads(resp.read()) if json_resp['status'] == u'OK': for pred in json_resp['predictions']: if pred['description'].find('Novi Sad') != -1 or pred['description'].find(u' ') != -1: obj = {} obj['name'] = pred['description'].encode('utf-8').encode('string-escape') obj['reference'] = pred['reference'].encode('utf-8').encode('string-escape') json_result['suggestions'].append(obj) return str(json_result) 

Here is the solution on the Java client

 private String python2JavaStr(String pythonStr) throws UnsupportedEncodingException { int charValue; byte[] bytes = pythonStr.getBytes(); ByteBuffer decodedBytes = ByteBuffer.allocate(pythonStr.length()); for (int i = 0; i < bytes.length; i++) { if (bytes[i] == '\\' && bytes[i + 1] == 'x') { // \xc4 => c4 => 196 charValue = Integer.parseInt(pythonStr.substring(i + 2, i + 4), 16); decodedBytes.put((byte) charValue); i += 3; } else decodedBytes.put(bytes[i]); } return new String(decodedBytes.array(), "UTF-8"); } 
+6
source share
2 answers

You are returning a string version of the python data structure.

Return the actual JSON response; leave the values ​​as Unicode:

 if json_resp['status'] == u'OK': for pred in json_resp['predictions']: desc = pred['description'] if u'Novi Sad' in desc or u' ' in desc: obj = { 'name': pred['description'], 'reference': pred['reference'] } json_result['suggestions'].append(obj) return json.dumps(json_result) 

Now Java should not interpret Python escape codes and instead can parse valid JSON.

+2
source

Python displays Unicode characters by converting their UTF-8 bytes to a series of \ xVV values, where VV is the hexadecimal value of the byte. This is very different from java unicode escapes, which are just one uVVVV per character, where VVVV is UTF-16 hex encoded.

Consider:

 \xc4\x91 

In decimal form, these hexadecimal values ​​are:

 196 145 

then (in Java):

 byte[] bytes = { (byte) 196, (byte) 145 }; System.out.println("result: " + new String(bytes, "UTF-8")); 

prints:

 result: Δ‘ 
+1
source

Source: https://habr.com/ru/post/953062/


All Articles