JSON and escape characters

I have a string that is serialized to JSON in JavaScript and then deserialized in Java.

It looks like I get a problem whenever the string contains a degree symbol.

I could use some help figuring out who is to blame:

  • Is it the SpiderMonkey 1.8 implementation? (It has a built-in JSON implementation.)
  • Is it Google Gson?
  • Is it me, for not doing something properly?

Here is what happens in JSDB:

    js>s='15\u00f8C'
    15°C
    js>JSON.stringify(s)
    "15°C"

I would expect "15\u00f8C", which leads me to believe that SpiderMonkey's JSON implementation is not doing the right thing ... except that the syntax description on the JSON home page (is that the spec?) says a char can be

any-Unicode-character-except-"-or-\-or-control-character

so maybe it passes the string along as-is without encoding it as \u00f8 ... in which case I would think the problem is with the Gson library.

Can anyone help?

I suppose my workaround is either to use a different JSON library or to escape the strings manually after calling JSON.stringify(), but if this is a bug, I would like to file a bug report.

+42
json unicode
04 Feb '11 at 17:39
2 answers

This is not a bug in either implementation. There is no requirement to escape U+00B0. To quote the RFC:

2.5. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped.
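As a quick check of the rule quoted above, here is a minimal sketch, assuming a standards-conforming `JSON` object such as the one in current engines (the variable names are only illustrative). It shows which characters `JSON.stringify` escapes, and that a parser treats the raw and the escaped spelling as the same string.

```javascript
// U+00B0 DEGREE SIGN, written as an escape so this source stays ASCII-only.
const degree = '15\u00b0C';

// A conforming serializer may emit the character as-is: quote, 4 chars, quote.
const raw = JSON.stringify(degree);

// Characters that must be escaped: quotation mark, reverse solidus, control characters.
const mustEscape = JSON.stringify('"\\\n\u0001');

// Both spellings parse to the identical string.
const fromRaw = JSON.parse('"15\u00b0C"');
const fromEscaped = JSON.parse('"15\\u00b0C"');

console.log(raw.length);              // 6
console.log(mustEscape);              // "\"\\\n\u0001"
console.log(fromRaw === fromEscaped); // true
```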

Escaping everything inflates the size of the data (all code points can be represented in four or fewer bytes in all Unicode transformation formats, whereas escaping them all makes them six or twelve bytes).
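To put numbers on that, a small sketch that compares UTF-8 byte counts. It assumes a runtime with a global `TextEncoder` (current browsers and Node.js); `utf8Bytes` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: number of bytes a string occupies once encoded as UTF-8.
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

const degreeRaw = '\u00b0';             // the character itself
const degreeEscaped = '\\u00b0';        // the six ASCII characters \ u 0 0 b 0

const astralRaw = '\u{1F600}';          // a code point outside the BMP
const astralEscaped = '\\ud83d\\ude00'; // its surrogate pair, escaped

console.log(utf8Bytes(degreeRaw), utf8Bytes(degreeEscaped)); // 2 6
console.log(utf8Bytes(astralRaw), utf8Bytes(astralEscaped)); // 4 12
```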

It is more likely that you have a text transcoding bug somewhere in your code, and that escaping everything into the ASCII subset merely masks it. The JSON specification requires all data to use a Unicode encoding.
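The kind of bug meant here can be reproduced in a few lines. This is only a sketch under an assumption: the receiver decodes UTF-8 bytes as ISO-8859-1, which is one common mistake and not necessarily what happens in the asker's setup. `misdecodeAsLatin1` is a hypothetical helper, and a global `TextEncoder` is assumed.

```javascript
// Hypothetical helper simulating a receiver that wrongly decodes UTF-8 bytes
// as ISO-8859-1 (each byte becomes one character).
function misdecodeAsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // the correct UTF-8 bytes
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

const json = JSON.stringify('15\u00b0C'); // contains a raw U+00B0
const received = misdecodeAsLatin1(json);

console.log(JSON.parse(received) === '15\u00b0C'); // false: the text was corrupted
console.log(JSON.parse(received).length);          // 5 instead of 4

// An all-ASCII payload survives the same mistake, which is why escaping hides the bug.
const asciiJson = '"15\\u00b0C"';
console.log(JSON.parse(misdecodeAsLatin1(asciiJson)) === '15\u00b0C'); // true
```

If the two ends agree on the encoding, the raw form round-trips unchanged and no escaping is needed.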

+57
Feb 05 2018-11-11T00:

hmm, well here is a workaround:

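    // Serialize s to JSON; unless emit_unicode is true, rewrite every
    // character from U+007F upwards as a \uXXXX escape (ASCII-only output).
    function JSON_stringify(s, emit_unicode) {
        var json = JSON.stringify(s);
        return emit_unicode ? json : json.replace(/[\u007f-\uffff]/g,
            function(c) {
                // Zero-pad the UTF-16 code unit to four hex digits.
                return '\\u' + ('0000' + c.charCodeAt(0).toString(16)).slice(-4);
            }
        );
    }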

test case:

    js>s='15\u00f8C 3\u0111';
    15°C 3◄
    js>JSON_stringify(s, true)
    "15°C 3◄"
    js>JSON_stringify(s, false)
    "15\u00f8C 3\u0111"
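A round-trip check of the same replacement, as a sketch: `escapeNonAscii` is a hypothetical name for the escaping step of the workaround above, and the input includes a character outside the BMP to show that each surrogate half gets its own escape.

```javascript
// Same replacement as the workaround above, under a hypothetical name.
function escapeNonAscii(json) {
  return json.replace(/[\u007f-\uffff]/g, (c) =>
    '\\u' + ('0000' + c.charCodeAt(0).toString(16)).slice(-4));
}

const original = '15\u00b0C \u{1F600}';
const escaped = escapeNonAscii(JSON.stringify(original));

console.log(escaped);                          // "15\u00b0C \ud83d\ude00"
console.log(/^[\x00-\x7e]*$/.test(escaped));   // true: ASCII only
console.log(JSON.parse(escaped) === original); // true: nothing was lost
```

Because the regular expression has no `u` flag, it matches UTF-16 code units, so a supplementary character becomes two escapes (twelve bytes), and a parser reassembles them into the original character.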
+64
04 Feb '11 at 17:48


