How to prevent JSON parsing from crashing if there are illegal characters in the JSON?

Due to some communication errors, I sometimes receive JSON strings with illegal characters, such as the trailing 1 here:

    "{\"messageType\" : \"Test1\", \"from\" : \"F2D0B5C6-9875-46B5-8D4F\"} 1"

These illegal characters make my JSON parser crash. I am using the RapidJSON parser (C/C++). Is there a way to filter these unwanted characters out of the string, and also to check the integrity of the JSON string?

5 answers

This is not a bug in the parser. By default, the parser checks that every character after the JSON value, up to the null terminator, is whitespace, and it returns an error code if that is not the case. However, if the string has no null terminator, the parser may read past the end of the buffer and cause a segmentation fault, just as strlen() would.

Newer versions of RapidJSON provide kParseStopWhenDoneFlag. When it is set, the parser stops as soon as a complete JSON value has been parsed and does not look at any trailing characters. For example:

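    #include <cassert>
    #include "rapidjson/document.h"
    using namespace rapidjson;

    int main() {
        const char* s = "{\"messageType\" : \"Test1\", \"from\" : \"F2D0B5C6-9875-46B5-8D4F\"}    1";
        Document d;
        d.Parse<kParseStopWhenDoneFlag>(s);  // stop after the closing '}'
        assert(!d.HasParseError());
    }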

With this flag set, the parser stops after reading the closing } and does not report an error.

This is not yet documented in the manual. Please refer to the discussion at https://github.com/miloyip/rapidjson/pull/83


I think you should consider writing your own preprocessing function that walks through each character of the JSON string, looks for characters outside your set of legal characters, and either removes them or replaces them with a space. Then pass the cleaned string on to RapidJSON.
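A minimal sketch of such a preprocessing step, assuming the legal set is printable ASCII plus tab, line feed and carriage return. That set is an assumption for illustration (it also blanks out every byte of multi-byte UTF-8 text), and both function names are hypothetical:

```cpp
#include <string>

// Hypothetical legal set: printable ASCII (0x20..0x7E) plus tab, LF and CR.
// This is an assumption for illustration; use whatever your protocol allows.
bool is_legal_json_char(unsigned char c) {
    return (c >= 0x20 && c <= 0x7E) || c == '\t' || c == '\n' || c == '\r';
}

// Returns a copy of `input` in which every illegal character is replaced by
// a space, so the cleaned string can then be handed to RapidJSON.
std::string sanitize_json(const std::string& input) {
    std::string out = input;
    for (char& c : out) {
        if (!is_legal_json_char(static_cast<unsigned char>(c))) {
            c = ' ';
        }
    }
    return out;
}
```

To remove the characters instead of blanking them, append only the legal characters to a new string. Note that neither variant helps with the trailing 1 in the question, because digits are legal characters.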

It is probably better to detect that there was a problem with the message in the first place (in which case the JSON may be incomplete and/or incorrect), discard it, and repeat the whole session, rather than "correct" the data as you want to do here. Correcting it solves your short-term problem (the program crashing), but it can easily introduce data inconsistencies and other, subtler problems that are hard to diagnose.

Also, if the bad data mostly appears at the end of the string, as in your example, check carefully that your problem really is related to communication. The case you show looks more like a string buffer that was not properly terminated and has extra garbage (uninitialized memory) after the end of the string. Perhaps you expected C++ to clear (zero) the allocated buffer?
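If the message arrives in a fixed-size char buffer, one way to avoid reading garbage past its end is to build the string from the number of bytes actually received instead of relying on a terminating '\0'. A minimal sketch; `message_from_buffer` is a hypothetical name, and `bytes_received` stands for whatever count your read/recv call returned:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies exactly `bytes_received` bytes, so whatever
// garbage follows them in the buffer never becomes part of the message.
// The returned std::string is null-terminated (see c_str()).
std::string message_from_buffer(const char* buffer, std::size_t bytes_received) {
    return std::string(buffer, bytes_received);
}

// Note: a local array such as `char buf[256];` is NOT zeroed in C++,
// whereas `char buf[256] = {};` zero-initializes it.
```

The resulting string's c_str() can then be passed to the parser safely, since it is guaranteed to end in '\0' right after the received bytes.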


Submit a bug report. A JSON parser should accept any input you throw at it and return an appropriate error. If it crashes, that sounds like a vulnerability that attackers could exploit to attack your application. It is probably best to find another parser.

JSON data should never be modified by the receiver to make it work. It should be taken as it is, and if it is invalid, it should be rejected. If the "communication errors" come from bugs in your code, fix the bugs. If they are caused by bugs on the server, contact whoever wrote the server code. If they are genuine transmission errors, how do you know there are no other changes that leave the JSON valid, for example a payment amount that changed from $100 to $900?


You can list the allowed characters in a string and check that each character of your JSON stream appears in it. Example:

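    #include <stdexcept>
    #include <string>

    // Exception for a character outside the allowed set (defined here so the example compiles).
    struct IllegalJsonChar : std::runtime_error {
        explicit IllegalJsonChar(char c) : std::runtime_error(std::string("illegal char: ") + c) {}
    };

    void check_json_chars(const std::string& json) {
        // ':' is required by every JSON object; extend the set as your data needs.
        const std::string allowed = "abcdefghijklmnopqrstuvwxyz0123456789.,:{}[]\"";
        for (char c : json)
            if (allowed.find(c) == std::string::npos) throw IllegalJsonChar(c);
    }

    // check_json_chars("{\"bar\":\"foo\",\"blah\":25}");  // passes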

If your system is POSIX.1 compliant, you can use iconv.

You can either run the iconv command-line tool (Linux, iOS, etc.):

    iconv -f utf-8 -t utf-8 -c file.txt

This converts from UTF-8 to UTF-8 and, because of -c, omits any invalid characters from the output.

You can do the same using iconv(3) with a bit of programming.
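A minimal sketch of the iconv(3) route, assuming an iconv implementation (such as glibc's) whose UTF-8 to UTF-8 conversion rejects malformed input, which is what the command-line example relies on. Instead of the -c option, it skips one byte each time iconv() reports EILSEQ. `strip_invalid_utf8` is a hypothetical name, and on some platforms you need to link with -liconv:

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: re-encodes UTF-8 as UTF-8 with iconv(3) and drops
// bytes that iconv reports as invalid (EILSEQ), similar to `iconv -c`.
std::string strip_invalid_utf8(const std::string& input) {
    iconv_t cd = iconv_open("UTF-8", "UTF-8");  // iconv_open(tocode, fromcode)
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("iconv_open failed");
    }
    std::string in = input;            // iconv() needs a non-const char*
    std::string out(in.size(), '\0');  // dropping bytes can only shrink the output
    char* in_ptr = &in[0];
    std::size_t in_left = in.size();
    char* out_ptr = &out[0];
    std::size_t out_left = out.size();

    while (in_left > 0) {
        if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) != static_cast<std::size_t>(-1)) {
            break;  // the rest of the input was converted
        }
        if (errno == EILSEQ) {         // invalid byte: skip it and continue
            ++in_ptr;
            --in_left;
        } else if (errno == EINVAL) {  // truncated sequence at the end: drop it
            break;
        } else {                       // E2BIG or another error: give up
            iconv_close(cd);
            throw std::runtime_error("iconv failed");
        }
    }
    iconv_close(cd);
    out.resize(out.size() - out_left);  // keep only the bytes actually written
    return out;
}
```

Like the command-line version, this only removes bytes that are not valid UTF-8; it does not make structurally broken JSON valid.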


Source: https://habr.com/ru/post/1206331/

