How to cryptographically hash a JSON object?

The following question is more complicated than it might seem at first glance.

Suppose I have an arbitrary JSON object that may contain any amount of data, including other nested JSON objects. What I want is a cryptographic hash/digest of the JSON data, regardless of the actual JSON formatting (for example, ignoring newlines and whitespace differences between JSON tokens).

The last part is a requirement, as the JSON will be generated/read by many (de)serializers on several different platforms. I know of at least one JSON library for Java that completely strips formatting when reading data during deserialization, which would break the hash.

The arbitrary-data requirement also complicates things, since it prevents me from taking known fields in a given order and concatenating them before hashing (think of how Java's non-cryptographic hashCode() method works).

Finally, hashing the entire JSON string as raw bytes (before deserialization) is also undesirable, since there are fields in the JSON that should be ignored when computing the hash.

I'm not sure there is a good solution to this problem, but I welcome any approaches or thoughts =)

+48
json cryptography canonicalization
Jan 12 '11
7 answers

This problem is common when computing hashes for any data format where flexibility is allowed. To solve it, you need to canonicalize the representation.

For example, the OAuth 1.0a protocol, which is used by Twitter and other services for authentication, requires a secure hash of the request message. To compute the hash, OAuth 1.0a says you must first alphabetize the fields, separate them with newlines, remove the field names (which are well known), and use blank lines for empty values. The signature or hash is computed over the result of that canonicalization.

XML DSIG works the same way: you need to canonicalize the XML before signing it. There is a proposed W3C standard covering this, because it is such a fundamental requirement for signing. Some people call it c14n.

I'm not aware of a canonicalization standard for JSON. It's worth researching.

If there isn't one, you can of course establish a convention for your particular application. A reasonable start might be:

  • lexicographically sort the properties by name
  • double quotes used for all names
  • double quotes used for all string values
  • either no space or a single space between a name and its colon, and between the colon and the value (pick one)
  • no spaces between a value and the following comma
  • all other whitespace collapsed to either a single space or nothing (pick one)
  • exclude any properties you don't want to sign (one example is a property that holds the signature itself)
  • sign the result with your chosen algorithm
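A minimal Python sketch of such a convention, assuming the JSON has already been parsed into plain dicts/lists, using `json.dumps` with sorted keys and compact separators as the canonical form (the excluded `signature` property name is only an illustration):

```python
import hashlib
import json

def canonical_json(obj, exclude=("signature",)):
    """Serialize a parsed JSON object to canonical bytes:
    lexicographically sorted names, double-quoted strings, no
    insignificant whitespace, excluded top-level properties dropped."""
    if isinstance(obj, dict):
        obj = {k: v for k, v in obj.items() if k not in exclude}
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def json_digest(obj):
    # Hash the canonical form, not whatever formatting arrived on the wire.
    return hashlib.sha256(canonical_json(obj)).hexdigest()
```

Two differently formatted copies of the same logical object then produce the same digest, because both collapse to identical canonical bytes before hashing.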

You may also want to think about how to pass the signature inside the JSON object itself: perhaps define a well-known property name, such as "nichols-hmac" or something similar, that holds the base64-encoded version of the hash. This property must be explicitly excluded by the hashing algorithm. Then any receiver of the JSON will be able to verify the hash.

The canonicalized representation does not need to be the representation you pass around in the application. It only needs to be easy to produce from any JSON object.
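Embedding and verifying the hash via a well-known property could look like the following Python sketch; HMAC-SHA256 and the helper names are my assumptions, not part of the answer:

```python
import base64
import hashlib
import hmac
import json

SIG_FIELD = "nichols-hmac"  # well-known property carrying the hash

def _canonical(obj):
    # Sorted names, compact separators, and the signature property
    # stripped so it never participates in its own hash.
    body = {k: v for k, v in obj.items() if k != SIG_FIELD}
    return json.dumps(body, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")

def attach_hmac(obj, key):
    mac = hmac.new(key, _canonical(obj), hashlib.sha256).digest()
    signed = dict(obj)
    signed[SIG_FIELD] = base64.b64encode(mac).decode("ascii")
    return signed

def verify_hmac(obj, key):
    claimed = base64.b64decode(obj.get(SIG_FIELD, ""))
    expected = hmac.new(key, _canonical(obj), hashlib.sha256).digest()
    return hmac.compare_digest(expected, claimed)
```

Any receiver that knows the key and agrees on the canonicalization convention can recompute the property and check it.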

+42
Jan 12 '11 at 15:38

Instead of inventing your own JSON normalization/canonicalization, you can use bencode. Semantically it is the same as JSON (a composition of numbers, strings, lists, and dicts), but it has the unique-encoding property that cryptographic hashing requires.

bencode is used as the torrent file format; every bittorrent client contains an implementation.

+5
Jan 12 '11 at 15:54

This is the same problem that makes S/MIME signatures and XML signatures difficult: there can be several equivalent representations of the data to be signed.

For example, in JSON:

{ "Name1": "Value1", "Name2": "Value2" } 

vs.

 { "Name1": "Value\u0031", "Name2": "Value\u0032" } 

Or, depending on your application, even this may be equivalent:

 { "Name1": "Value\u0031", "Name2": "Value\u0032", "Optional": null } 

Canonicalization could solve this problem, but it's a problem you don't need to have at all.
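A quick Python check confirms that a standard parser already treats those two spellings as the same logical object, which is exactly why hashing the raw text falls apart:

```python
import json

# "\u0031" is just another spelling of the character "1", so both
# documents parse to the same logical object even though the raw
# bytes on the wire differ.
a = json.loads('{ "Name1": "Value1", "Name2": "Value2" }')
b = json.loads('{"Name1":"Value\\u0031","Name2":"Value\\u0032"}')

print(a == b)  # True: yet a byte-level hash of the text would differ
```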

A simple solution, if you have control over the specification, is to wrap the object in some kind of container that protects it from being transformed into an "equivalent" but different representation.

I.e., avoid the problem by not signing the "logical" object, and instead signing a specific serialized representation of it.

For example: JSON object → UTF-8 text → bytes. Sign the bytes as bytes, then transmit them as bytes, e.g. base64-encoded. Since you are signing the bytes, differences such as whitespace are part of the signature.

Instead of this:

 { "JSONContent": { "Name1": "Value1", "Name2": "Value2" }, "Signature": "asdflkajsdrliuejadceaageaetge=" } 

Just do the following:

 { "Base64JSONContent": "eyAgIk5hbWUxIjogIlZhbHVlMSIsICJOYW1lMiI6ICJWYWx1ZTIiIH0s", "Signature": "asdflkajsdrliuejadceaageaetge=" } 

I.e., don't sign the JSON; sign the bytes of the encoded JSON.

Yes, this means that the signature is no longer transparent.
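A Python sketch of this wrap-then-sign approach, reusing the field names from the example above (HMAC-SHA256 and the helper names are my assumptions):

```python
import base64
import hashlib
import hmac
import json

def wrap_and_sign(obj, key):
    # Serialize once; from here on the payload is treated purely as
    # bytes, so whitespace and escaping quirks are frozen in place.
    payload = json.dumps(obj).encode("utf-8")
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return {
        "Base64JSONContent": base64.b64encode(payload).decode("ascii"),
        "Signature": base64.b64encode(sig).decode("ascii"),
    }

def unwrap_and_verify(envelope, key):
    payload = base64.b64decode(envelope["Base64JSONContent"])
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected,
                               base64.b64decode(envelope["Signature"])):
        raise ValueError("signature mismatch")
    # The exact signed bytes traveled alongside the signature, so no
    # canonicalization is needed before verifying.
    return json.loads(payload)
```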

+4
Dec 06 '16 at 14:17

JSON-LD can perform normalization.

You will need to define your context.

+2
Jan 31 '15 at 8:28

I would put all the fields in a given order (e.g., alphabetical). Why does arbitrary data matter? You can simply iterate over the properties (via reflection or the like).

Alternatively, I would consider converting the source JSON string into some well-defined canonical form (removing all superfluous formatting) and hashing that.

0
Jan 12 '11 at 15:37

We ran into the problem of hashing a simple JSON-encoded payload. In our case, we use the following methodology:

  1. Convert the data into a JSON object;
  2. Base64-encode the JSON payload;
  3. Compute a message digest (HMAC) of the base64-encoded payload;
  4. Transmit the base64 payload.

Advantages of this solution:

  1. Base64 produces identical output for an identical payload.
  2. Since the signature is derived directly from the base64-encoded payload, and that same base64 payload is what gets exchanged between the endpoints, we can be sure that the signature and payload are preserved.
  3. This approach avoids problems caused by differences in the encoding of special characters.

Disadvantages:

  1. Encoding/decoding the payload adds overhead.
  2. Base64-encoded data is typically 30+% larger than the original payload.
0
Apr 05 '18 at 2:08

RFC 7638: JSON Web Key (JWK) Thumbprint includes a form of canonicalization. Although RFC 7638 expects a limited set of members, we can apply the same computation to any set of members.

https://tools.ietf.org/html/rfc7638#section-3
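As a sketch, the RFC 7638 computation (required members only, lexicographic order, no whitespace, SHA-256) fits in a few lines of Python. Producing a hex digest here rather than the RFC's base64url form is my own choice, and extending the `REQUIRED` table beyond the registered key types is the "apply it to any members" idea from the answer:

```python
import hashlib
import json

# Required members per key type, following RFC 7638 section 3.2.
REQUIRED = {
    "RSA": ("e", "kty", "n"),
    "EC": ("crv", "kty", "x", "y"),
    "oct": ("k", "kty"),
}

def jwk_thumbprint(jwk):
    """Keep only the required members, order them lexicographically,
    serialize with no insignificant whitespace, then SHA-256."""
    members = {m: jwk[m] for m in REQUIRED[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True,
                           separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

Because optional members such as `alg` or `use` are dropped before hashing, two copies of the same key with different extra metadata yield the same thumbprint.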

0
Dec 22 '18 at 2:24
