JavaScript-friendly binary-safe data format (not JSON or XML)

First of all: JSON and XML are not an option in this particular case, so please do not suggest them. If it helps to accept this, imagine that I intend to reinvent the wheel for self-education.

Back to the point:

I need to design a binary-safe data format for encoding some datagrams that I send to a specific dumb server that I am writing (in C, if that matters).

To simplify matters, let's say that I send only numbers, strings and arrays.

An important fact: the server does not know (and should not know) anything about Unicode and the like. It treats all strings as binary blobs (and never looks inside them).

The format that I originally developed is as follows:

  • Datagram: <Number:size>\n<Value1>...<ValueN>
  • Value:
    • Number: N\n<Value>\n
    • String: S\n<Number:size-in-bytes>\n<bytes>\n
    • Array: A\n<Number:size>\n<Value0>...<ValueN>

Example:

 [ 1, "foo", [] ] 

Serialized as follows:

 1    ; number of items in datagram
 A    ; - array -
 3    ; number of items in array
 N    ; - number -
 1    ; number value
 S    ; - string -
 3    ; string size in bytes
 foo  ; string bytes
 A    ; - array -
 0    ; number of items in array
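
For illustration, here is a minimal sketch of an encoder that produces the bytes listed above. It is not part of the original protocol code: the names `encodeValue` and `encodeDatagram` are hypothetical, and it assumes that strings go on the wire as UTF-8 and that the runtime provides `TextEncoder` (modern browsers and Node.js do).

```javascript
// Hypothetical encoder sketch; assumes strings are sent as UTF-8 bytes.
const utf8 = new TextEncoder();

// Append the wire form of one value to `parts` (an array of Uint8Array chunks).
function encodeValue(value, parts) {
  if (typeof value === "number") {
    parts.push(utf8.encode("N\n" + value + "\n"));
  } else if (typeof value === "string") {
    const bytes = utf8.encode(value);
    // The size field counts bytes, not JavaScript characters.
    parts.push(utf8.encode("S\n" + bytes.length + "\n"), bytes, utf8.encode("\n"));
  } else if (Array.isArray(value)) {
    parts.push(utf8.encode("A\n" + value.length + "\n"));
    for (const item of value) encodeValue(item, parts);
  } else {
    throw new TypeError("unsupported value type");
  }
}

// Encode a list of top-level values as one datagram.
function encodeDatagram(values) {
  const parts = [utf8.encode(values.length + "\n")];
  for (const v of values) encodeValue(v, parts);
  const total = parts.reduce((n, p) => n + p.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

const wire = encodeDatagram([[1, "foo", []]]);
// Show the newlines explicitly by printing the decoded text as a JSON string.
console.log(JSON.stringify(new TextDecoder().decode(wire)));
// prints "1\nA\n3\nN\n1\nS\n3\nfoo\nA\n0\n"
```

Because every string is preceded by its byte count, the receiving side can read exactly that many bytes without inspecting them.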

The problem is that I cannot reliably get the size of the string in bytes in JavaScript.
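
To make the problem concrete: `String.prototype.length` counts UTF-16 code units, not bytes, so it only matches the byte count for pure ASCII text. The sketch below shows one possible way to get a byte count, assuming the string is sent as UTF-8 and that `TextEncoder` is available in the runtime; the helper name `utf8ByteLength` is made up for this example.

```javascript
// Hypothetical helper: size of a string in bytes once encoded as UTF-8.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

const samples = ["foo", "h\u00e9llo", "\u{1F600}"];
for (const s of samples) {
  // .length is in UTF-16 code units; the second number is UTF-8 bytes.
  console.log(s.length, utf8ByteLength(s));
}
// prints:
// 3 3
// 5 6
// 2 4
```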

So, the question is: how should I change the format so that a string can be written from JS and read in C cleanly?

I do not want to add Unicode support to the server.

I would also rather not decode strings on the server (say, from base64, or by unescaping \xNN sequences): that would require working with dynamic string buffers, which, given how dumb the server is, is not very desirable...

Any clues?

1 answer

It seems that reading UTF-8 in plain C is not so scary after all. Therefore, I am extending the protocol to handle UTF-8 strings natively. (But I would still appreciate an answer to this question in its current form.)


Source: https://habr.com/ru/post/1346279/