Large Array Compression

I have a javascript application that sends a lot of digital data over wires. This data is then stored in a database. I have problems with size (too much bandwidth, too big database). Now I am ready to sacrifice some performance for compression.

I was thinking about implementing base number 62.String (62) and parseInt (compressed, 62). This will certainly reduce the size of the data, but before I go any further and do it, I thought that I put it to people, because I know that there must be some kind of external core that I did not consider.

Key Features: - Compressing a large number of arrays into strings for JSONP transmission (so I think that UTF is missing) - Be relatively fast, see that I do not expect the same performance as I have now, but I also do not want gzip compression .

We will be very grateful for any ideas.

thanks

Guido Tapia

+4
source share
4 answers

Another way to do this could be encoding binary types, such as signed / unsigned ints, and manual decoding, as at http://snippets.dzone.com/posts/show/685 which requires server-side code to create binary data.

You could then compress huffman or something similar like RLE (see http://rosettacode.org/wiki/Run-length_encoding#JavaScript for implementation, although it may have some problems in IE without changing) for further compression data.

EDIT : In addition, you can convert the numbers directly to the base (radix) in the Unencoded URI character range (see http://en.wikipedia.org/wiki/Percent-encoding ), which should work well if many of the numbers are more than 2 digits. I converted the code to http://code.activestate.com/recipes/111286-numeric-base-converter-that-accepts-arbitrary-digi/ from python for this.

It does not currently handle float, but it can be done quite easily:

function get_map(s) { d = {} for (var i=0; i<s.length; i++) { d[s.charAt(i)] = i} d.length = s.length d._s = s return d} var separate_with = '~'; var encodable = get_map('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.'); // - is reserved for negatives obviously :-P var base10 = get_map('0123456789') // UNCOMMENT ME for length/speed testing in a wider base! // You may wish to experiment with the ranges for a happy medium between bandwidth and DB space :-P /*var encodable = '' for (var i=1; i<128; i++) { encodable += String.fromCharCode(i) } encodable = get_map(encodable)*/ function baseconvert(number, fromdigits, todigits) { var number = String(number) if (number.charAt(0) == '-') { number = number.slice(1, number.length) neg=1} else { neg=0} // make an integer out of the number var x = 0 for (var i=0; i<number.length; i++) { var digit = number.charAt(i) x = x*fromdigits.length + fromdigits[digit] } // create the result in base 'todigits.length' res = "" while (x>0) { remainder = x % todigits.length res = todigits._s.charAt(remainder) + res x = parseInt(x/todigits.length) } if (neg) res = "-"+res return res } function encodeNums(L) { var r = [] for (var i=0; i<L.length; i++) { r.push(baseconvert(L[i], base10, encodable)) } return r.join(separate_with) } function decodeNums(s) { var r = [] var s = s.split(separate_with) for (var i=0; i<s.length; i++) { r.push(parseInt(baseconvert(s[i], encodable, base10))) } return r } var test = [5, 654645, 24324, 652124, 65, 65289543, 65278432, 643175874158, 652754327543] alert(encodeNums(test)) alert(decodeNums(encodeNums(test))) 
+3
source

Well, as mentioned in the accepted answer to this question , the jsolait library has an implementation of LZW compression in JavaScript ...

+2
source

Functions

  • Use js library (see answer from Josh), but watch out for script timeouts
  • Use some kind of active or silver bootloader that also does zip
  • Use java plugin
  • Use bootloader using flash
0
source

Now I'm thinking about the idea of โ€‹โ€‹encoding the length of a number in the number itself. I have not yet improved this algorithm, but will publish it after completion. But roughly this is what I'm currently trying to achieve:

Borders

  • Loss of accuracy is allowed (+ - 3)
  • The maximum number will be 3200
  • Min is 0 (so no negatives)
  • No decimal numbers - floats

So now, given my maximum allowable number, I know that the length of the encoded digit in base 62 will have a maximum length of 2. Thus, any encoded number has a length of 1 or 2 characters. Awesome. So now I'm going to make the number odd or even dependent if it's 1 or 2 characters (remember that I can handle the loss of precession). This eliminates the need for delimiters.

Now I see about 70% -80% compression with this, it is very buggy at the moment, but I am delighted with this, so the post encourages discussion of this methodology.

0
source

Source: https://habr.com/ru/post/1306308/


All Articles