Why did V8 run out of memory in this situation?

According to the node.js docs, node has a memory limit of 512 MB for the 32-bit version and 1.4 GB for the 64-bit version. The limits are similar for Chrome AFAICT (+/- 25%).

So why does this code run out of memory when it should never use more than ~424 MB of memory?

Here's the code. (The code itself is nonsense; this question is not about what the code does, it's about why it runs out of memory.)

    var lookup = 'superCaliFragilisticExpialidosiousThispartdoesnotrealllymattersd';

    function encode(num) {
      return lookup[num];
    }

    function makeString(uint8) {
      var output = '';
      for (var i = 0, length = uint8.length; i < length; i += 3) {
        var temp = (uint8[i] << 16) + (uint8[i + 1] << 8) + (uint8[i + 2]);
        output += encode(temp >> 18 & 0x3F) +
                  encode(temp >> 12 & 0x3F) +
                  encode(temp >> 6 & 0x3F) +
                  encode(temp & 0x3F);
      }
      return output;
    }

    function test() {
      var big = new Uint8Array(64 * 1024 * 1024 + 2);  // multiple of 3
      var str = makeString(big);
      console.log("big:", big.length);
      console.log("str:", str.length);
    }

    test();

As you can see, makeString builds a string by appending 4 characters at a time. In this case it's going to build a string 89478488 characters long (~180 MB). Because output is appended to, the last time characters are added there will be 2 strings in memory: the old one with 89478484 characters and the new one with 89478488. The GC should be able to collect any other memory that's used.

So 64 MB (source array) + 180 MB * 2 = 424 MB. Well under the v8 limits.
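
For reference, a quick back-of-the-envelope check of those numbers (assuming 2 bytes per character for the output string):

    // Back-of-the-envelope check of the figures above; assumes ~2 bytes
    // per character for the output string.
    var sourceBytes = 64 * 1024 * 1024 + 2;      // the Uint8Array, ~64 MB
    var outputChars = (sourceBytes / 3) * 4;     // 89478488 characters
    var outputBytes = outputChars * 2;           // ~179 MB per copy
    var peak = sourceBytes + outputBytes * 2;    // old copy + new copy
    console.log(Math.round(peak / 1e6) + ' MB'); // ~425 MB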

But if you run the sample, it fails with out of memory:

    $ node foo.js

    <--- Last few GCs --->

    3992 ms: Scavenge 1397.9 (1458.1) -> 1397.9 (1458.1) MB, 0.2 / 0 ms (+ 1.5 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
    4450 ms: Mark-sweep 1397.9 (1458.1) -> 1397.9 (1458.1) MB, 458.0 / 0 ms (+ 2.9 ms in 2 steps since start of marking, biggest step 1.5 ms) [last resort gc].
    4909 ms: Mark-sweep 1397.9 (1458.1) -> 1397.9 (1458.1) MB, 458.7 / 0 ms [last resort gc].

    <--- JS stacktrace --->

    ==== JS stack trace =========================================

    Security context: 0x3a8521e3ac1 <JS Object>
        2: makeString(aka makeString) [/Users/gregg/src/foo.js:~6] [pc=0x1f83baf53a3b] (this=0x3a852104189 <undefined>,uint8=0x2ce813b51709 <an Uint8Array with map 0x32f492c0a039>)
        3: test(aka test) [/Users/gregg/src/foo.js:19] [pc=0x1f83baf4df7a] (this=0x3a852104189 <undefined>)
        4: /* anonymous */ [/Users/gregg/src/foo.js:24] [pc=0x1f83baf4d9e5] (this=0x2ce813b...

    FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
    Abort trap: 6

Tried both node 4.2.4 and 5.6.0

So the question is: WHY does it run out of memory?

Some things I tried:

  • I tried concatenating in chunks

    Instead of appending to output indefinitely, I check whether it has grown past a certain size (e.g. 8k). If so, I push it onto an array and reset output to an empty string (see the sketch after this list).

    That way output never exceeds 8k. The array holds the ~180 MB plus some bookkeeping, and 180 MB + 8k is much less than 180 MB + 180 MB. It still runs out of memory. At the end of the process I would join the array, at which point it would use more memory (180 MB + 180 MB + bookkeeping), but v8 crashes before it ever gets to that line.

  • I tried changing encode to just

     function encode(num) { return 'X'; } 

    In this case it actually finishes! So I thought: "Ah! The problem must be that lookup[num] generates a new string for each call?" So I tried ...

  • Changed lookup to an array of strings

     var lookup = Array.prototype.map.call( 'superCaliFragilisticExpialidosiousThispartdoesnotrealllymattersd', function(c) { return c; }); 

    Still out of memory
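
For reference, the chunked variant from the first bullet looked roughly like this (a sketch reconstructed from the description above, not the exact code that was run):

    // Sketch of attempt #1: flush `output` into an array every ~8k characters
    // so the growing string itself stays small.
    function makeStringChunked(uint8) {
      var chunks = [];
      var output = '';
      for (var i = 0, length = uint8.length; i < length; i += 3) {
        var temp = (uint8[i] << 16) + (uint8[i + 1] << 8) + (uint8[i + 2]);
        output += encode(temp >> 18 & 0x3F) + encode(temp >> 12 & 0x3F) +
                  encode(temp >> 6 & 0x3F) + encode(temp & 0x3F);
        if (output.length >= 8192) {
          chunks.push(output);  // ~180 MB total across all chunks
          output = '';
        }
      }
      chunks.push(output);
      return chunks.join('');   // v8 crashed before it ever reached this line
    }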

Does this sound like a bug in v8? That it is somehow unable to GC unused strings in some pathological way because of this code? Although #2 vs #3 is weird, because they seem equivalent in terms of memory usage.

Why does v8 run out of memory in this situation? (And is there a workaround?)

1 answer

TL;DR: Your example is a pathological case for one of v8's internal string representations. You can fix it by indexing into output once in a while (details on why below).

First, we can use heapdump to see what the garbage collector is doing:
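
(A minimal sketch of how such a snapshot could be captured; the answer doesn't show its exact instrumentation, so the heapdump module usage and the chosen iteration below are assumptions:)

    // Sketch: write a heap snapshot from inside makeString's loop shortly
    // before the crash, then open the .heapsnapshot file in Chrome DevTools'
    // Memory tab. The loop is synchronous, so a timer or signal handler
    // would never get a chance to run; call writeSnapshot() directly instead.
    var heapdump = require('heapdump');

    function makeStringInstrumented(uint8) {
      var output = '';
      for (var i = 0, length = uint8.length; i < length; i += 3) {
        var temp = (uint8[i] << 16) + (uint8[i + 1] << 8) + (uint8[i + 2]);
        output += encode(temp >> 18 & 0x3F) + encode(temp >> 12 & 0x3F) +
                  encode(temp >> 6 & 0x3F) + encode(temp & 0x3F);
        if (i === 60 * 1024 * 1024) {  // arbitrary point late in the run
          heapdump.writeSnapshot('before-oom.heapsnapshot');
        }
      }
      return output;
    }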

[heap snapshot screenshot]

The snapshot above was taken shortly before node runs out of memory. As you can see, most things look fine: there are two strings (the very large output and the small piece about to be appended), three references to the same big array (roughly the 64 MB we expect), and lots of smaller items that don't look unusual.

But one thing stands out: output is a whopping 1.4+ GB. At the time the snapshot was taken it was only about 80 million characters long, so ~160 MB assuming 2 bytes per character. How is that possible?

This has to do with v8's internal string representation. Quoting mraleph:

There are two kinds of [strings in v8] (actually more, but only these two matter for the problem at hand):

  • Flat strings are immutable arrays of characters.
  • Cons strings are pairs of strings, the result of concatenation.

If you concatenate a and b, you get a cons string (a, b) that represents the result of the concatenation. If you later concatenate d to it, you get another cons string ((a, b), d).

Indexing into such a "tree-like" string is not O(1), so to speed things up V8 flattens the string when you index into it: it copies all the characters into a flat string.
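
To make that concrete, here is a rough conceptual model in plain JavaScript, not v8's actual data structures, of what a cons string is and why indexing into one is not O(1):

    // Conceptual model only -- not v8's real implementation.
    // A cons string is a pair of strings; every `output += chunk` adds one
    // more node, so the loop above builds a tree millions of nodes deep.
    function Cons(left, right) {
      this.left = left;    // string or Cons
      this.right = right;  // string or Cons
      this.length = left.length + right.length;
    }

    // Indexing has to walk the tree, which is why v8 flattens the string
    // (copies every character into one contiguous buffer) when you index it.
    Cons.prototype.charAt = function (i) {
      return i < this.left.length
        ? this.left.charAt(i)
        : this.right.charAt(i - this.left.length);
    };

    // ((a, b), d) from the quote above:
    var s = new Cons(new Cons('a', 'b'), 'd');
    console.log(s.charAt(2)); // 'd'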

So could it be that v8 represents output as a giant tree of cons strings? One way to check is to make v8 flatten the string (as mraleph suggests above), for example by indexing into output at regular intervals inside the for loop:

    if (i % 10000000 === 0) {
      // Don't do this on every iteration, since flattening is relatively expensive.
      output[0];
    }

And indeed, the program now completes successfully!
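
Putting it together, the whole workaround looks roughly like this (a sketch; the flush interval is taken from the snippet above and is not critical):

    // Workaround sketch: force v8 to flatten `output` every so often so the
    // cons-string tree never grows to pathological depth.
    function makeStringFlattened(uint8) {
      var output = '';
      for (var i = 0, length = uint8.length; i < length; i += 3) {
        var temp = (uint8[i] << 16) + (uint8[i + 1] << 8) + (uint8[i + 2]);
        output += encode(temp >> 18 & 0x3F) + encode(temp >> 12 & 0x3F) +
                  encode(temp >> 6 & 0x3F) + encode(temp & 0x3F);
        if (i % 10000000 === 0) {
          output[0];  // indexing triggers flattening
        }
      }
      return output;
    }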

One question remains: why did attempt #2 work? It seems that in that case v8 is able to optimize away most of the string concatenation (the whole right-hand side, which gets reduced to bitwise operations and a 4-element array).


Source: https://habr.com/ru/post/1242833/

