Stream highWaterMark misunderstanding

After reading some code on GitHub, it looks like I have misunderstood the highWaterMark concept.

In the case of a writable stream that needs to write a large amount of data as quickly as possible, here is the life cycle I had in mind:

1) Until the highWaterMark limit is reached, the stream is able to buffer and write data.

2) Once the highWaterMark limit is reached, the stream can no longer buffer, so the #write method returns false, which tells you that what you tried to write will not be written (ever).

3) As soon as the stream emits a 'drain' event, the buffer has been cleared and you can write again from the point where you were "rejected".

That seemed clear and simple to me, but apparently it is not so (in step 2): is the data you are trying to write really "rejected" when the #write method returns false? Or is it buffered (or something else)?

Sorry for the basic question, but I have to be sure!

3 answers

2) Once the highWaterMark limit is reached, the stream can no longer buffer, so the #write method returns false, which tells you that what you tried to write will not be written (ever).

This is not true: the data is still buffered, the stream does not lose it. But you should stop writing at that point. This allows back pressure to propagate.

Your question is answered in the writable.write(chunk[, encoding][, callback]) docs:

This return value is strictly advisory. You MAY continue to write, even if it returns false. However, writes will be buffered in memory, so it is best not to do this excessively. Instead, wait for the 'drain' event before writing more data.


Is the data that you are trying to write really "rejected" when the #write method returns false? Or is it buffered (or something else)?

The data is buffered. However, excessive write() calls that prevent the buffer from draining will cause high memory usage, poor garbage collector performance, and may even cause Node.js to crash with the error Allocation failed - JavaScript heap out of memory . See this related question:

Node: fs write() doesn't write inside loop. Why not?


For reference, here is some important information about highWaterMark and back pressure from the current docs (v8.4.0):

writable.write()

The return value is true if the internal buffer is less than the highWaterMark configured when the stream was created after admitting chunk. If false is returned, further attempts to write data to the stream should stop until the 'drain' event is emitted.

While a stream is not draining, calls to write() will buffer chunk, and return false. Once all currently buffered chunks are drained (accepted for delivery by the operating system), the 'drain' event will be emitted. It is recommended that once write() returns false, no more chunks be written until the 'drain' event is emitted. While calling write() on a stream that is not draining is allowed, Node.js will buffer all written chunks until maximum memory usage occurs, at which point it will abort unconditionally. Even before it aborts, high memory usage will cause poor garbage collector performance and high RSS (which is not typically released back to the system, even after the memory is no longer required).

Backpressuring in Streams

In any scenario where the data buffer has exceeded the highWaterMark or the write queue is currently busy, .write() will return false.

Once a false is returned, the backpressure system kicks in. It will pause the incoming Readable stream from sending any data and wait until the consumer is ready again. Once the data buffer is emptied, a 'drain' event will be emitted and resume the incoming data flow.

Once the queue is finished, backpressure will allow data to be sent again. The space in memory that was being used will free itself up and prepare for the next batch of data.

+-------------------+         +=================+
|  Writable Stream  +--------->  .write(chunk)  |
+-------------------+         +=======+=========+
                                      |
                   +------------------v---------+
 +-> if (!chunk)   |    Is this chunk too big?  |
 |   emit .end();  |    Is the queue busy?      |
 +-> else          +-------+----------------+---+
 |   emit .write();        |                |
 ^                     +---v---+        +---v---+
 ^---------------------<  No   |        |  Yes  |
                       +-------+        +---+---+
                                            |
        emit .pause();    +=================+
 ^-----------------------+  return false;  <-----+
                          +=================+     |
                                                  |
 when queue is empty     +============+           |
 ^-----------------------<  Buffering |           |
 |                       |============|           |
 +> emit .drain();       |  ^Buffer^  |           |
 +> emit .resume();      +------------+           |
                         |  ^Buffer^  |           |
                         +------------+   add chunk to queue
                         |            <---^---------------------<
                         +============+

Any data you write to the stream will eventually be written, even if the call returned false (it is simply buffered in memory until then).

The highWaterMark option gives you some control over the amount of buffer memory used. Once you have written more than the specified amount, write() will return false to give you a chance to stop writing. You don't have to: if you don't stop, no data will be lost, you will just end up using more memory (and writing the same data again would actually duplicate it, since the original is still buffered). And, as you already mentioned, you can listen for the 'drain' event to know when it is time to write again.


Source: https://habr.com/ru/post/1244444/

