Node.js async parallel - what are the consequences?

I have this code:

    async.series(tasks, function (err) {
      return callback({ message: 'tasks execution error', error: err });
    });

where tasks is an array of functions, each of which generates an HTTP request (using the request module) and calls the MongoDB API to store data (for the MongoHQ instance).
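For context, a single entry in tasks presumably looks something like the following rough sketch; the URL, collection, and helper names are illustrative assumptions, not taken from the real code:

    var request = require('request');

    // Hypothetical shape of one entry in `tasks` (names are illustrative only).
    function makeTask(url, collection) {
      return function (done) {
        request({ url: url, json: true }, function (err, res, body) {
          if (err) return done(err);
          // Save the fetched document through the MongoDB driver.
          collection.insert(body, function (insertErr) {
            done(insertErr);
          });
        });
      };
    }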

With my current input (~200 tasks), execution takes:

  [normal mode] collection cycle: 1356.843 sec. (22.61405 mins.) 

But simply changing series to parallel gives a huge benefit: roughly the same number of tasks completes in ~30 seconds instead of ~23 minutes.
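For reference, the comparison above is just a swap of the scheduling function; the final callback stays exactly as in the snippet from the question (sketch):

    // Same final callback as in the question; only the scheduling changes.
    async.parallel(tasks, function (err) {
      return callback({ message: 'tasks execution error', error: err });
    });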

But, knowing that nothing is free, I am trying to understand the consequences of this change. Can I assume the number of open sockets will be much higher, memory consumption will grow, and there will be more hits to the database server?

The machine I run the code on is an Ubuntu box with only 1 GB of RAM, and the application has already frozen there once; could that be caused by a lack of resources?

+6
4 answers

Your intuition is correct that parallelism does not come for free, but you may well be able to afford to pay for it.

Using a load-testing module (or collection of modules) such as nodeload, you can quantify how this parallel operation affects your server and decide whether that is acceptable.

Async.parallelLimit can be a good way of limiting server load if you need to, but first it is important to find out whether limiting is necessary at all. Testing explicitly is the best way to discover the limits of your system (eachLimit has a different signature, but could be used too).
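For example, capping concurrency at a value chosen from your own load tests might look like this (10 is an arbitrary placeholder, and the surrounding callback just mirrors the question's snippet):

    // Run at most 10 tasks at a time; signature is (tasks, limit, callback).
    async.parallelLimit(tasks, 10, function (err) {
      if (err) return callback({ message: 'tasks execution error', error: err });
      callback();
    });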

In addition, common pitfalls when using async.parallel include needing more complicated control flow than that function offers (which, from your description, does not seem to apply) and naively using parallel on too large a collection (which, say, may cause you to hit the system's file descriptor limit if you are writing many files). With your ~200 request-and-save operations on 1 GB of RAM, I would expect you to be fine as long as you are not doing much heavy processing in the event handlers, but if you are experiencing server freezes, parallelLimit could be a good way out.

Again, testing is the best way to understand these things.

+5

I would like to point out that async.parallel executes multiple functions concurrently, not (truly) in parallel. It is more like virtual parallelism.

Concurrent execution is like running different programs on a single CPU core via multitasking/scheduling. Truly parallel execution would run a different program on each core of a multi-core CPU. This matters because node.js has a single-threaded architecture.

The best thing about node is that you do not have to worry about I/O: it handles I/O operations very efficiently.

In your case you are storing data to MongoDB, which is mostly I/O. So executing the tasks concurrently will use up your network bandwidth, and if you are reading/writing from disk, your disk bandwidth too. Your server will not hang because of CPU overload.


The consequence is that, if you overwhelm your server, your requests may start to fail. You may get an EMFILE error (too many open files); each socket counts as a file. Usually connections are pooled, meaning that to establish a connection a socket is picked from the pool and returned to the pool when finished. You can increase the file descriptor limit with ulimit -n xxxx .
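As a rough sketch of bounding socket usage on the client side (the pool size of 50 is an assumption to tune, not a recommendation), the request module lets you cap sockets per host in addition to raising ulimit -n:

    var request = require('request');

    // Cap how many sockets the request module keeps open per host, so ~200
    // parallel tasks queue on the pool instead of exhausting file descriptors.
    // 50 is an illustrative value; tune it against your `ulimit -n` setting.
    var limitedRequest = request.defaults({
      pool: { maxSockets: 50 }
    });

    limitedRequest('http://example.com/api', function (err, res, body) {
      // handle the response or error as usual
    });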

You may also get socket errors when overwhelmed, such as ECONNRESET (Error: socket hang up), ECONNREFUSED or ETIMEDOUT, so handle them properly. Also check the maximum number of concurrent connections allowed by the MongoDB server.
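A hedged sketch of handling those transient socket errors inside a task, with a simple retry; the error list and retry count are assumptions to adapt:

    var request = require('request');

    // Retry a request a few times on transient socket errors before giving up.
    var RETRYABLE = ['ECONNRESET', 'ECONNREFUSED', 'ETIMEDOUT'];

    function requestWithRetry(options, retriesLeft, done) {
      request(options, function (err, res, body) {
        if (err && RETRYABLE.indexOf(err.code) !== -1 && retriesLeft > 0) {
          return requestWithRetry(options, retriesLeft - 1, done);
        }
        done(err, res, body);
      });
    }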


Finally, the server can hang because of garbage collection. GC kicks in after your memory grows to a certain point and then runs periodically. The maximum V8 heap is around 1.5 GB, so expect the GC to run frequently if memory usage is high. Node will crash with "process out of memory" if more than that limit is requested, so fix any memory leaks in your program. Memory-profiling tools can help track these down.
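One way to check whether memory pressure is the cause is to log heap usage periodically; the interval and heap-size value below are illustrative only:

    // Log heap usage every 5 seconds to spot steady growth or GC pressure.
    setInterval(function () {
      var mem = process.memoryUsage();
      console.log('rss %d MB, heapUsed %d MB',
        Math.round(mem.rss / 1048576),
        Math.round(mem.heapUsed / 1048576));
    }, 5000);

    // If the heap limit itself is the problem, it can be adjusted at startup:
    //   node --max-old-space-size=768 app.js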

+3

The main downside you will see here is a spike in database server load. That may or may not be acceptable depending on your setup.

If your database server is a shared resource, you probably want to limit concurrent queries using async.eachLimit instead.
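A minimal sketch of that approach, assuming an items array and a hypothetical processItem worker that does the request-plus-save work:

    // Process at most 5 items at a time; `items` and `processItem` are
    // illustrative placeholders for your own data and worker function.
    async.eachLimit(items, 5, function (item, done) {
      processItem(item, done); // e.g. HTTP request + MongoDB save
    }, function (err) {
      if (err) return callback({ message: 'tasks execution error', error: err });
      callback();
    });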

0

You will notice the difference once several users are connected:

in that case the processor can interleave several operations;

async tries to run the operations of multiple users side by side

 T = task, U = user (T1.U1 = task 1 of user 1)
 T1.U1 => T1.U2 => T2.U1 => T8.U3 => T2.U2 => etc.

this is the opposite of atomicity (so please have a look at atomicity of db operations, but that is another topic)

so it may help to think of it this way:

 T2.U1 runs before T1.U1 

- that is not a problem yet

 T2.U1 depends on T1.U1 

- this is the case you have to prevent, by chaining the callbacks (or by running those tasks in series), as in the sketch below
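One hedged way to express that constraint, assuming illustrative users and tasksForUser helpers: run each user's own tasks in series (so T2.U1 waits for T1.U1) while different users still run in parallel.

    // Per-user ordering, cross-user parallelism (`users` and `tasksForUser`
    // are illustrative helpers, not from the original code).
    async.each(users, function (user, done) {
      // T1.U1 finishes before T2.U1 starts, but U1 and U2 run side by side.
      async.series(tasksForUser(user), done);
    }, function (err) {
      // all users processed, or the first error reported
    });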

... hope this is what you wanted to know ... it's a little late here

0

Source: https://habr.com/ru/post/950956/

