Multithreaded Node.js processing: pooled child processes (or multithreading)

Batch processing in Node.js: pooled child processes.

I know that a child process is a process, not a thread. I used the wrong term because most people know what you mean when you talk about "multithreading", so I will keep it in the title.


Imagine a scenario in which you continuously run many similar, computationally heavy tasks through one custom function or module. It makes sense to use all available cores / threads (e.g. 8/16), which is what child_process.fork() is intended for.

Ideally, you would have several concurrent workers and send / receive messages to / from one controller.

node-cpool, fork-pool and child-pool are some of the modules that do exactly this, but they seem old, unmaintained or incomplete.

There are a ton of similar modules, but these seem to be the most relevant. What they all have in common is a handful of commits, hardly any stars, hardly any forks, and no recent maintenance.

Usually, when I cannot find anything for a task that seems to make sense in every respect, it means there is an even better way that I am missing. Hence my question.

How do I create a managed, queued pool of parallel fork()s for my custom module that does CPU-intensive work?

Multithreading modules such as TAGG and webworker-threads are not the same thing, because they do not support full modules (with compiled binary components).


PS

I am currently using fork-pool, which seems to do exactly what I want, with some quirks, but I cannot believe that such an obscure and little-used module is the only viable option here.

+5
4 answers

I would like to provide an option that does not exactly answer your question, but can be useful in a situation like yours when there is flexibility in choosing technologies.

If offloading the work to a .NET runtime (C#, F#, IronPython, PowerShell, etc.) is acceptable, you might be interested in Edge.js.

That way, you can use Node for IO-intensive work and delegate computationally intensive work to the .NET runtime, which runs inside the same process. Edge.js provides efficient interop with .NET code, letting you use the .NET Task Parallel Library and other features without the overhead of creating additional processes.


Hybrid applications carry maintenance and engineering costs. Carefully evaluate what you gain and make sure it is worth the cost given your project's priorities.

Node.js is not well suited to blocking, CPU-bound workloads. A distinctive feature of the Node.js design is its single-threaded event loop architecture.
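To make the point about the single-threaded event loop concrete, here is a minimal sketch. The helper names `busyWait` and `measureTimerDelay` are made up for this illustration. It schedules a 0 ms timer and then spins the CPU synchronously, so the timer callback cannot run until the loop is free again.

```javascript
// Spin the CPU synchronously for roughly `ms` milliseconds.
// This stands in for any blocking, CPU-bound computation.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // intentionally empty: we are just burning CPU time
  }
}

// Schedule a 0 ms timer, then block the event loop for `blockMs`.
// Resolves with the time that actually passed before the timer fired.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), 0);
    busyWait(blockMs); // the timer callback has to wait for this to finish
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after about ${delay} ms`);
});
```

The reported delay should be no less than the blocking time, which is why CPU-heavy work is usually moved out of the main process.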

Node.js typically handles CPU-bound workloads by delegating the processing to an external process or service. This means crossing a process boundary and incurring additional latency.

It is foolish to introduce complexity without a good reason. If Node.js can handle the task on its own, adding a .NET dependency is probably unnecessary. However, there are many tasks for which it can be valuable. Being a good engineer requires some thinking.

+2

I would suggest using something like Redis as your queue. There is a tutorial on building a message bus in Node with Redis and Kue. This scales very well and lets you have multiple processes, threads, or even machines producing and consuming items in / out of the queue.

+1

The Web Workers standard defines a way for JavaScript to use multiple threads and do a lot of work in parallel, all controlled by a single thread.

There are several implementations of this for Node.js, including the webworker-threads npm module.
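As a rough illustration of the thread-based approach, here is a sketch that uses Node's built-in `worker_threads` module rather than webworker-threads. That is a substitution on my part, and it assumes a Node.js version that ships `worker_threads`. The function name `sumInThread` and the summing task are placeholders for real CPU-bound work.

```javascript
const { Worker } = require('worker_threads');

// Worker source passed as a string so the example fits in one file.
// It sums 1..n and posts the result back to the controlling thread.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Hypothetical helper: run the computation in a separate thread and
// resolve with whatever the worker posts back.
function sumInThread(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

sumInThread(1000000).then((sum) => console.log('sum computed off-thread:', sum));
```

The main thread stays free to handle IO while the worker computes. Whether a given native (compiled) module can be loaded inside such a worker depends on that module, so this does not settle the concern raised in the question.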

Using fork takes you down the multi-process route, which is usually much harder to coordinate. The Node.js cluster module tries to remove a lot of the friction here, but it is far from ideal.

+1

I recently ran into this problem of pooling forks created by a single Node.js process, and developed my own solution. I eventually extracted it into my own npm module, which you can check out here:

https://www.npmjs.com/package/forkpool

You can create a single pool to manage all your forks, or several pools to manage separate batches of work. For example, one of my applications has two pools: one for forks that do image processing and another for video processing. Since video processing is more intensive than image processing, on an 8-core machine the video pool has a size of 2 and the image pool a size of 4.

I hope to keep improving this module over time, so feel free to raise issues or enhancement requests in the GitHub repository:

https://github.com/manthanhd/forkpool
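For readers who want to see the general idea, here is a minimal sketch of a fixed-size pool of forked children with a task queue. It is not the forkpool API. `createPool`, `run`, `close` and the squaring worker are hypothetical names invented for this example, and the worker script is written to a temporary file only so that the sketch fits in one file.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker: squares the number it receives. A real worker would
// require() the custom module here and call it instead.
const workerFile = path.join(os.tmpdir(), `pool-worker-${process.pid}.js`);
fs.writeFileSync(workerFile, `
  process.on('message', (payload) => {
    process.send(payload * payload);
  });
`);

// Create `size` forked children and feed them tasks one at a time.
function createPool(file, size) {
  const children = [];
  const idle = [];   // children that are ready for work
  const queue = [];  // tasks waiting for a free child

  function dispatch() {
    while (idle.length > 0 && queue.length > 0) {
      const child = idle.pop();
      const task = queue.shift();
      // Each child handles one task at a time, so the next message
      // it sends is the reply to this task.
      child.once('message', (result) => {
        idle.push(child);
        task.resolve(result);
        dispatch();
      });
      child.send(task.payload);
    }
  }

  for (let i = 0; i < size; i++) {
    const child = fork(file);
    children.push(child);
    idle.push(child);
  }

  return {
    // Queue a task; resolves with the worker's reply.
    run(payload) {
      return new Promise((resolve) => {
        queue.push({ payload, resolve });
        dispatch();
      });
    },
    // Kill all children so the parent process can exit.
    close() {
      children.forEach((child) => child.kill());
    },
  };
}

// Usage: a pool of at most 4 children working through a small batch.
const demoPool = createPool(workerFile, Math.min(4, os.cpus().length));
Promise.all([1, 2, 3, 4, 5].map((n) => demoPool.run(n)))
  .then((results) => console.log('squares:', results))
  .finally(() => demoPool.close());
```

Two pools with different sizes, as in the image / video example above, would simply be two `createPool` calls with different `size` values. The sketch leaves out crash handling, timeouts and child restarts, which a production pool would need, and it does not clean up the temporary worker file.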

0

Source: https://habr.com/ru/post/1202871/

