The best solution for collecting, processing, and saving millions of records

Technology stack used: Node.js, MongoDB. Environment: 8 GB RAM.

There are 1 million records that need to be fetched (by requesting them with their identifiers) from a third-party provider. The identifiers are available locally. The received data is processed and then stored in MongoDB.

Third-party server restriction
The server can accept only 100 identifiers per request.
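For illustration, a minimal sketch of how the 100-identifier batching might look in Node.js. The endpoint URL, request payload, and response shape are placeholders rather than the provider's actual API, and axios is used only as an example HTTP client:

```js
// Sketch only: the endpoint, payload, and response shape below are
// placeholders, not the provider's real API.
const axios = require('axios');

// Split the locally available identifiers into chunks of at most 100,
// the provider's per-request limit.
function chunk(ids, size = 100) {
  const out = [];
  for (let i = 0; i < ids.length; i += size) {
    out.push(ids.slice(i, i + size));
  }
  return out;
}

// Fetch one chunk of up to 100 records from the third-party server.
async function fetchBatch(ids) {
  const res = await axios.post('https://provider.example.com/records', { ids });
  return res.data.records; // assumed response shape
}
```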

The problem with the environment
After 100,000 records I ran into a memory problem with the following error: FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory. To work around it, I ran garbage collection at regular intervals.
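This kind of error usually means the process is holding all fetched records in the heap at once, which exceeds V8's default heap limit long before the 8 GB of RAM is used. One way to avoid it without manual GC is to never keep more than one batch in memory: fetch a chunk, process it, write it to MongoDB, and drop the references before moving on. A minimal sketch, assuming the chunk() and fetchBatch() helpers above plus a hypothetical processRecord() transform and local connection details:

```js
// Sketch only: processRecord() and the connection details are hypothetical;
// chunk() and fetchBatch() are the helpers from the previous sketch.
const { MongoClient } = require('mongodb');

async function run(allIds) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const collection = client.db('mydb').collection('records');
  try {
    for (const ids of chunk(allIds, 100)) {
      const raw = await fetchBatch(ids);    // at most 100 records in memory
      const docs = raw.map(processRecord);  // hypothetical transform step
      await collection.insertMany(docs, { ordered: false });
      // nothing from this iteration is retained, so the heap stays flat
    }
  } finally {
    await client.close();
  }
}
```

Because each iteration releases its batch, memory use stays roughly constant instead of growing with the total number of records.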

With the GC workaround in place, it took 15 hours to fetch, process, and dump 350,000 records into Mongo. (After a certain time, the write speed in Mongo began to deteriorate; at the moment, no index is stored on the records.)

After processing, the 350,000 records take up 1.5 GB in MongoDB.
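If the write slowdown comes from issuing many small inserts, one common mitigation is to buffer processed documents and flush them in larger unordered insertMany calls, which reduces per-call overhead. A sketch, where the 1,000-document buffer size is an arbitrary example and `collection` is the handle from the sketch above:

```js
// Sketch only: BUFFER_SIZE is an arbitrary example value, and `collection`
// is the MongoDB collection handle from the previous sketch.
const BUFFER_SIZE = 1000;
let buffer = [];

// Collect processed documents and write them in larger unordered batches.
async function save(docs, collection) {
  buffer.push(...docs);
  if (buffer.length >= BUFFER_SIZE) {
    const batch = buffer;
    buffer = []; // drop references so the flushed batch can be GC'd
    await collection.insertMany(batch, { ordered: false });
  }
}

// Call once at the end so a partially filled buffer is not lost.
async function flush(collection) {
  if (buffer.length > 0) {
    const batch = buffer;
    buffer = [];
    await collection.insertMany(batch, { ordered: false });
  }
}
```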

This process has to be repeated once every 24 hours, so it needs to complete in the shortest possible time. Is there a better approach, or another technology stack, to achieve or optimize this?
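One direction that is often considered for this kind of mostly network-bound workload (offered here only as a hedged sketch, not a confirmed fix) is to run several 100-identifier requests concurrently with a bounded concurrency, so total wall-clock time shrinks roughly in proportion to whatever parallelism the provider tolerates. The helpers and the concurrency value below are assumptions carried over from the earlier sketches:

```js
// Sketch only: concurrency of 5 is an arbitrary example; chunk(),
// fetchBatch(), processRecord(), and `collection` come from the sketches above.
async function runConcurrently(allIds, collection, concurrency = 5) {
  const chunks = chunk(allIds, 100);
  for (let i = 0; i < chunks.length; i += concurrency) {
    const window = chunks.slice(i, i + concurrency);
    // Fetch and store a small window of batches in parallel, then move on,
    // so memory use stays bounded by `concurrency` batches at a time.
    await Promise.all(
      window.map(async (ids) => {
        const raw = await fetchBatch(ids);
        await collection.insertMany(raw.map(processRecord), { ordered: false });
      })
    );
  }
}
```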

Source: https://habr.com/ru/post/1568593/

