At present, our task is to collect 1 million records from an external server, process them, and save them in a database. We use Node.js to retrieve the records and MongoDB as the database.
We decided to split the process into two tasks: extracting the records and processing them. We can now extract all the records and load them into MongoDB, but when we try to process them (by processing I mean changing several attribute values, doing simple calculations, and updating the attributes), we see MongoDB update performance drop sharply at around 200,000 records.
To process the data, we take batches of 1,000 records, process them, update the records individually, and then move on to the next batch. How can performance be improved?
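For reference, here is a simplified sketch of what our processing loop looks like, assuming the official `mongodb` Node.js driver; the connection string, collection, field names, and the `transform` helper are placeholders, not our actual code:

```js
const { MongoClient } = require('mongodb');

// Hypothetical per-record processing: a few attribute changes and
// simple calculations, as described above.
function transform(doc) {
  return { total: doc.price * doc.quantity, status: 'processed' };
}

async function processAll() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const records = client.db('mydb').collection('records');

  // Fetch records from the server in batches of 1,000...
  const cursor = records.find({ processed: { $ne: true } }).batchSize(1000);

  // ...but issue one updateOne round trip per record.
  for await (const doc of cursor) {
    await records.updateOne(
      { _id: doc._id },
      { $set: { ...transform(doc), processed: true } }
    );
  }

  await client.close();
}
```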