I will soon be working on a project that presents a problem for me.
At regular intervals throughout the day it will need to process tens of thousands of records, possibly more than a million. Processing involves several (potentially complex) formulas, generating several random factors, writing some new data into a separate table, and updating the source records with some of the results. This needs to happen for every entry, ideally every three hours. Each new user on the site adds 50 to 500 entries that have to be processed this way, so the total is only going to grow.
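To make this concrete, here is a minimal sketch of what I imagine one user's batch would look like. The table names (`entries`, `results`), the column names, and the formula are all placeholders of mine, not real code from the project, and I am using SQLite only to keep the example self-contained:

```python
import random
import sqlite3

def process_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Process every entry belonging to one user in a single batch."""
    cur = conn.cursor()
    rows = cur.execute(
        "SELECT id, base_value FROM entries WHERE user_id = ?", (user_id,)
    ).fetchall()

    for entry_id, base_value in rows:
        # Placeholder for the real (potentially complex) formulas.
        factor = random.uniform(0.9, 1.1)   # one of the random factors
        result = base_value * factor

        # Write the derived data into a separate table...
        cur.execute(
            "INSERT INTO results (entry_id, result, factor) VALUES (?, ?, ?)",
            (entry_id, result, factor),
        )
        # ...and record the outcome back on the source row.
        cur.execute(
            "UPDATE entries SET last_result = ?, processed_at = CURRENT_TIMESTAMP "
            "WHERE id = ?",
            (result, entry_id),
        )

    conn.commit()  # one transaction per user keeps each batch atomic
```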
The code has not been written yet, as I am still in the design stage, largely because of this problem. I know I will need to use cron jobs, but I am worried that processing this many records might freeze the site, run very slowly, or simply get me kicked off by my hosting company every three hours.
Does anyone have experience with, or advice on, this kind of thing? I have never worked with volumes like this before, so I do not know whether it will be trivial for the server or a serious problem. As long as ALL records are processed before the next three-hour period starts, I do not care whether they are all processed at the same time (although ideally all records belonging to a specific user would be processed in one batch). So I wondered: should I process smaller batches every 5 minutes, 15 minutes, an hour, whatever works, and what is the best way to approach this (and to keep it scalable as more users join)? A rough sketch of what I mean by smaller batches follows.
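Here is roughly how I picture the smaller-batch version: a script that cron runs every 15 minutes (the interval is just a guess) and that picks the users whose entries have waited the longest, processing them one user at a time. `BATCH_SIZE`, the schema, and the file paths are again my own placeholders:

```python
#!/usr/bin/env python3
# Example crontab entry (interval and path are guesses):
#   */15 * * * * /usr/bin/python3 /path/to/process_batch.py
import sqlite3

from process_user import process_user  # the per-user function sketched above

# How many users to handle per run; tuned so one run finishes
# comfortably inside the cron interval.
BATCH_SIZE = 50


def main() -> None:
    conn = sqlite3.connect("site.db")
    cur = conn.cursor()

    # Pick the users whose entries have waited the longest so that every
    # user comes around again well inside the three-hour window
    # (never-processed users have NULL processed_at and sort first).
    user_ids = [
        row[0]
        for row in cur.execute(
            "SELECT user_id FROM entries "
            "GROUP BY user_id "
            "ORDER BY MIN(processed_at) ASC "
            "LIMIT ?",
            (BATCH_SIZE,),
        )
    ]

    for user_id in user_ids:
        process_user(conn, user_id)  # all of one user's entries in one batch

    conn.close()


if __name__ == "__main__":
    main()
```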