Can Cron jobs be used for simultaneous multithreading with PHP?

I have a MySQL table populated with 1000+ records, say 5000. Each record has a boolean processed flag, false (0) by default. I would like to run a PHP script from cron every minute. Its code would be something like this:

    <?php
    process();

    function process()
    {
        $sql = "SELECT id FROM items WHERE processed = '0' ORDER BY id ASC LIMIT 1";
        $result = $this->db->query($sql);
        if (! $result->has_rows()) die;
        $id = $result->getSingle('id');
        processItem($id); // Will set processed to 1 after processing is done
        process();
    }
    ?>

It should be clear what this code does: it fetches the ID of the next unprocessed record, processes it, and then calls process() again, repeating until there are no more items to process, at which point execution stops.

By putting this script on cron to run every minute, I hope to have several instances of it processing items simultaneously, so that 5-10+ items are processed at once instead of one at a time.

1) Will it work the way I plan? Any suggestions for improvement or things to look out for?

2) Should the script keep a counter of the number of running instances, so that whenever the cron job starts, it checks the counter and, if 50 (?) instances are already running, exits without processing? Could too many running processes using too much memory crash the server? Any thoughts?

3 answers

I have a few things to say:

First, you use recursion to process multiple rows. This can lead to problems if you recurse too deeply. Use a simple loop instead.

Second, do you know whether this code benefits from running several instances at once? If the work is CPU-bound, it may not benefit from another thread. I suggest you test manually how many threads work best. More threads do not always speed things up, and in some cases they slow everything down.

Finally, I would definitely put a limit on how many of these scripts can run at the same time. This can be achieved simply by making each script run for no more than 5 minutes. Or you can keep a count of active scripts and make sure it does not exceed the maximum you determined in the second point.
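Both limits can be sketched in a few lines. This is only an illustration, not code from the question: the names acquireSlot and runUntil, the lock directory, and the worker-N.lock file names are all made up here. Each instance tries to take a non-blocking lock on one of a fixed number of slot files and exits if every slot is taken, and the work loop stops at a deadline.

```php
<?php
/**
 * Try to claim one of $maxInstances slots by taking a non-blocking
 * exclusive lock on a slot file (hypothetical worker-N.lock names).
 * Returns the open handle, which must stay open while working,
 * or null when every slot is already taken.
 */
function acquireSlot(string $lockDir, int $maxInstances)
{
    for ($slot = 0; $slot < $maxInstances; $slot++) {
        $handle = fopen($lockDir . '/worker-' . $slot . '.lock', 'c');
        if ($handle === false) {
            continue; // could not open this slot file, try the next one
        }
        if (flock($handle, LOCK_EX | LOCK_NB)) {
            return $handle; // the lock is released when the handle is closed
        }
        fclose($handle);
    }
    return null;
}

/**
 * Call $processNext repeatedly until it returns false or the deadline
 * (a Unix timestamp) passes. Returns the number of successful calls.
 */
function runUntil(int $deadline, callable $processNext): int
{
    $done = 0;
    while (time() < $deadline && $processNext()) {
        $done++;
    }
    return $done;
}
```

A cron-started script would call acquireSlot() first, exit immediately on null, and otherwise pass its per-item work to runUntil() with a deadline such as time() + 300. Because the operating system drops the lock when the process ends, a crashed instance does not leave a stale counter behind, which is the main reason to prefer this over a counter stored in a file or table.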

Edit: I have added more information about the recursion problem. Every time you call a function recursively, additional space is used on the stack. This space stores the local variables as well as the return address (allowing the caller to restore its state when the called function exits). The stack has only a finite amount of space, so eventually your program will crash with a stack overflow. Try running this simple program:

    function a($i)
    {
        print $i . "\n";
        a($i + 1);
    }

    a(0);

On my system, PHP aborts after 608739 iterations. This number can be much smaller with a more complex function. A simple loop does not have this overhead, so it does not have this problem.


Recursion does not seem necessary at all and, as Brump said, can lead to problems. Why not just:

    $sql = "SELECT id FROM items WHERE processed = '0' ORDER BY id ASC LIMIT 1";

    // Re-run the query on every pass; stop when no unprocessed rows remain.
    while (($result = $this->db->query($sql)) && $result->has_rows()) {
        processItem($result->getSingle('id'));
    }

However, I foresee big problems here. If you run this script every minute, what mechanism do you have to stop previously started scripts that may still be running? You could end up processing the same ID more than once.
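One way to guard against that, sketched here under assumptions that are not in the question: the connection is a PDO instance in exception mode, and processed may also hold the value 2, meaning "claimed by a worker". The function name claimNextItem is made up as well. The idea is that the UPDATE only matches while the row is still unprocessed, so when two instances pick the same ID, only one of them sees an affected row.

```php
<?php
/**
 * Atomically claim the next unprocessed item and return its ID,
 * or return null when nothing is left.
 * Assumption: processed = 2 marks a claimed row (not in the original schema).
 */
function claimNextItem(PDO $db): ?int
{
    while (true) {
        $id = $db->query(
            'SELECT id FROM items WHERE processed = 0 ORDER BY id ASC LIMIT 1'
        )->fetchColumn();
        if ($id === false) {
            return null; // no unprocessed rows remain
        }

        // The extra "processed = 0" condition makes the claim safe:
        // only one concurrent UPDATE can still match the row.
        $claim = $db->prepare(
            'UPDATE items SET processed = 2 WHERE id = ? AND processed = 0'
        );
        $claim->execute([$id]);
        if ($claim->rowCount() === 1) {
            return (int) $id;
        }
        // Another instance claimed this row first; look for the next one.
    }
}
```

The worker loop would then process each claimed ID and set processed to 1 when it finishes. Rows left at 2 after a crash would need to be reset by some separate cleanup, which this sketch does not cover.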

If you absolutely need a (pseudo) multi-threaded approach, I suggest the following:

  • Grab a range of unprocessed IDs, or all of them, not just one at a time.
  • Using the curl_multi_ family of functions, pass subsets of those results (groups of n IDs) to another script that does the actual processing.

This method gives you more control over the whole process and avoids needlessly fetching unprocessed IDs one at a time.
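A rough sketch of those two steps follows. The worker URL, the ids query parameter, and the function names buildBatchUrls and dispatchBatches are placeholders invented for illustration; the curl extension must be loaded for the second function, and the worker script itself is not shown.

```php
<?php
/** Split IDs into groups of $batchSize and build one worker URL per group. */
function buildBatchUrls(array $ids, int $batchSize, string $workerUrl): array
{
    $urls = [];
    foreach (array_chunk($ids, $batchSize) as $batch) {
        // Hypothetical convention: the worker reads a comma-separated "ids" parameter.
        $urls[] = $workerUrl . '?ids=' . implode(',', $batch);
    }
    return $urls;
}

/** Request all URLs in parallel and return the response bodies in the same order. */
function dispatchBatches(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($multi, $handle);
        $handles[] = $handle;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for activity instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    $responses = [];
    foreach ($handles as $handle) {
        $responses[] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }
    curl_multi_close($multi);
    return $responses;
}
```

For example, dispatchBatches(buildBatchUrls($ids, 10, 'http://localhost/worker.php')) would hand ten IDs to each parallel request. Since a single dispatcher decides which IDs go to which worker, no two workers receive the same ID.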


I started a project to solve the same problem. It can run a script repeatedly and start more instances in parallel when demand is high. If there is nothing to do, it waits for a specified interval before running the script again.

If you're interested, read a few usage examples: www.4pmp.com/fatcontroller/


Source: https://habr.com/ru/post/1301292/

