Hydrating 30,000 entities is quite a lot. Doctrine 2 is stable, but there are still some bugs, so the memory problems don't surprise me much.
That said, with smaller datasets I have had good success using Doctrine's batch processing and iterating over the result.
You can take the code from those examples and add gc_collect_cycles() after each flush/clear. You should test it yourself, but for me a batch size of around 100 worked well; that number gave a good balance between performance and memory usage.
It is also important that the script keeps track of which objects it has already processed, so that it can be restarted cleanly after a crash and resume where it left off without sending any emails twice.
```php
$batchSize = 20;
$i = 0;
$q = $em->createQuery('select u from MyProject\Model\User u');
$iterableResult = $q->iterate();

while (($row = $iterableResult->next()) !== false) {
    $entity = $row[0];

    // do stuff with $entity here
    // mark entity as processed

    ++$i;
    if (($i % $batchSize) === 0) {
        $em->flush();          // persist the changes made in this batch
        $em->clear();          // detach all entities to free memory
        gc_collect_cycles();   // let PHP collect circular references
    }
}

// flush the final, possibly partial batch
$em->flush();
$em->clear();
```
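For the restart safety mentioned above, one approach (just a sketch: it assumes a hypothetical nullable datetime field `processedAt` mapped on `User`, and a hypothetical `sendEmailTo()` mailer function) is to exclude already-processed rows in the DQL itself and set the flag before each flush:

```php
// Sketch of a resumable run. Assumes User has a hypothetical
// nullable datetime field `processedAt` with a setter.
$q = $em->createQuery(
    'select u from MyProject\Model\User u where u.processedAt is null'
);

foreach ($q->iterate() as $row) {
    $user = $row[0];

    sendEmailTo($user);                     // hypothetical mailer call
    $user->setProcessedAt(new \DateTime()); // mark as done

    // ... flush/clear/gc in batches exactly as above; every flush
    // also persists the processedAt markers, so a crashed run can
    // simply be restarted and will skip what was already handled.
}
$em->flush();
```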
Anyway, you may want to rethink the architecture of this script a little, since an ORM is not well suited to processing large amounts of data. Maybe you can get away with plain SQL here instead?
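If you go that way, here is a minimal sketch of the same job through Doctrine's DBAL connection, bypassing hydration entirely. It assumes DBAL 3 (`fetchAssociative()`, `executeStatement()`), a table called `users` with the same hypothetical `processed_at` column, and a hypothetical `sendEmail()` function:

```php
$conn = $em->getConnection();

// Stream unprocessed rows; no entities are hydrated, so memory
// usage stays low regardless of the result size.
$result = $conn->executeQuery(
    'SELECT id, email FROM users WHERE processed_at IS NULL ORDER BY id'
);

while (($row = $result->fetchAssociative()) !== false) {
    sendEmail($row['email']); // hypothetical mailer call

    // Flag the row immediately, so a restart skips it.
    $conn->executeStatement(
        'UPDATE users SET processed_at = NOW() WHERE id = ?',
        [$row['id']]
    );
}
```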