I have a long-running (5-10 hours) Mac app that processes 5000 elements. Each element is processed through a series of transformations (using Saxon), runs a set of scripts (in Python and Racket), collects data, and serializes it as a set of XML files, an SQLite database, and a Core Data database. Each element is completely independent of every other element.
In short, it does a lot of work, takes a long time, and appears to be highly parallelizable.
After loading all the elements that need to be processed, the app uses GCD to parallelize the work with dispatch_apply:
dispatch_apply(numberOfItems, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0), ^(size_t i) {
    @autoreleasepool {
        ...
    }
});
I am running the app on a Mac Pro with 12 cores (24 virtual). So I would expect 24 items to be in flight at all times. However, I have found that the number of items being processed at any moment varies between 8 and 24. This is literally adding hours to the runtime compared with keeping all 24 busy.
On the one hand, maybe GCD is really, really smart and is already giving me maximum throughput. But I worry that, because so much of the work happens in scripts spawned by this app, GCD is reasoning from incomplete information and not making the best decisions.
Any ideas how to increase performance? Correctness aside, the number-one desired attribute is shorter runtime for this app. I don't care about power consumption, hogging the Mac Pro, or anything else.
UPDATE: Actually, this note in the docs looks alarming: "The actual number of tasks executed by a concurrent queue at any given moment is variable and can change dynamically as conditions in your application change. Many factors affect the number of tasks executed by the concurrent queues, including the number of available cores, the amount of work being done by other processes, and the number and priority of tasks in other serial dispatch queues." (emphasis added) It seems that other processes doing work (even if not created by this app) can adversely affect scheduling in my app.
It would be nice to be able to just say "run these blocks concurrently, one per core, and don't try to do anything smarter."