Delayed::Job is great, I recommend it heartily. Add the HireFire gem to make it even better: it automatically spins up more worker processes as the job backlog grows, and shuts workers down when there are no jobs left. If you are using HireFire, though, don't schedule jobs to run at a future time; just enqueue them at the moment you want them to run, perhaps from a rake task run by the Heroku Cron add-on. (HireFire will not start workers correctly if you try to schedule jobs for the future.)
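To make that concrete, here is a minimal sketch of the difference; `ScrapeJob` is a hypothetical payload class, and the hash-option form of `enqueue` is from delayed_job 3.x (older versions take positional arguments):

```ruby
# Works with HireFire: the job is pending right now, so HireFire
# sees it and spins up a worker immediately.
Delayed::Job.enqueue ScrapeJob.new(user_ids)

# Does not play well with HireFire: nothing is pending yet, and
# HireFire will not start a worker for it when run_at arrives.
Delayed::Job.enqueue ScrapeJob.new(user_ids), run_at: 2.hours.from_now
```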
You can configure the maximum number of workers HireFire will use, and how aggressively it adds workers as the job backlog grows, which makes it very easy to scale. You'll need to pick an appropriate "grain size" for your cleanup/parsing work (how many hundreds or thousands of users should be processed in a single job). Then, in your cron task, split all the users into groups of that size, enqueue a background job for each group, and let HireFire spin up the right number of workers to chew through them quickly, as in the sketch below.
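A sketch of what that cron rake task might look like; the task name, batch size, `User` model, and `ScrapeJob` class are all assumptions for illustration:

```ruby
# lib/tasks/cron.rake -- invoked by the Heroku Cron add-on
namespace :cron do
  desc "Enqueue one cleanup/parse job per batch of users"
  task enqueue_batches: :environment do
    batch_size = 500  # the "grain size": tune this to your workload

    User.select(:id).find_in_batches(batch_size: batch_size) do |batch|
      # Enqueue immediately (no run_at) so HireFire scales up right away.
      Delayed::Job.enqueue ScrapeJob.new(batch.map(&:id))
    end
  end
end
```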
This still leaves the problem of minimizing the cost of dyno time. I recently ran into the same issue on a Rails site I built...
The site retrieves data from various web services using delayed_job background workers. I got almost a 10x speedup on that data-fetching task by running several HTTP requests in parallel, using a parallel map utility I built myself.
I intend to rework that implementation into a map/reduce, but if you want to use it now, you can: https://github.com/alexdowad/showcase/blob/master/ruby-threads/threads.rb
The higher your latency-to-processing-time ratio, the more you will gain. Let me know if you'd like sample background job code that uses this utility.
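In the meantime, here is a generic thread-based parallel map in the same spirit (not the utility linked above), together with a hypothetical delayed_job payload class that uses it; `User`, `feed_url_for`, and `store_result` are placeholders:

```ruby
require 'net/http'

module ParallelMap
  # Run the block for each item in its own thread and collect the results.
  # Good for I/O-bound work like HTTP calls, where threads spend most of
  # their time waiting on the network rather than holding the GVL.
  def self.pmap(items, &block)
    items.map { |item| Thread.new { block.call(item) } }.map(&:value)
  end
end

# Hypothetical delayed_job payload: fetch data for a batch of users,
# issuing the HTTP requests concurrently instead of one at a time.
ScrapeJob = Struct.new(:user_ids) do
  def perform
    users = User.where(id: user_ids).to_a
    bodies = ParallelMap.pmap(users) do |user|
      Net::HTTP.get(URI(feed_url_for(user)))  # blocking I/O, overlapped across threads
    end
    users.zip(bodies).each { |user, body| store_result(user, body) }
  end

  private

  # Placeholder: build the web-service URL for a given user.
  def feed_url_for(user)
    "https://api.example.com/feeds/#{user.id}"
  end

  # Placeholder: persist the fetched data however your app needs.
  def store_result(user, body)
    user.update!(raw_feed: body)
  end
end
```

The win comes from overlapping the network round trips: the larger the share of each request spent waiting on latency rather than doing local processing, the closer you get to that 10x figure.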