Nightly network-heavy batch processing on Heroku

We are working on a Rails project on Heroku that needs to clean and process data every night for each user. This requires a lot of network access per user, and we hope to support tens of thousands of users. While there is a fair bit of parsing, computing, and writing to the databases involved, we expect most of the task's time to be spent waiting for data from the network.

What is the best overall approach to accomplishing this task while minimizing both wall-clock time and Heroku fees? Obviously we will need either concurrency or asynchronous I/O to hide the network latency, but how should we do this? We are thinking in terms of a database-backed queue feeding forked worker processes, but this may not be the best approach, or may not even be possible on Heroku.

+4
2 answers

Heroku supports Delayed Job; I would start there. Then you can do the following (there is a sketch after the list):

  • Create a job class that performs the processing for one user.
  • Add a nightly cron task that enqueues one such job for each user in your system.
  • Autoscale your workers to match the job queue (workless or something similar should be able to do this for you; if not, you may have to roll some custom code).
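
A minimal sketch of the first two steps, using Delayed Job's plain-Ruby job objects. ProcessUserJob, the nightly:enqueue task name, and process_nightly_data! are hypothetical names for illustration, not part of any library:

    # app/jobs/process_user_job.rb
    # A Delayed Job payload: any object that responds to #perform can be enqueued.
    class ProcessUserJob < Struct.new(:user_id)
      def perform
        user = User.find(user_id)
        user.process_nightly_data!  # hypothetical: your parsing/cleaning logic
      end
    end

    # lib/tasks/nightly.rake
    # Run nightly (e.g. via the Heroku Cron add-on) to enqueue one job per user.
    namespace :nightly do
      task :enqueue => :environment do
        User.find_each do |user|
          Delayed::Job.enqueue(ProcessUserJob.new(user.id))
        end
      end
    end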

You will need to play with your jobs-to-workers ratio to find the sweet spot between database load, wall-clock time, and Heroku dyno time.

If you find that each job spends too much of its time waiting on the network, look at EventMachine. Jobs are just Ruby code, so you can use any tricks you want here; Heroku should not limit you in any way.
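
For instance, the em-http-request gem lets one job issue several HTTP requests concurrently, so the total wait is roughly that of the slowest request rather than the sum of all of them. A rough sketch, with placeholder URLs:

    require 'em-http-request'

    # Fetch several URLs concurrently inside a single job.
    def fetch_all(urls)
      responses = {}
      EventMachine.run do
        multi = EventMachine::MultiRequest.new
        urls.each_with_index do |url, i|
          multi.add(i, EventMachine::HttpRequest.new(url).get)
        end
        multi.callback do
          multi.responses[:callback].each { |name, request| responses[name] = request.response }
          EventMachine.stop
        end
      end
      responses
    end

    bodies = fetch_all(['http://example.com/a', 'http://example.com/b'])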

This setup would be a good baseline to start from, since it should not take long to build, and you will probably learn a bit about your workload along the way.

You may find that one job per user does not make sense, and that you need n jobs per user (one job per property, or something similar). Not knowing your exact use case, it is hard to say, which is why I assumed a 1-1 mapping.

I should also point out that the new Heroku stack supports queueing systems other than Delayed Job.

+7

Delayed Job is great; I heartily recommend it. Add the HireFire gem to make it even better: it automatically spins up more worker processes when a backlog of jobs accumulates, and shuts workers down when there are no jobs left. However, if you use HireFire, do not schedule jobs to run in the future; just enqueue them when you want them to run, perhaps from a rake task triggered by the Heroku Cron add-on. (HireFire will not start workers correctly if you try to schedule jobs for the future.)
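
In Delayed Job terms, that means enqueueing immediately rather than passing a future :run_at. A hypothetical example (the options syntax varies a little between delayed_job versions):

    # Fine with HireFire: the job is visible to workers right away.
    Delayed::Job.enqueue(ProcessUserJob.new(user.id))

    # Avoid with HireFire: with a future :run_at, no worker may be running
    # by the time the job actually becomes due.
    Delayed::Job.enqueue(ProcessUserJob.new(user.id), :run_at => 3.hours.from_now)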

You can configure the maximum number of workers HireFire will use, and how it adds workers as the job backlog grows. This makes scaling very easy. You will need to pick an appropriate "grain size" for your cleaning/parsing work (how many users, say 100 or 1000, should be processed in a single job). Then, in your cron task, split all the users into groups of the appropriate size, enqueue a background job for each group, and let HireFire spin up the right number of workers to finish all the jobs quickly.
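
A sketch of that cron task, assuming a hypothetical ProcessUserBatchJob class and a grain size of 500 users per job:

    # lib/tasks/nightly.rake
    # Split users into fixed-size groups and enqueue one job per group;
    # HireFire then scales workers up to chew through the backlog.
    namespace :nightly do
      task :enqueue_batches => :environment do
        User.select(:id).find_in_batches(:batch_size => 500) do |batch|
          Delayed::Job.enqueue(ProcessUserBatchJob.new(batch.map(&:id)))
        end
      end
    end

    class ProcessUserBatchJob < Struct.new(:user_ids)
      def perform
        User.where(:id => user_ids).each do |user|
          user.process_nightly_data!  # hypothetical per-user work
        end
      end
    end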

That still leaves the problem of minimizing the cost of dyno time. I recently ran into the same issue on a Rails site I built...

The site retrieves data from various web services using delayed_job background workers. I got almost a 10x speedup on this data-fetching task by running several HTTP requests in parallel, using a parallel map utility I built myself.

I intend to do more work on this implementation at some point, but if you want to use it now, here it is: https://github.com/alexdowad/showcase/blob/master/ruby-threads/threads.rb
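
The linked file contains the real utility; a minimal thread-based parallel map capturing the same idea might look like this:

    require 'net/http'
    require 'uri'

    # Map a block over items with one thread per item. Thread#value joins
    # the thread and returns the block's result (or re-raises its exception).
    def parallel_map(items, &block)
      items.map { |item| Thread.new(item, &block) }.map(&:value)
    end

    # Example: fetch several URLs in parallel instead of one after another.
    urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
    bodies = parallel_map(urls) { |url| Net::HTTP.get(URI.parse(url)) }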

The higher your latency-to-processing-time ratio, the bigger the win. Let me know if you would like sample background job code that uses this utility.

+2

Source: https://habr.com/ru/post/1395377/

