Large Background Python Jobs

I am running a Flask server that loads data into a MongoDB database. Because the data set is large and loading it takes a long time, I want to do this in a background job.

I use Redis as the message broker and python-rq to implement the job queues. All the code runs on Heroku.

As I understand it, python-rq uses pickle to serialize the function to be executed, including its parameters, and stores the result, along with some other values, in a Redis hash.

Since the parameters contain the data that will be stored in the database, they are quite large (~50 MB), and serializing them into Redis not only takes a noticeable amount of time but also consumes a lot of memory. Heroku's Redis plan offers only 100 MB for $30 per month. In fact, I often get OOM errors such as:

OOM command not allowed when used memory > 'maxmemory'.
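
For illustration, here is a minimal sketch of the pattern I am using, where the whole payload is passed as a job argument and therefore pickled into Redis (the task function and the data-building helper are placeholders):

    import os
    from redis import Redis
    from rq import Queue

    from myapp.tasks import load_into_mongo  # placeholder worker function

    redis_conn = Redis.from_url(os.environ["REDIS_URL"])
    q = Queue(connection=redis_conn)

    records = build_records()  # placeholder; ~50 MB of documents

    # enqueue() pickles the function reference and its arguments and writes
    # them to a Redis hash, so this one call stores ~50 MB in Redis.
    job = q.enqueue(load_into_mongo, records)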

I have two questions:

  • Is python-rq a good fit for this task, or would Celery with JSON serialization be more appropriate?
  • Is there a way to avoid serializing the parameter itself and instead pass a reference to it?

Your thoughts on the best solution are greatly appreciated!

2 answers

Don't pass the data itself through the queue; pass a reference to it. The usual approach goes roughly like this (see the sketch after the list):

  • Serialize/compress the data in the web process.
  • Upload it to Amazon S3.
  • Enqueue the RQ job with the S3 URL as its only argument.
  • The worker downloads the data from S3.
  • The worker parses it and loads it into Mongo.
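
A minimal producer-side sketch of that idea, assuming boto3 for S3 access (the bucket name, key prefix and task module are illustrative):

    import gzip
    import json
    import os
    import uuid

    import boto3
    from redis import Redis
    from rq import Queue

    from myapp.tasks import import_from_s3  # hypothetical worker function

    BUCKET = "my-import-bucket"  # assumed to exist

    def enqueue_import(records):
        # Serialize and compress the payload before it leaves the web process.
        body = gzip.compress(json.dumps(records).encode("utf-8"))

        key = f"imports/{uuid.uuid4()}.json.gz"
        boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=body)

        # Only the short bucket/key pair goes through Redis, not the payload.
        q = Queue(connection=Redis.from_url(os.environ["REDIS_URL"]))
        return q.enqueue(import_from_s3, BUCKET, key)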

In my case it worked roughly like this (a worker-side sketch follows the list):

  • Build the payload in the request handler.
  • Compress it with gzip and upload it to S3.
  • Store only the short S3 reference in Redis, not the data itself.
  • S3 storage is far cheaper than Redis memory on Heroku.
  • The worker downloads the object, decompresses it and writes it to the database.
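
And a matching worker-side sketch (pymongo and boto3 assumed; the collection name and environment variables are placeholders), which would live in the myapp.tasks module referenced above:

    import gzip
    import json
    import os

    import boto3
    from pymongo import MongoClient

    def import_from_s3(bucket, key):
        # Download and decompress the payload that the web process uploaded.
        obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
        records = json.loads(gzip.decompress(obj["Body"].read()))

        # Bulk-insert into MongoDB; "imports" is a placeholder collection name.
        db = MongoClient(os.environ["MONGODB_URI"]).get_default_database()
        db["imports"].insert_many(records)

        # Optionally delete the temporary object once it has been loaded.
        boto3.client("s3").delete_object(Bucket=bucket, Key=key)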

This keeps the payload that goes through Redis tiny, so enqueueing is fast and you stay well under the memory limit.

Hope this helps!


As the other answer says, don't pass the data itself to the job: store it somewhere the worker can reach, such as Amazon S3, and pass only its URI as the job argument.


Source: https://habr.com/ru/post/1665718/

