I use a Celery worker to get results from machine learning models. What I am doing is sending large numpy arrays (several megabytes) from the client to the Celery task and back. I currently serialize the numpy arrays as base64 on the client. When I store/fetch the data directly in Redis from the client and from the Celery worker, system performance is much better than when I let Celery pass the arguments itself (the base64-encoded numpy data).

I would like to use Celery (with the Redis broker) to transfer the args/numpy arrays as well, instead of reading and writing Redis directly from the client. Do you know where the problem may be? How can I configure Celery to do this more efficiently (transferring data client -> broker -> worker and back to the client)?
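For reference, my Celery setup is essentially the stock configuration. A minimal sketch of it (the app name and the broker/backend URLs here are placeholders):

from celery import Celery

# Placeholder app/broker/backend names; the real setup uses the same defaults.
celery = Celery('tasks',
                broker='redis://localhost:6379/0',
                backend='redis://localhost:6379/0')

# Celery's default serialization: task messages and results go through JSON,
# which is why the raw array bytes are base64-encoded first.
celery.conf.update(
    task_serializer='json',
    result_serializer='json',
    accept_content=['json'],
)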
import base64
import numpy as np

# Flatten the images into raw bytes and base64-encode them for JSON transport.
serialized = np.asarray(images).reshape((number_of_records, size)).ravel().tobytes()
serialized = base64.b64encode(serialized)
print('calling celery processor')
# The whole payload travels through the Celery message.
result = self.celery.send_task('process', args=[number_of_records, serialized], kwargs={})
returncode, result = result.get(timeout=1000, interval=0.1)
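For context, the worker side of this variant looks roughly like the following; the dtype and the return value are simplifications, and the real task runs the model:

import base64
import numpy as np

@celery.task(name='process')
def process(number_of_records, serialized):
    # Undo the base64 encoding and rebuild the array (dtype assumed here).
    raw = base64.b64decode(serialized)
    images = np.frombuffer(raw, dtype=np.uint8).reshape(number_of_records, -1)
    # ... run the model on `images` ...
    return 0, 'ok'  # (returncode, result)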
Versus this, which is faster (using Redis storage directly):
# Same serialization, but the payload is stored in Redis by hand
# and only the key travels through the Celery message.
serialized = np.asarray(images).reshape((number_of_records, size)).ravel().tobytes()
serialized = base64.b64encode(serialized)
self.redis.set(key, serialized)
print('calling celery processor')
result = self.celery.send_task('process', args=[number_of_records, key], kwargs={})
returncode, result = result.get(timeout=1000, interval=0.1)
# Fetch the result payload the worker wrote back under the same key.
resultc = self.redis.get(key)
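The corresponding worker side for this variant, again only a sketch with the dtype, the model step, and the Redis connection details simplified:

import base64
import numpy as np
import redis

r = redis.Redis()  # same Redis instance the client uses

@celery.task(name='process')
def process(number_of_records, key):
    # Fetch the payload by key instead of receiving it in the message.
    raw = base64.b64decode(r.get(key))
    images = np.frombuffer(raw, dtype=np.uint8).reshape(number_of_records, -1)
    output = images  # placeholder for the model output
    # Write the result back under the same key for the client to fetch.
    r.set(key, base64.b64encode(output.tobytes()))
    return 0, 'ok'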
Any tips on Celery performance (serialization, configuration, ...)? I would like this system to be fast and simple. Or should I really just use Redis directly, as in the second example?
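One thing I have considered but not verified: switching Celery's serializer to pickle, so the raw bytes could travel in the task message without the base64 step (which adds roughly 33% size overhead). A sketch of what I mean:

# Untested idea: allow pickle so raw bytes can go into task args directly.
celery.conf.update(
    task_serializer='pickle',
    result_serializer='pickle',
    accept_content=['pickle'],
)

serialized = np.asarray(images).reshape((number_of_records, size)).ravel().tobytes()
result = self.celery.send_task('process', args=[number_of_records, serialized], kwargs={})

Would something like this help, or is the message size itself the problem?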