I'm starting to venture into distributed computing, and I'm having a hard time figuring out which solution fits my needs. Basically, I have a list of Python data items that I need to process with a single function. The function has several nested loops but doesn't take too long (about a minute) per item. My problem is that the list is very large (3,000+ items). I'm considering multiprocessing, but I'd like to experiment with multi-server processing, because ideally, if the data keeps growing, I want to be able to add more servers mid-job to speed things up.

Essentially, I'm looking for something I can distribute this list of data through (it's not essential, but it would be nice if I could also distribute my code base through it).
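For context, here's roughly what my current single-machine version looks like. This is just a minimal sketch: the nested-loop computation and the item list are stand-ins for my real per-item function and my HBase data.

```python
from multiprocessing import Pool

def process_item(item):
    # Stand-in for my real function: several nested loops,
    # roughly a minute of work per real item.
    total = 0
    for i in range(1000):
        for j in range(1000):
            total += (item * i + j) % 7
    return total

if __name__ == "__main__":
    items = list(range(3000))  # stand-in for my 3000+ rows from HBase
    with Pool() as pool:       # one worker process per CPU core by default
        results = pool.map(process_item, items)
    print(len(results), "items processed")
```

This works fine on one machine, but it obviously can't grow beyond the cores of a single box, which is why I'm looking at multi-server options.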
So my question is: which package can I use to achieve this? My database is HBase, so Hadoop is already running (I've never used Hadoop for anything else; it's just there for the database). I looked at Celery and Twisted, but I was confused about whether either would fit my needs.
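To show what I mean, here's a minimal sketch of how I imagine a Celery version would look. The broker URL is a placeholder (I'd have to pick Redis or RabbitMQ), and the task body is a dummy; I haven't actually run this.

```python
from celery import Celery

# Placeholder broker/backend URLs; Celery also supports RabbitMQ, etc.
app = Celery("tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

@app.task
def process_item(item):
    # My real per-item function (~1 minute of nested loops) would go here.
    return item * 2  # dummy computation

# Client side: fan the list out to whichever workers are connected.
# Each server runs a worker via `celery -A tasks worker`, so adding
# a server mid-job should just mean starting another worker there.
if __name__ == "__main__":
    async_results = [process_item.delay(item) for item in range(3000)]
    results = [r.get() for r in async_results]
    print(len(results), "items processed")
```

If I understand Celery correctly, that ability to add workers against the same broker while the job is running is exactly what I'm after, but I'm not sure it's the right tool for a plain "map a function over a big list" workload.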
Any suggestions?