Another idea might be to use Hadoop for your backend. It bears a resemblance to the previously mentioned CouchDB, but focuses more on efficiently processing large datasets with MapReduce.
Compared to CouchDB, Hadoop is not suitable for real-time applications or as the database behind a website, because access to a single record has high latency. It really shines, however, when you iterate over all elements and run computations across the whole dataset, even petabytes of data.
So maybe you should give Hadoop a try. It may take some time to get used to thinking in MapReduce, but it is a really good way to describe this kind of problem, and you do not need to deal with storing intermediate results yourself. A nice side effect is that your algorithm will still work when your dataset grows; you may just have to add another server. :-)
There are also plenty of books and documentation available on Hadoop and MapReduce, but here is a good tutorial that can help you get started with Hadoop and Python.
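To give a rough idea of what such a job looks like (this is a minimal sketch, not code from the tutorial above), here is the classic word-count example written for Hadoop Streaming, which lets you use plain Python scripts that read from stdin and write to stdout. The file names mapper.py and reducer.py are just placeholders.

#!/usr/bin/env python
# mapper.py - emit "word<TAB>1" for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# reducer.py - sum the counts per word. Hadoop sorts the mapper output
# by key before it reaches the reducer, so identical words arrive on
# consecutive lines and we only need to track the current key.
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word = word
        current_count = int(count)

if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

You would submit the two scripts to the hadoop-streaming jar as the -mapper and -reducer; Hadoop then takes care of splitting the input, sorting the intermediate key/value pairs and writing the results to HDFS, which is exactly the "no intermediate results to manage yourself" point made above.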