NoSQL with my own binaries?

Question

Initially, I only had to deal with 1.5 TB of data. Since I just needed fast writes and reads (without any SQL), I designed my own flat binary file format (implemented in Python) and happily saved and manipulated my data on one machine. For backup, I added 2 machines that serve as exact mirrors (using rsync).

My needs are now growing, and I need a solution that will scale to 20 TB of data (and more). I would be happy to keep using my file format for storage. It is fast, reliable, and gives me everything I need.

What bothers me is replication, data consistency, etc. over the network (obviously, the data will have to be distributed, since not all of it can be stored on one machine).

Are there any ready-made solutions (Linux / Python based) that will let me keep using my file format for storage, yet handle the other things that NoSQL solutions normally provide (consistency / data availability / easy replication)?

Basically, all I want is to make sure my binary files are consistent across my network. I am using a network of 60 Core Duo machines (each with 1 GB RAM and a 1.5 TB disk).

+4

python linux distributed

user3262424 Apr 6 '11 at 1:26

2 answers

Perhaps some of the commentary on the Kivaloo data store, developed for Tarsnap, will help you decide what is most suitable: http://www.daemonology.net/blog/2011-03-28-kivaloo-data-store.html

Without knowing more about your application (record size / type, read / write frequency) or your custom format, it's hard to say more.

+1

entropo Apr 6 '11 at 1:39

Manuel salvadores · Accepted Answer · 2011-04-06T11:17:07+0000

Approach: Distributed MapReduce in Python with the Disco Project

This seems like a good way to approach your problem. I have used the Disco Project for similar problems.

You can distribute your files among n machines (processes) and implement map and reduce functions that fit your logic.
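To make the map / reduce idea concrete, here is a minimal sketch in plain Python. It does not use the Disco API; it only shows the shape of the two functions you would write. The record layout (a 4-byte key followed by an 8-byte value) and the names `write_records`, `map_file`, `reduce_pairs` and `run_job` are assumptions made up for illustration, so substitute your own binary format.

```python
import struct
from collections import defaultdict

# Hypothetical record layout (an assumption, not the asker's format):
# a 4-byte unsigned key followed by an 8-byte unsigned value, little-endian.
RECORD = struct.Struct("<IQ")


def write_records(path, records):
    """Write (key, value) pairs to a flat binary file."""
    with open(path, "wb") as f:
        for key, value in records:
            f.write(RECORD.pack(key, value))


def map_file(path):
    """Map step: decode one binary file into (key, value) pairs."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file (a trailing partial record is ignored)
            yield RECORD.unpack(chunk)


def reduce_pairs(pairs):
    """Reduce step: sum the values seen for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(paths):
    """Run the map step over every file, then reduce the combined output.

    A MapReduce framework would run map_file on the machine that holds
    each file; here everything runs sequentially in one process.
    """
    pairs = (pair for path in paths for pair in map_file(path))
    return reduce_pairs(pairs)
```

With files written by `write_records`, calling `run_job(["part0.bin", "part1.bin"])` returns one dictionary of per-key totals. The point is that the binary format stays untouched: only `map_file` needs to know how to decode it.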

The Disco Project tutorial explains exactly how to implement a solution to problems like yours. You will be impressed by how little code you need to write, and you can definitely keep your binary file format.

Another similar option is Amazon Elastic MapReduce.
