Initially, I only had to deal with 1.5 TB of data. Since I just needed fast writes and reads (without any SQL), I designed my own flat binary file format (implemented in Python) and happily stored and manipulated my data on one machine. For backup, I added 2 machines that serve as exact mirrors (using rsync).
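To make "exact mirror" concrete, here is a minimal sketch of how the mirrors could be checked against the primary after an rsync run. It is not my actual code: the function names (`file_digest`, `build_manifest`, `diff_manifests`) and the idea of one directory tree per machine are assumptions made for illustration. Files are hashed in 1 MiB chunks so that large binaries never have to fit in RAM.

```python
import hashlib
import os

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large files never need to fit in RAM


def file_digest(path):
    """Return the SHA-256 hex digest of one file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def diff_manifests(primary, mirror):
    """Return (missing, extra, changed) relative paths, comparing mirror to primary."""
    missing = sorted(set(primary) - set(mirror))   # on primary, not on mirror
    extra = sorted(set(mirror) - set(primary))     # on mirror, not on primary
    changed = sorted(
        p for p in primary if p in mirror and primary[p] != mirror[p]
    )
    return missing, extra, changed
```

Each machine would build its own manifest locally and only the small manifest would travel over the network; the mirror counts as exact when all three returned lists are empty.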
My needs are now growing, and I need a solution that scales to 20 TB of data (and beyond). I would be happy to keep using my file format for storage: it is fast, reliable, and gives me everything I need.
What worries me is replication, data consistency, and so on across the network (obviously the data must be distributed, since not all of it can be stored on one machine).
Are there any ready-made solutions (Linux/Python based) that would let me keep my file format for storage but handle the other things that NoSQL solutions usually provide (consistency, data availability, simple replication)?
Basically, all I want is to make sure my binaries are consistent across my network. I use a network of 60 Core Duo machines (each with 1 GB RAM and a 1.5 TB disk).
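As a rough sanity check on those numbers, the sketch below works out how much unique data the cluster could hold for a given number of copies of each file. The helper name `usable_capacity_tb` is made up for illustration, and the calculation assumes the whole 1.5 TB disk is available for data, ignoring the OS, filesystem overhead, and free-space headroom.

```python
MACHINES = 60
DISK_TB_PER_MACHINE = 1.5
DATASET_TB = 20


def usable_capacity_tb(machines, disk_tb, copies):
    """Unique data (in TB) the cluster can hold if every file is stored `copies` times."""
    return machines * disk_tb / copies


for copies in (1, 2, 3):
    usable = usable_capacity_tb(MACHINES, DISK_TB_PER_MACHINE, copies)
    verdict = "fits" if usable >= DATASET_TB else "does not fit"
    print(f"{copies} copies: {usable:.1f} TB usable, {DATASET_TB} TB {verdict}")
```

Under these assumptions, keeping three copies of everything (the same as one primary plus two mirrors) leaves 30 TB usable, so 20 TB fits on paper, but with limited room to grow.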