I have some large flat data files (typically around 200 GB each) that I would like to store in some kind of database so that I can access them quickly and in a way that reflects their logical organization. Think of them as large sets of very long audio recordings, where each record has the same length (in samples) and can be thought of as one long string. One of these files usually holds about 100,000 records, each 2,000,000 samples long.
It would be simple enough to store each record as a BLOB in a relational database, but there are many cases where I want to load only certain columns of the whole data set into memory (say, samples 1,000-2,000 of every record). What is the most memory- and time-efficient way to do that?
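To make that concrete, if the records were laid out as one flat, fixed-width binary file, the kind of access I mean would look roughly like this (a sketch only; the file name and layout are hypothetical, and I'm assuming one byte per sample, as described in the edits below):

    import numpy as np

    # Hypothetical flat binary file: 100,000 records x 2,000,000 one-byte
    # samples, stored row-major (one record after another).
    n_records, n_samples = 100_000, 2_000_000
    data = np.memmap("recordings.dat", dtype=np.uint8, mode="r",
                     shape=(n_records, n_samples))

    # Pull samples 1,000-2,000 for every record. Only the pages containing
    # those byte ranges are read from disk, not the whole 200 GB file.
    window = data[:, 1000:2000].copy()   # ~100,000 x 1,000 bytes, about 100 MB in RAM

I'm looking for whatever storage layer makes that sort of column slice cheap.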
Please feel free to ask if you need me to clarify any details before making a recommendation.
EDIT: To clarify the data sizes... One file consists of 100,000 rows (records) by 2,000,000 columns (samples). Most of the relational databases I've researched only allow from a few hundred to a few thousand columns in a table. Again, I don't know much about object-oriented databases, so I'm wondering whether something like that could help here. Of course, any good solution is welcome. Thanks.
EDIT: To clarify how the data will be used... Access to the data will happen only through a desktop/distributed-server application that I will write myself. There is metadata (collection date, filters, sample rate, owner, etc.) for each data set (what I have been calling a 200 GB file so far). There is also metadata associated with each record (I was hoping each record could be a row in a table, so that I could just add a column for each piece of record metadata). All the metadata is consistent: if a given piece of metadata exists for one record, it exists for every record in that file. The samples themselves have no metadata; each sample is just 8 bits of plain ol' binary data.
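To show the record-as-row idea I have in mind, here is a rough sketch of the metadata side (SQLite is just a placeholder, and the column names beyond the ones I listed above are invented for illustration); the samples themselves would stay outside the database, e.g. in the flat file from the sketch above:

    import sqlite3

    con = sqlite3.connect("metadata.db")

    # One row per data set (what I've been calling a 200 GB file).
    con.execute("""
        CREATE TABLE IF NOT EXISTS dataset_meta (
            dataset_id   INTEGER PRIMARY KEY,
            collected    TEXT,      -- collection date
            filters      TEXT,
            sample_rate  REAL,
            owner        TEXT
        )
    """)

    # One row per record; record_id doubles as the row index into the
    # flat sample file. "label" is a made-up example of record metadata.
    con.execute("""
        CREATE TABLE IF NOT EXISTS record_meta (
            dataset_id   INTEGER,
            record_id    INTEGER,
            label        TEXT,
            PRIMARY KEY (dataset_id, record_id)
        )
    """)
    con.commit()

So metadata queries stay small and relational; my real question is only about the bulk sample data.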