Read / Write / Store extremely large sets of sequential data

I work with large sequential datasets in Java. Ideally, I am looking for a library that lets me store streaming data (think of sequences of immutable objects) and then navigate through the stored data later. Ultimately, the data must be stored on disk and must not be held entirely in memory. The data will consist of states of mathematical systems, so mainly numbers (doubles or even BigDecimals), as well as some strings.

This is currently for a desktop application, so there will be only one user and possibly several simultaneous connections at a time (multiple streams of objects / states). Later, I may consider a distributed approach and support for multiple clients against one database backend.

I have looked at various NoSQL libraries, but I'm not sure which one is right for my needs. Any thoughts?

+4
5 answers

Take a look at OrientDB: it is very fast at inserts. On my laptop it inserts 1,000,000 records in 6 seconds. In addition, since it is written in Java, it can also run embedded in your process.

+2

If you have any means of calculating the offset for each object you want to access, a simple java.nio.MappedByteBuffer - the equivalent of mmap - can do the job.
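A minimal sketch of that idea, assuming every state is a fixed-size record so that the offset of record i is just i times the record size. The class name `MappedStates`, the record layout (`DIM` doubles per state) and the temporary file are illustrative assumptions, not something given in the question:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example: each "state" is a fixed-size record of DIM doubles.
final class MappedStates {
    static final int DIM = 3;                            // doubles per state (assumed)
    static final int RECORD_BYTES = DIM * Double.BYTES;  // 24 bytes per record

    // With fixed-size records the offset is a simple multiplication.
    static int offsetOf(int index) {
        return index * RECORD_BYTES;
    }

    // Maps enough of the file to hold recordCount records (must stay below 2 GB).
    static MappedByteBuffer map(Path file, int recordCount) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0,
                    (long) recordCount * RECORD_BYTES);
        }
    }

    static void write(MappedByteBuffer buf, int index, double[] state) {
        int base = offsetOf(index);
        for (int d = 0; d < DIM; d++) {
            buf.putDouble(base + d * Double.BYTES, state[d]); // absolute put
        }
    }

    static double[] read(MappedByteBuffer buf, int index) {
        double[] state = new double[DIM];
        int base = offsetOf(index);
        for (int d = 0; d < DIM; d++) {
            state[d] = buf.getDouble(base + d * Double.BYTES); // absolute get
        }
        return state;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("states", ".bin");
        MappedByteBuffer buf = map(file, 1000);
        write(buf, 42, new double[] {1.0, 2.5, -3.0});
        buf.force(); // ask the OS to flush the mapped pages to disk
        System.out.println(java.util.Arrays.toString(read(buf, 42)));
    }
}
```

Variable-size objects (for example BigDecimals or strings) would need a separate index of offsets; this sketch does not cover that case.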

+2

If you have a 64-bit JVM, you can use memory-mapped files. This will give you a window of up to 2 GB in size per mapping.
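Because a single `FileChannel.map` call cannot cover more than `Integer.MAX_VALUE` bytes, a larger file has to be covered by several windows. The following is only a sketch of one way to do that; the class `ChunkedMappedFile` and its methods are hypothetical names, and it assumes that offsets are 8-byte aligned and the chunk size is a multiple of 8, so a double never straddles two chunks:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: addresses a large file through several mapped windows.
final class ChunkedMappedFile implements AutoCloseable {
    private final FileChannel channel;
    private final int chunkBytes;
    private final MappedByteBuffer[] chunks;

    ChunkedMappedFile(Path file, long totalBytes, int chunkBytes) throws IOException {
        if (chunkBytes <= 0 || chunkBytes % Double.BYTES != 0) {
            throw new IllegalArgumentException("chunkBytes must be a positive multiple of 8");
        }
        this.chunkBytes = chunkBytes;
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
        int n = (int) ((totalBytes + chunkBytes - 1) / chunkBytes); // round up
        this.chunks = new MappedByteBuffer[n];
        for (int i = 0; i < n; i++) {
            long start = (long) i * chunkBytes;
            long len = Math.min(chunkBytes, totalBytes - start);
            chunks[i] = channel.map(FileChannel.MapMode.READ_WRITE, start, len);
        }
    }

    int chunkCount() {
        return chunks.length;
    }

    // offset is a byte position in the whole file, assumed to be 8-byte aligned.
    void putDouble(long offset, double value) {
        chunks[(int) (offset / chunkBytes)].putDouble((int) (offset % chunkBytes), value);
    }

    double getDouble(long offset) {
        return chunks[(int) (offset / chunkBytes)].getDouble((int) (offset % chunkBytes));
    }

    @Override
    public void close() throws IOException {
        channel.close(); // existing mappings stay valid until garbage collected
    }
}
```

In real use the chunk size would be something like 1 GB (`1 << 30`); a tiny value such as 16 is only useful for testing the index arithmetic.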

If you have multiple clients, you may have a server process that has access to files or a database and caches / distributes data to clients.

+1

Just use a binary file? It is easy if your objects are all the same size: you can use random access to seek directly to any record in the file. Your operating system will use its disk cache to give you caching for free. Sometimes people use a database and an SQL interface as a golden hammer.
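As a rough illustration of the equal-size case, here is a sketch built on `java.io.RandomAccessFile`. The class name `RecordFile` and the layout (three doubles per state) are assumptions made for the example:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical fixed-size record store on top of a plain binary file.
final class RecordFile implements AutoCloseable {
    static final int DIM = 3;                            // doubles per state (assumed)
    static final int RECORD_BYTES = DIM * Double.BYTES;

    private final RandomAccessFile raf;

    RecordFile(String path) throws IOException {
        this.raf = new RandomAccessFile(path, "rw");
    }

    // Number of complete records currently in the file.
    long count() throws IOException {
        return raf.length() / RECORD_BYTES;
    }

    // Appends one state at the end of the file and returns its index.
    long append(double[] state) throws IOException {
        if (state.length != DIM) {
            throw new IllegalArgumentException("expected " + DIM + " values");
        }
        long index = count();
        raf.seek(index * RECORD_BYTES);
        for (double v : state) {
            raf.writeDouble(v);
        }
        return index;
    }

    // Jumps straight to record `index`; no scan through earlier records is needed.
    double[] read(long index) throws IOException {
        raf.seek(index * RECORD_BYTES);
        double[] state = new double[DIM];
        for (int d = 0; d < DIM; d++) {
            state[d] = raf.readDouble();
        }
        return state;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

Each `writeDouble` call goes to the file individually, so for high write rates it would probably be worth assembling a whole record in a `ByteBuffer` first and writing it in one call.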

0

Have you looked at Berkeley DB Java Edition? It was designed for this type of use: large datasets, high write throughput, and reliable persistence, with a set of very Java-developer-friendly APIs. You can use the Base API (key / value pairs), the Collections API, or the JPA-like DPL (Direct Persistence Layer) API.

There is an excellent Getting Started Guide that has examples and explains the various APIs.

There are many similar use cases out there. In fact, Terracotta and Coherence use Berkeley DB for persistence. So do Heritrix (the Internet Archive project), Tibco and many other companies and projects. The reason is that BDB provides the performance, reliability, scalability, flexibility and simplicity that they need.

Disclaimer: I am one of the product managers for Berkeley DB, so naturally I am biased. But your use case sounds exactly on target with what BDB was designed for.

Good luck with your project. Please let us know if there is anything we can help with. You can ask questions about Berkeley DB Java Edition on the OTN Forums, where you will find a large community of active Java application developers.

Yours faithfully,

Dave

0

Source: https://habr.com/ru/post/1336473/

