The fastest key→value store available, with support for multiple values

I am looking for an efficient way to store many key→value pairs on disk, preferably with some caching.

The required functionality is either appending to the value (concatenation) for a given key, or modeling a key → list of values; either option works for me. Part of the value is usually a binary document.
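To make the two data models concrete, here is a minimal in-memory sketch of both options using plain java.util collections (no persistence; the class and method names are illustrative, not from any library):

```java
import java.util.*;

public class Models {
    // Concatenate two byte arrays (used by the "append" model).
    static byte[] join(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] part1 = {1, 2}, part2 = {3};

        // Option A: key -> single value; an "append" concatenates bytes.
        Map<String, byte[]> concat = new HashMap<>();
        concat.merge("doc", part1, Models::join);
        concat.merge("doc", part2, Models::join);
        System.out.println(Arrays.toString(concat.get("doc"))); // [1, 2, 3]

        // Option B: key -> list of values; an "append" adds a list element.
        Map<String, List<byte[]>> lists = new HashMap<>();
        lists.computeIfAbsent("doc", k -> new ArrayList<>()).add(part1);
        lists.computeIfAbsent("doc", k -> new ArrayList<>()).add(part2);
        System.out.println(lists.get("doc").size()); // 2
    }
}
```

Either model maps onto most key-value stores; option A keeps one record per key, option B needs either duplicate-key support or a list-valued encoding.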

In this case I will have little use for clustering, redundancy, etc.

Our language is Java, and we have experience with classic databases (Oracle, MySQL, etc.).

I see a few obvious approaches and would like advice on which is fastest in terms of stores (and lookups) per second:

1) Store data in classic db tables with standard inserts.

2) Do it yourself: use a file-system tree to distribute the data across many files, one or more per key.

3) Use a well-known key-value store. Some obvious candidates: 3a) Berkeley DB Java Edition 3b) modern NoSQL solutions such as Cassandra and the like
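Option 2 can be sketched in a few lines of java.nio: fan the keys out over a directory tree by a short hash prefix (so no single directory grows huge) and append each new value chunk to the key's file. The class below is a hypothetical illustration, not a production store (no locking, no key sanitization):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class FileStore {
    private final Path root;

    public FileStore(Path root) { this.root = root; }

    // Map a key to a file, fanned out over 256 subdirectories by hash.
    private Path fileFor(String key) throws IOException {
        String prefix = Integer.toHexString(key.hashCode() & 0xff);
        Path dir = root.resolve(prefix);
        Files.createDirectories(dir);
        return dir.resolve(key);
    }

    // Append semantics: each call adds bytes to the end of the key's file.
    public void append(String key, byte[] value) throws IOException {
        Files.write(fileFor(key), value,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public byte[] read(String key) throws IOException {
        return Files.readAllBytes(fileFor(key));
    }

    public static void main(String[] args) throws IOException {
        FileStore store = new FileStore(Files.createTempDirectory("kv"));
        store.append("doc", "hello ".getBytes(StandardCharsets.UTF_8));
        store.append("doc", "world".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.read("doc"), StandardCharsets.UTF_8));
        // prints: hello world
    }
}
```

The OS page cache gives you the "some caching" for free, but per-operation cost is dominated by file open/close and metadata updates, which is typically where this approach loses to an embedded store.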

Personally, I lean toward Berkeley DB JE for this task.

Summarizing my questions:

  • Is Berkeley DB a reasonable choice, given the above?

  • What speed can I expect for typical operations, e.g. updates (inserting or appending a new value for a key) and retrieval by key?

+4
4 answers

You could also try Chronicle Map or JetBrains Xodus, which are embedded Java key-value stores that are much faster than Berkeley DB JE (if you are really after speed). Chronicle Map provides an easy-to-use java.util.Map interface.
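Because Chronicle Map implements java.util.Map, calling code can be written against the plain Map interface. The sketch below uses a HashMap stand-in so it runs without the net.openhft:chronicle-map dependency; the commented-out builder call shows roughly how the persisted map would be created instead (entry count and value size are illustrative numbers, not recommendations):

```java
import java.util.HashMap;
import java.util.Map;

public class ChronicleSketch {
    public static void main(String[] args) {
        // With chronicle-map on the classpath, construction looks roughly like:
        //   Map<String, byte[]> map = ChronicleMap
        //       .of(String.class, byte[].class)
        //       .entries(1_000_000)          // expected number of keys
        //       .averageValueSize(4_096)     // sizing hint for binary docs
        //       .createPersistedTo(new java.io.File("docs.dat"));
        // HashMap stand-in so this sketch compiles without the dependency:
        Map<String, byte[]> map = new HashMap<>();

        // Everything below is plain Map code and works unchanged either way.
        map.put("doc-1", new byte[]{42});
        System.out.println(map.get("doc-1")[0]); // prints 42
    }
}
```

This is the main appeal: you can prototype against HashMap and swap in the persisted, off-heap implementation later without touching the access code.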

+4

BerkeleyDB sounds reasonable. Cassandra would also be reasonable, but perhaps overkill if you do not need redundancy, clustering, etc.

That said, a single Cassandra node can handle 20,000 writes per second (assuming you use multiple clients to exploit the high concurrency inside Cassandra) on relatively modest hardware.

+2

FWIW, I use Ehcache with fully satisfactory performance; I have never tried Berkeley DB.

+1

Berkeley DB JE should work well for your use case. Performance will vary, depending largely on how many I/Os are required per operation (and, by extension, on how large a cache is available) and on the durability constraints you set for your write transactions (i.e., does a commit have to reach disk or not?).

In general, we typically see 50–100 thousand reads per second and 5–12 thousand writes per second on commodity hardware with BDB JE. Obviously YMMV.

BDB JE performance and throughput tuning questions are best asked on the Berkeley DB JE forum, where there is an active community of BDB JE application developers on hand to help. The BDB JE FAQ also contains several helpful performance-tuning recommendations that may come in handy.

Good luck with your implementation. Please let us know if we can help.

Yours faithfully,

Dave - Product Manager for Berkeley DB

+1

Source: https://habr.com/ru/post/1345769/
