Why does Apache Kafka Streams use RocksDB, and how can I change it?

Question

While researching the new features in Apache Kafka 0.9 and 0.10, we used KStreams and KTables. We came across the interesting fact that Kafka Streams uses RocksDB internally; see "Introducing Kafka Streams: Stream Processing Made Simple". RocksDB is not written in a JVM-compatible language, so it needs careful handling at deployment, as it requires an additional shared library (OS dependent).

And here are some simple questions:

Why is Apache Kafka Streams using RocksDB?
How can this be changed?

I tried to find the answer, but the only reason I can see, and only implicitly, is that RocksDB is very fast, with throughput on the order of millions of operations per second.

On the other hand, I see some databases that are written in Java, and perhaps they could do the job just as well end to end, since they do not go through JNI.

jni key-value-store in-memory-database apache-kafka-streams rocksdb java-native-interface

Seweryn Habdank-Wojewódzki Oct 18 '16 at 14:08

1 answer

Matthias J. Sax · Accepted Answer · 2016-10-18T16:59:14+0000

RocksDB is used for several (internal) reasons, for example its performance, as you already mentioned. Conceptually, Kafka Streams does not need RocksDB: it is used as an internal key-value cache, and any other store offering similar functionality would work as well.

Comment from @miguno below (rephrased):

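One important advantage of RocksDB over purely in-memory key-value stores is that it can write to disk. Kafka Streams can therefore support state that is larger than the available main memory.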

Another comment from @miguno:

FYI: "RocksDB is not written in a JVM-compatible language, so it needs careful handling of the deployment, as it needs an extra shared library (OS dependent)." As a user of Kafka Streams, you do not need to install anything.

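With the Kafka Streams DSL, as of the 0.10.2 release (KAFKA-3825), it is possible to plug in custom state stores.

With the Kafka Streams Processor API, you can implement your own store via the StateStore interface and connect it to a processor node in your topology.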
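
To illustrate the point that, conceptually, any store with key-value semantics would do, here is a minimal sketch in plain Java. Everything in it is hypothetical: `SimpleKeyValueStore`, `HashMapStore`, `TreeMapStore` and `countWords` are names invented for this illustration and are not part of the Kafka Streams API (the real extension point is the `StateStore` interface mentioned above). The sketch only shows that stateful processing logic written against a small store contract does not care which backend sits behind it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PluggableStoreSketch {

    /** Hypothetical minimal store contract; NOT the Kafka Streams StateStore interface. */
    public interface SimpleKeyValueStore<K, V> {
        V get(K key);
        void put(K key, V value);
    }

    /** One possible backend: an unsorted in-memory hash map. */
    public static class HashMapStore<K, V> implements SimpleKeyValueStore<K, V> {
        private final Map<K, V> data = new HashMap<>();
        @Override public V get(K key) { return data.get(key); }
        @Override public void put(K key, V value) { data.put(key, value); }
    }

    /** Another possible backend: a sorted in-memory tree map. */
    public static class TreeMapStore<K, V> implements SimpleKeyValueStore<K, V> {
        private final Map<K, V> data = new TreeMap<>();
        @Override public V get(K key) { return data.get(key); }
        @Override public void put(K key, V value) { data.put(key, value); }
    }

    /** Stateful word count that depends only on the store contract, not on a backend. */
    public static void countWords(Iterable<String> words,
                                  SimpleKeyValueStore<String, Long> store) {
        for (String word : words) {
            Long current = store.get(word);
            store.put(word, current == null ? 1L : current + 1L);
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("kafka", "streams", "kafka");

        // The same processing logic runs unchanged against either backend.
        SimpleKeyValueStore<String, Long> hashStore = new HashMapStore<>();
        SimpleKeyValueStore<String, Long> treeStore = new TreeMapStore<>();
        countWords(words, hashStore);
        countWords(words, treeStore);

        System.out.println("hash-backed count for kafka: " + hashStore.get("kafka"));
        System.out.println("tree-backed count for kafka: " + treeStore.get("kafka"));
    }
}
```

A disk-backed implementation of the same hypothetical contract could be swapped in the same way, which is roughly the role RocksDB plays as the default store.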
