Why does Apache Kafka Streams use RocksDB, and how can I change it?

While researching the new features in Apache Kafka 0.9 and 0.10, we used KStreams and KTables. Interestingly, Kafka Streams uses RocksDB internally. See "Introducing Kafka Streams: Stream Processing Made Simple". RocksDB is not written in a JVM-compatible language, so it needs careful handling during deployment, because it requires an additional, OS-dependent shared library.

And here are some simple questions:

  • Why is Apache Kafka Streams using RocksDB?
  • How can this be changed?

I tried to find the answer, but the only reason I can see is an implicit one: RocksDB is very fast, handling on the order of millions of operations per second.

On the other hand, there are databases written in Java, and perhaps they could do the same job end to end, since they do not go through JNI.

1 answer

RocksDB is used for several (internal) reasons; as you already mentioned, performance is one of them. Conceptually, Kafka Streams does not need RocksDB: it is used as an internal key-value cache, and any other store offering similar functionality would work as well.
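To illustrate why the store is interchangeable, here is a minimal sketch in plain Java with no Kafka dependency. The `SimpleKeyValueStore` interface, the two backends, and `WordCounter` are hypothetical names made up for this example; they are not Kafka Streams classes and only mimic the general shape of a key-value contract. The point is that the stateful logic sees nothing but `get` and `put`, so the backend can be swapped without touching it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class PluggableStoreSketch {

    // Hypothetical, minimal key-value contract; NOT Kafka's StateStore interface.
    interface SimpleKeyValueStore<K, V> {
        V get(K key);
        void put(K key, V value);
    }

    // One possible backend: an unordered in-memory hash map.
    static class HashMapStore<K, V> implements SimpleKeyValueStore<K, V> {
        private final Map<K, V> data = new HashMap<>();
        public V get(K key) { return data.get(key); }
        public void put(K key, V value) { data.put(key, value); }
    }

    // Another possible backend: a sorted in-memory tree map.
    static class TreeMapStore<K extends Comparable<K>, V> implements SimpleKeyValueStore<K, V> {
        private final Map<K, V> data = new TreeMap<>();
        public V get(K key) { return data.get(key); }
        public void put(K key, V value) { data.put(key, value); }
    }

    // Stateful word counter that depends only on the contract, not on a backend.
    static class WordCounter {
        private final SimpleKeyValueStore<String, Long> store;

        WordCounter(SimpleKeyValueStore<String, Long> store) {
            this.store = store;
        }

        // Increments the count for the word and returns the updated count.
        long process(String word) {
            Long current = store.get(word);
            long updated = (current == null ? 0L : current) + 1L;
            store.put(word, updated);
            return updated;
        }
    }

    // Feeds a fixed input through a counter backed by the given store
    // and returns the final count recorded for "kafka".
    static long countKafka(SimpleKeyValueStore<String, Long> store) {
        WordCounter counter = new WordCounter(store);
        for (String word : new String[] {"kafka", "streams", "kafka"}) {
            counter.process(word);
        }
        return store.get("kafka");
    }

    public static void main(String[] args) {
        // Same logic, two different backends.
        System.out.println(countKafka(new HashMapStore<>()));
        System.out.println(countKafka(new TreeMapStore<>()));
    }
}
```

Both calls go through identical counting logic; only the object passed to `countKafka` changes. A disk-backed store such as RocksDB would fit behind the same kind of contract.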

Comment from @miguno below (rephrased):

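One important advantage of RocksDB over pure in-memory key-value stores is that it can write to disk. As a result, Kafka Streams can support state that is larger than the available main memory.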

@miguno also notes:

Regarding "RocksDB is not written in a JVM-compatible language, so it needs careful handling of the deployment, as it needs an extra shared library (OS dependent)": as a user of Kafka Streams, you do not need to install anything yourself.

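With the Kafka Streams DSL, since the 0.10.2 release (KAFKA-3825), you can plug in custom state stores and use a different key-value store.

With the Kafka Streams Processor API, you can implement your own store via the StateStore interface.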


Source: https://habr.com/ru/post/1658104/

