Is Kafka Stream StateStore global in all cases or only locally?

The Kafka Stream WordCount uses a StateStore to store word counts. If there are multiple instances in the same consumer group, is the StateStore global for the group or just local for the user instance?

Thnaks

+8
source share
3 answers

It depends on your view in the state store.

  • In Kafka threads, the state is divided, and therefore each instance contains a part of the general state of the application. For example, when using the stateful DSL statement, a local instance of RocksDB is used to store its state. Thus, in this respect, the state is local.

  • On the other hand, all state changes are recorded in the Kafka theme. This section does not "live" on the application host, but in the Kafka cluster and consists of several sections and can be replicated. In the event of an error, this change topic is used to recreate the state of the failed instance in another instance that is not already running. Thus, since the change log is accessible to all instances of the application, it can also be considered global.

Keep in mind that the change log is the truth of the state of the application, and local storage is basically the cache of state fragments.

In addition, in the WordCount example, the record stream (data stream) is divided into words, so that counting of one word will be supported by one instance (and different instances support counting for different words).

For an architectural overview, I recommend http://docs.confluent.io/current/streams/architecture.html

Also this blog should be interesting http://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/

+18
source

If it is worth mentioning that there is a proposal to improve GlobalKTable

GlobalKTable will be fully replicated once per instance of KafkaStreams. That is, each instance of KafkaStreams will consume all sections of the corresponding topic.

From the Confluent Platform mailing list, I have this information

You can start prototyping using the Kafka 0.10.2 branch (or torso) ...

0.10.2-rc0 already has GlobalKTable!

Here is the actual PR .

And the man who told me it was Matthias J. Sachs;)

+3
source

Use a Processor instead of a Transformer for all the transformations that you want to perform on the input topic, whenever there is a data search script from GlobalStateStore. Use context.forward(key,value,childName) to send data to descending nodes. context.forward(key,value,childName) can be called multiple times in process() and punctuate() to send multiple records to a downstream node. If you need to update GlobalStateStore, do it only in the Processor passed to addGlobalStore(..) because there is a GlobalStreamThread associated with GlobalStateStore that maintains consistent storage state in all running kstream instances.

0
source

Source: https://habr.com/ru/post/1258863/


All Articles