Remove duplicates across window triggers/firings

Let's say I have an unbounded set of sentences keyed by userid, and I want to continuously keep an up-to-date value for whether each user is annoying. We can calculate whether a user is annoying by passing all the sentences they have ever said to a function, isAnnoying(). Forever.

I put everything into the global window with an afterElement(1) trigger (one that fires after every element) and accumulatingFiredPanes(), then do a GroupByKey followed by a ParDo that emits (userid, isAnnoying).

This works forever, accumulates state for each user, and so on. Except that in most cases a new sentence does not change whether the user isAnnoying, so most of the time when the window fires and emits a (userid, isAnnoying) tuple, it is a redundant update and the I/O is unnecessary. How can I catch these duplicate updates and discard them, while still getting an update every time a sentence arrives that does change the isAnnoying value?

1 answer

Today it is not possible to directly express "only produce output when the combined result has changed".

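One approach that may reduce the data volume, depending on your pipeline: use .discardingFiredPanes(), and follow the GroupByKey with a filter that drops any zero values, where "zero" means the identity element of your CombineFn. This relies on the fact that the associativity requirement of Combine means you must be able to compute the incremental "delta" independently and combine it with the accumulated value later.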

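Once BEAM-23 (mutable per-key state for ParDo) is implemented, you will be able to maintain the state manually and implement this kind of "only send output when the result changes" logic yourself.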
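To make the "only emit when the result changes" idea concrete, here is a minimal plain-Python sketch of the logic, outside of any pipeline framework. It is not Dataflow or Beam API: the names `changed_results` and `is_annoying`, the in-memory dictionaries standing in for per-key state, and the "three shouted sentences" rule are all assumptions made up for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def is_annoying(sentences: List[str]) -> bool:
    """Hypothetical stand-in for isAnnoying(): true after three all-caps sentences."""
    return sum(1 for s in sentences if s.isupper()) >= 3


def changed_results(
    events: Iterable[Tuple[str, str]],
    predicate: Callable[[List[str]], bool] = is_annoying,
) -> Iterator[Tuple[str, bool]]:
    """Yield (user_id, value) only when value differs from the last one emitted."""
    history: Dict[str, List[str]] = {}   # per-user accumulated sentences
    last_emitted: Dict[str, bool] = {}   # per-user last value sent downstream
    for user_id, sentence in events:
        history.setdefault(user_id, []).append(sentence)
        value = predicate(history[user_id])
        # The first result for a user counts as a change; after that,
        # redundant updates are dropped.
        if user_id not in last_emitted or last_emitted[user_id] != value:
            last_emitted[user_id] = value
            yield user_id, value


events = [
    ("a", "hi"), ("a", "STOP"), ("b", "hello"),
    ("a", "NO"), ("a", "WHY"), ("a", "ok"),
]
print(list(changed_results(events)))
```

Six sentences arrive here, but only three updates are emitted: the first result for each user, and the one moment user "a" flips to annoying. In a real pipeline the two dictionaries would have to live in whatever per-key state mechanism the runner provides.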

, , , , . , triggers .


Source: https://habr.com/ru/post/1648603/

