Sinkless Streaming Dataflow Pipeline

We have a streaming pipeline running on Google Cloud Dataflow workers that needs to read from a PubSub subscription, group messages, and write them to BigQuery. The built-in BigQuery sink does not meet our needs, because we need to target specific datasets and tables for each group. Since custom sinks are not supported for streaming pipelines, it seems the only solution is to perform the insert operations in a ParDo. Something like this:

[image]

Is there any known problem with not having a sink in the pipeline, or anything to be aware of when writing this kind of pipeline?

+4
1 answer

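There should be no problem with a pipeline that has no sink. In streaming mode, a sink is itself just a kind of ParDo.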

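I recommend writing a custom ParDo that calls the BigQuery API directly with your own logic. The implementation of the built-in BigQuerySink is a reasonable starting point.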

You can define your own DoFn, similar to StreamingWriteFn, to hold the custom ParDo logic that writes to the appropriate BigQuery dataset/table.
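A minimal sketch of what such a DoFn body might look like, written without any Beam imports so it runs on its own. The class is only shaped like a DoFn (`start_bundle` / `process` / `finish_bundle`). The names `RoutedBigQueryWriter`, `insert_rows`, `table_for` and `max_batch` are hypothetical and not part of any SDK. `insert_rows` stands in for whatever BigQuery streaming-insert call you make through the API client.

```python
from collections import defaultdict


class RoutedBigQueryWriter:
    """Shaped like a DoFn, but free of Beam imports so it runs standalone."""

    def __init__(self, insert_rows, table_for, max_batch=500):
        # insert_rows(table_spec, rows): hypothetical stand-in for a
        # BigQuery streaming insert made through the API client.
        self._insert_rows = insert_rows
        # table_for(element) -> "dataset.table": hypothetical routing rule.
        self._table_for = table_for
        self._max_batch = max_batch
        self._buffers = None

    def start_bundle(self):
        # One buffer per destination table, reset for every bundle.
        self._buffers = defaultdict(list)

    def process(self, element):
        table = self._table_for(element)
        rows = self._buffers[table]
        rows.append(element)
        # Send early once a table's buffer reaches the batch limit.
        if len(rows) >= self._max_batch:
            self._flush(table)

    def finish_bundle(self):
        # Nothing may stay buffered once the bundle is done.
        for table in list(self._buffers):
            self._flush(table)

    def _flush(self, table):
        rows = self._buffers.pop(table, [])
        if rows:
            self._insert_rows(table, rows)
```

In a real pipeline this logic would live in a DoFn subclass applied with ParDo. Setting `max_batch=1` sends every element as soon as it is processed, while a larger value trades a little latency within a bundle for fewer API calls.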

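Note that the built-in implementation uses Reshuffle rather than GroupByKey. I recommend Reshuffle as well: it also groups by key, but avoids unnecessary windowing delays. In this case that means elements are written out as soon as they arrive, with no extra buffering/delay. It also lets you determine the BQ table names at runtime.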
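Because each group in the question targets its own dataset and table, the destination has to be computed per element at runtime. A small sketch of one way to do that, assuming the group key is a string. The function names, the `events_` prefix and the `messages` table name are illustrative assumptions, and the character set is a deliberately conservative subset of what BigQuery accepts.

```python
import re

# Keep only letters, digits and underscores in identifiers.
_UNSAFE = re.compile(r"[^A-Za-z0-9_]")


def sanitize(name):
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    cleaned = _UNSAFE.sub("_", name)
    if not cleaned:
        raise ValueError("empty identifier")
    return cleaned


def table_spec(group_key, dataset_prefix="events_", table="messages"):
    """Build a 'dataset.table' string from a group key at runtime."""
    dataset = dataset_prefix + sanitize(group_key)
    return "{}.{}".format(dataset, sanitize(table))
```

The resulting string is what the DoFn would hand to its insert call for that element. The target dataset and table still have to exist, or be created, before rows are inserted.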

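Edit: I do not recommend using the built-in BigQuerySink to write to different tables. The suggestion is to call the BigQuery API from your own DoFn rather than to use BigQuerySink.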

+3

Source: https://habr.com/ru/post/1667683/

