What are the advantages and disadvantages of streaming data directly into BigQuery versus loading data into PubSub and then using Dataflow to insert it into BigQuery?

As far as I know, streaming data into BigQuery can lead to duplicate rows, as mentioned at https://cloud.google.com/bigquery/streaming-data-into-bigquery#real-time_dashboards_and_queries

On the other hand, does loading data into PubSub and then using Dataflow to insert it into BigQuery prevent duplicate rows? There is also a tutorial on real-time data analysis here: https://cloud.google.com/solutions/real-time/fluentd-bigquery

So are there other pros and cons, and in which cases should I use Dataflow to stream data from PubSub?

1 answer

With Google Dataflow and PubSub, you have full control over your streaming data: you can slice and dice it in real time, implement your own business logic, and finally write the result to a BigQuery table. With the other approach, streaming data directly into BigQuery using BigQuery jobs, you give up that control over your data.
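To make "full control" a bit more concrete, here is a minimal sketch of the kind of step you could put between PubSub and BigQuery. It is plain Python rather than actual Dataflow code, and the field names (`message_id`, `data`, `user`, `bytes`) and the filtering rule are assumptions made up for illustration, not anything from the question.

```python
import json


def dedupe_and_transform(messages):
    """Drop repeated messages by ID, then shape each payload into a row."""
    seen_ids = set()
    rows = []
    for msg in messages:
        msg_id = msg["message_id"]  # hypothetical field name
        if msg_id in seen_ids:
            continue  # same message delivered again: skip it
        seen_ids.add(msg_id)
        event = json.loads(msg["data"])
        # Example business rule (an assumption): keep only events with a user.
        if not event.get("user"):
            continue
        rows.append({"user": event["user"], "bytes": int(event.get("bytes", 0))})
    return rows


messages = [
    {"message_id": "a1", "data": '{"user": "ann", "bytes": 120}'},
    {"message_id": "a1", "data": '{"user": "ann", "bytes": 120}'},  # duplicate delivery
    {"message_id": "b2", "data": '{"user": "", "bytes": 5}'},       # fails the rule
    {"message_id": "c3", "data": '{"user": "bob", "bytes": "80"}'},
]
rows = dedupe_and_transform(messages)
print(rows)
```

In a real pipeline the `seen_ids` set would have to be bounded (for example per time window), since a stream never ends; the sketch keeps everything in memory only to show the idea.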

Operations such as group by key, merge, partition, and sum can be applied to the stream in Dataflow, with PubSub feeding data into Dataflow.
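As a rough illustration of the group by key and sum operations mentioned above, the sketch below totals a value per key inside fixed time windows. Again this is plain Python, not Dataflow code; the event fields (`ts`, `user`, `bytes`) and the 60-second window are assumptions chosen for the example.

```python
from collections import defaultdict


def sum_by_key_per_window(events, window_seconds=60):
    """Group events by (window start, user) and sum their byte counts."""
    totals = defaultdict(int)
    for ev in events:
        # Align the timestamp to the start of its fixed window.
        window_start = ev["ts"] - ev["ts"] % window_seconds
        totals[(window_start, ev["user"])] += ev["bytes"]
    return dict(totals)


events = [
    {"ts": 5, "user": "ann", "bytes": 100},
    {"ts": 42, "user": "ann", "bytes": 20},
    {"ts": 59, "user": "bob", "bytes": 7},
    {"ts": 61, "user": "ann", "bytes": 1},  # falls into the next window
]
totals = sum_by_key_per_window(events)
print(totals)
```

Each entry in `totals` corresponds to one pre-aggregated row that could be written to BigQuery instead of the raw events.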


Source: https://habr.com/ru/post/1676165/

