How does BigQuery caching work with partitioned tables?

Contrary to the BigQuery documentation, we see that it DOES cache results when selecting data from a streaming partitioned table (Standard SQL).

Example: when we run a deterministic date filter against a streaming-ingestion partitioned table, using:

where (_PARTITIONTIME > '2017-11-12' or _PARTITIONTIME is null)

... BigQuery caches the results for 5-20 minutes if we run the exact same query within that interval.

My reading of the documentation, however, says the results SHOULD NOT be cached:

'When any of the tables referenced by the query have recently received streaming inserts (a streaming buffer is attached to the table), even if no new rows arrived'
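
For reference, the behavior can be reproduced by running the same query twice and checking the job's cache-hit flag. The sketch below is only an illustration: it assumes the google-cloud-bigquery Python client and a hypothetical my_project.my_dataset.events ingestion-time partitioned table, not our actual schema.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table name; the filter mirrors the one above, including the
# _PARTITIONTIME IS NULL branch that matches rows still in the streaming buffer.
QUERY = """
SELECT COUNT(*) AS events
FROM `my_project.my_dataset.events`
WHERE (_PARTITIONTIME > '2017-11-12' OR _PARTITIONTIME IS NULL)
"""

# Run the identical query twice; cache_hit reports whether BigQuery served
# the result from the query results cache.
for attempt in range(2):
    job = client.query(QUERY)
    job.result()  # wait for the query to finish
    print("run {}: cache_hit={}".format(attempt + 1, job.cache_hit))

Per the documentation quote, the second run should not be a cache hit while a streaming buffer is attached; the behavior described above suggests it is.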

Important notes:

  • Our test query selects events that really do arrive constantly, so the table should always have a streaming buffer attached (see the sketch after this list).
  • We actually want this caching behavior, because we don't always need data that is up to date to the last second. We just want to know whether we can rely on it.
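
To confirm that the streaming-buffer condition from the documentation actually applies, the table's streaming buffer metadata can be inspected. Another hedged sketch, using the same hypothetical table name as above:

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my_project.my_dataset.events")  # hypothetical name

# A non-None streaming_buffer means a streaming buffer is currently attached,
# which per the quoted documentation should prevent cached results.
if table.streaming_buffer is None:
    print("no streaming buffer attached")
else:
    buf = table.streaming_buffer
    print("streaming buffer: ~{} rows, ~{} bytes, oldest entry {}".format(
        buf.estimated_rows, buf.estimated_bytes, buf.oldest_entry_time))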

Our questions:

  • What is happening here? Why is BigQuery caching these results?

  • The time the results remain in the BigQuery cache seems "random" (between 5 and 20 minutes). Why?

1 answer

Thanks for reporting the issue. I think it is an oversight that we did not disable caching for partitioned tables with streaming data; without that, the query may return stale results.

We invalidate the cache when the table changes, and streaming into the table causes the table to change. I think that is why the cache gets invalidated every 5 to 20 minutes.
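
If stale results are a concern (for example because this behavior may be treated as a bug and change), the cache can be bypassed explicitly per query instead of relying on it. A minimal sketch, again assuming the google-cloud-bigquery client and the hypothetical table from the question:

from google.cloud import bigquery

client = bigquery.Client()

# use_query_cache=False forces BigQuery to recompute the result instead of
# serving it from the query results cache.
job_config = bigquery.QueryJobConfig(use_query_cache=False)

job = client.query(
    "SELECT COUNT(*) FROM `my_project.my_dataset.events` "
    "WHERE (_PARTITIONTIME > '2017-11-12' OR _PARTITIONTIME IS NULL)",
    job_config=job_config,
)
job.result()
print("cache_hit={}".format(job.cache_hit))  # expected: False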

Source: https://habr.com/ru/post/1273324/

