I'm investigating a data correctness issue in a regularly running job that I wrote, and the problem seems to be caused by BigQuery writing the same table twice in a non-atomic way. More specifically, I had two copies of the same query running at the same time (due to retry logic), both configured to overwrite the same table (using the WRITE_TRUNCATE option), and the resulting table had two copies of every row. I expected one query to write a table with the query results and the other query to overwrite it with the same results, rather than ending up with a table twice the expected size.
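To make the suspected failure concrete, here is a toy Python model (an assumption for illustration, not BigQuery's actual implementation) that treats an overwrite as two separate steps: truncate, then append. The function `run_interleaved`, the job names, and the sample rows are hypothetical. If each job's two steps run as one unit, the second job merely replaces identical rows; if both truncates run before either append, every row ends up in the table twice, which matches the symptom described above.

```python
# Toy model: an overwrite split into a "truncate" step and an "append" step.
# This is NOT how BigQuery is documented to work; it only illustrates how a
# non-atomic overwrite could produce duplicate rows.

def run_interleaved(job_rows, schedule):
    """Apply (job, op) steps to a single in-memory table, in order."""
    table = []
    for job, op in schedule:
        if op == "truncate":
            table.clear()                 # drop everything currently in the table
        elif op == "append":
            table.extend(job_rows[job])   # add this job's query results
    return table


rows = [("a", 1), ("b", 2)]               # hypothetical query output
job_rows = {"job1": rows, "job2": rows}   # two identical retried jobs

# Expected (atomic) behaviour: each job's truncate+append happens as one unit.
serial = run_interleaved(job_rows, [("job1", "truncate"), ("job1", "append"),
                                    ("job2", "truncate"), ("job2", "append")])

# Suspected behaviour: both truncates happen before either append.
interleaved = run_interleaved(job_rows, [("job1", "truncate"), ("job2", "truncate"),
                                         ("job1", "append"), ("job2", "append")])

print(len(serial), len(interleaved))  # 2 4
```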
My understanding when designing the system was that all BigQuery operations are atomic (based on atomic inserts in big query, Can I safely query a BigQuery table being replaced with WRITE_TRUNCATE, and Views not working when their underlying table is being populated). Is the issue I'm running into a bug, or am I misunderstanding the exact guarantees I can expect?
Looking through the history, it looks like this has happened in at least 4 separate cases in the past week.
Here is a timeline of what causes this to happen (with the specifics of the most notable case):
- At about 18:07 UTC on April 30, my code submitted 82 queries at the same time. Each of them queried a table ending in conversions_2014_04_30_14 and one other table, and wrote to a table ending in conversions_2014_04_30_16 (with WRITE_TRUNCATE).
- About 25 minutes later, 25 of the queries still had not finished (which is more than usual), so my "retry" logic kicked in: it abandons all queries that are still running and simply submits them again. (This is a workaround for an issue I've seen where queries stay pending for hours without starting, which I mentioned here: https://code.google.com/p/google-bigquery/issues/detail?id=83&can=1 .) This means that 50 queries were outstanding at once: two copies of each of the 25 queries that hadn't finished yet.
- In the end, 6 of the 82 output tables contained duplicate rows.
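A minimal sketch of the retry step described above, assuming the client only stops waiting on an abandoned job rather than cancelling it, so the original job keeps running server-side. `Job`, `retry_unfinished`, and the query names are hypothetical stand-ins, and no BigQuery API is called. With 82 queries of which 25 are unfinished, it leaves 50 jobs outstanding, two per unfinished query, and both copies of each query target the same destination table.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Job:
    query_name: str    # hypothetical: identifies the query / destination table
    attempt: int
    done: bool = False


def retry_unfinished(jobs):
    """Hypothetical retry step: stop waiting on unfinished jobs (without
    cancelling them) and submit a fresh copy of each one."""
    resubmitted = [Job(j.query_name, j.attempt + 1) for j in jobs if not j.done]
    return jobs + resubmitted  # the abandoned originals are still running


# 82 queries; pretend the first 25 are still running after ~25 minutes.
jobs = [Job(f"query_{i}", 1, done=(i >= 25)) for i in range(82)]
all_jobs = retry_unfinished(jobs)

outstanding = [j for j in all_jobs if not j.done]
per_query = Counter(j.query_name for j in outstanding)
print(len(outstanding))  # 50
```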
Here are the two jobs that wrote to the affected table in the April 30 case:
Job ID: 124072386181:job_tzqbfxfLmZv_QMYL6ozlQpWlG5U
Job ID: 124072386181:job_j9_7uJEjtvYbyeVmEVP0u2er9Lk
Table: 124072386181:bigbingo_history.video_task_companions_conversions_2014_04_30_16
And another affected table:
Job ID: 124072386181:job_TQJzGabFT9FtHI05ftTkD5O8KKU
Job ID: 124072386181:job_5hogbjnLX_5a2opEJl9Jacnn53s
Table: 124072386181:bigbingo_history.Item_repetition__Elimination_conversions_2014_04_27_16
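In both cases, the table ended up with two copies of every row. This is easy to confirm, because the query ends in "GROUP BY alternative, bingo_id", so a single run can produce only one row per (alternative, bingo_id) pair.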
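Since the query ends in GROUP BY alternative, bingo_id, a single run should yield at most one row per (alternative, bingo_id) pair. The Python sketch below checks that over rows already fetched from a table; `duplicated_keys` and the `conversions` column are hypothetical. Any pair with a count above one means the table holds more than one query run's output. The same check could be written in SQL as a GROUP BY alternative, bingo_id with HAVING COUNT(*) > 1.

```python
from collections import Counter


def duplicated_keys(rows):
    """Return every (alternative, bingo_id) pair that occurs more than once."""
    counts = Counter((r["alternative"], r["bingo_id"]) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows shaped like the GROUP BY output.
rows = [
    {"alternative": "a", "bingo_id": "x", "conversions": 1},
    {"alternative": "b", "bingo_id": "y", "conversions": 3},
]

print(duplicated_keys(rows))          # {}
print(duplicated_keys(rows + rows))   # {('a', 'x'): 2, ('b', 'y'): 2}
```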