Work with changing schemes

We are a gaming company that stores events (up to 1 gig per day) in bigquery. Events overlap throughout the month and applications to reduce request costs.

Now to our problem.

Our current solution supports the addition of new event types, which leads to new versions of the table schema. These versions are also added to the tables.

those. events_app1_v2_201308 and events_app1_v2_201308

If we add events with new column types in September, we also get events_app1_v3_201309

We wrote code that detects the involved tables (for a date range) and combines them into an aqua comma separeted FROM.

But I just realized that this will NOT work when we create unions in different versions of the event tables.

Anyone who has a smart decision how to handle this !?

We are now investigating whether JSON structures can help us. The current solution is just flat columns. [timestamp, eventId, value, value, value, ...]

From https://developers.google.com/bigquery/query-reference#from

Note. Unlike many other SQL-based systems, BigQuery uses comma syntax to indicate table joins, not joins. Does this mean that you can run a query across multiple tables with compatible schemas ?? in the following way:

+4
source share
1 answer

You should be able to modify the table table of old tables to add columns, then the join should match. Please note that you can only add columns, not delete them. You can use the tables.patch () method for this or bq update --schema

In addition, until new fields are marked as REQUIRED, they should be considered compatible. If this is not the case, it will be a mistake - let us know if this is what you are experiencing.

+5
source

Source: https://habr.com/ru/post/1500468/


All Articles