Error: TABLE_QUERY expressions cannot query BigQuery tables

This is a follow-up question regarding Jordan's answer here: A Strange Mistake in BigQuery

I have been querying lookup tables inside TABLE_QUERY for quite some time. Now, following the recent changes that Jordan described, many of our queries are broken ... I would like to ask the community for an alternative to what we are doing.

I have tables containing events ("MyTable_YYYYMMDD"). I want to query my data for a specific campaign (or several). The period of each campaign is stored in a table with all the campaign data (ID, StartCampaignDate, EndCampaignDate). To query only the relevant tables, we use TABLE_QUERY(), and inside TABLE_QUERY() we build a list of all the relevant table names based on the campaign data. This query is executed in different forms many times with different parameters. The reason for using a wildcard function (rather than querying the entire dataset) is performance, execution costs, and maintenance costs. So querying all the tables and then filtering the results is not an option, since it drives up execution costs too much.

An example query would look like this:

SELECT * FROM TABLE_QUERY([MyProject:MyDataSet], 'table_id IN (SELECT CONCAT("MyTable_",STRING(Year*100+Month)) TBL_NAME FROM DWH.Dim_Periods P CROSS JOIN DWH.Campaigns AS LC WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6") AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate))')

This is now broken ... My question: the information about which tables to query is stored in a lookup table. How would you query only the relevant tables (partitions) when TABLE_QUERY is no longer allowed to query lookup tables?

Thank you very much

+4
2 answers

The "simple" way I see is to split it into two steps.
Step 1 - build the list that will be used to filter table_id

 SELECT GROUP_CONCAT_UNQUOTED( CONCAT('"',"MyTable_",STRING(Year*100+Month),'"') ) TBL_NAME_LIST FROM DWH.Dim_Periods P CROSS JOIN DWH.Campaigns AS LC WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6") AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate) 

Note the change to your query that converts the result into a list, which you will use in step 2.

Step 2 - the final query

 SELECT * FROM TABLE_QUERY([MyProject:MyDataSet], 'table_id IN (<paste list (TBL_NAME_LIST) built in first query>)') 

The above steps are easy to implement in any client you might be using. If you run this from the BigQuery Web UI, you will have to make a few extra manual "moves", which may not suit you.
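For a client-side implementation, the two steps could be glued together roughly as in the Python sketch below. It only builds the step 2 query string. `run_query` is a hypothetical callable (not part of any BigQuery library) that is assumed to execute the step 1 SQL and return the single TBL_NAME_LIST value as a string. The format check is my own addition, to avoid pasting anything unexpected into the query.

```python
import re
from typing import Callable

# Matches a comma-separated list of double-quoted table IDs,
# e.g. "MyTable_201601","MyTable_201602" (the shape produced by step 1).
_LIST_PATTERN = re.compile(r'"[A-Za-z0-9_]+"(\s*,\s*"[A-Za-z0-9_]+")*')


def build_final_query(dataset: str, tbl_name_list: str) -> str:
    """Paste the step 1 result (TBL_NAME_LIST) into the step 2 query."""
    if not _LIST_PATTERN.fullmatch(tbl_name_list):
        raise ValueError("unexpected TBL_NAME_LIST format: %r" % tbl_name_list)
    return "SELECT * FROM TABLE_QUERY([{}], 'table_id IN ({})')".format(
        dataset, tbl_name_list
    )


def query_campaign_tables(
    run_query: Callable[[str], str], step1_sql: str, dataset: str
) -> str:
    """Run step 1 through the hypothetical run_query, then build step 2."""
    tbl_name_list = run_query(step1_sql)
    return build_final_query(dataset, tbl_name_list)
```

The string returned by `query_campaign_tables` is then submitted as a second, separate query, so TABLE_QUERY itself never touches the lookup tables.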

My answer is obvious, and you have most likely already considered this option, but I wanted to mention it.

+2

This is not an ideal solution, but it seems to do the job.

In my original query, the list of IDs was passed as a parameter by the external process that built the query. I wanted this process to remain unaware of any logic implemented in the query.

In the end, we came up with this solution:

Instead of passing a list of IDs, we pass a JSON string that contains the relevant metadata for each ID. We parse this JSON inside the TABLE_QUERY() function. So instead of querying a physical lookup table, we query a kind of "table variable" that we pass in as JSON.
The following is an example query that runs against a public dataset and demonstrates this solution.

  SELECT YEAR, COUNT (*) CNT FROM TABLE_QUERY([fh-bigquery:weather_gsod], 'table_id in (Select table_id From (Select table_id,concat(Right(table_id,4),"0101") as TBL_Date from [fh-bigquery:weather_gsod.__TABLES_SUMMARY__] where table_id Contains "gsod" )TBLs CROSS JOIN (select Regexp_Replace(Regexp_extract(SPLIT(DatesInput,"},{"),r"\"fromDate\":\"(\d\d\d\d-\d\d-\d\d)\""),"-","") as fromDate, Regexp_Replace(Regexp_extract(SPLIT(DatesInput,"},{"),r"\"toDate\":\"(\d\d\d\d-\d\d-\d\d)\""),"-","") as toDate, FROM (Select "[ { \"CycleID\":\"123456\", \"fromDate\":\"1929-01-01\", \"toDate\":\"1950-01-10\" },{ \"CycleID\":\"123456\", \"fromDate\":\"1970-02-01\", \"toDate\":\"2000-02-10\" } ]" as DatesInput)) RefDates WHERE TBLs.TBL_Date>=RefDates.fromDate AND TBLs.TBL_Date<=RefDates.toDate )') GROUP BY YEAR ORDER BY YEAR 
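On the external-process side, the JSON literal could be produced along the lines of the Python sketch below. This is only an illustration under my own assumptions: `build_dates_input` is a hypothetical helper, the field names (CycleID, fromDate, toDate) are taken from the example above, and the output uses compact separators so that the `},{` split and the `\"fromDate\":\"...\"` regular expressions in the query still match.

```python
import json
import re
from typing import Iterable, Tuple

_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_dates_input(cycles: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (CycleID, fromDate, toDate) tuples to JSON and escape the
    result so it can be placed between double quotes in the query."""
    payload = []
    for cycle_id, from_date, to_date in cycles:
        if not (_DATE.fullmatch(from_date) and _DATE.fullmatch(to_date)):
            raise ValueError("dates must be formatted as YYYY-MM-DD")
        payload.append(
            {"CycleID": cycle_id, "fromDate": from_date, "toDate": to_date}
        )
    # Compact separators: no space after ':' or ',', matching the query's regexes.
    raw = json.dumps(payload, separators=(",", ":"))
    # Single quotes would end the TABLE_QUERY expression and backslashes would
    # interfere with the escaping below, so reject both.
    if "'" in raw or "\\" in raw:
        raise ValueError("unsupported character in input")
    return raw.replace('"', '\\"')
```

The returned string is what would go between the double quotes in the `Select "..." as DatesInput` part of the query.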

This solution is not ideal because it requires the external process to know the data stored in the lookup tables. Hopefully the BigQuery team will bring this very useful feature back.

0

Source: https://habr.com/ru/post/1247186/

