Google BigQuery asks for JOIN EACH, but I am already using it

I am trying to run a query in BigQuery that has two subselects and a join, but I cannot get it to work. As a workaround, I run the subqueries myself, save the results as tables, and then run another query with the join, but I think I should be able to do this in a single query.

I get an error message:

Table too large for JOIN. Consider using JOIN EACH. For more details, please see https://developers.google.com/bigquery/docs/query-reference#joins

but I am already using JOIN EACH. I tried using a CROSS JOIN and using GROUP EACH BY, but those give me different errors. The other Stack Overflow questions on this topic do not help: one says it was a bug in BigQuery, and in the other someone was using "cross join each"...

Below is my SQL. Forgive me if it is full of errors, but I think it should work:

    select t1.device_uuid, t1.session_uuid, t1.nth, t1.Diamonds_Launch, t2.Diamonds_Close
    from (
      select device_uuid, session_uuid, nth,
             sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Launch
      from [project_id].[table_id]
      where name = 'App Launch' and attributes.Name = 'Inventory - Diamonds'
      group by device_uuid, session_uuid, nth
    ) as t1
    join each (
      select device_uuid, session_uuid, nth,
             sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Close
      from [project_id].[table_id]
      where name = 'App Close' and attributes.Name = 'Inventory - Diamonds'
      group by device_uuid, session_uuid, nth
    ) as t2
    on t1.device_uuid = t2.device_uuid and t1.session_uuid = t2.session_uuid
2 answers

You have a GROUP BY inside the JOIN EACH. GROUP BY hits limits with cardinality (the number of distinct values), and the final grouping is not parallelizable. This limits BigQuery's ability to perform the join.

If you change the GROUP BY to GROUP EACH BY, this will most likely work.
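
As a sketch of that change applied to the query from the question (untested, written in BigQuery legacy SQL, and keeping the question's [project_id].[table_id] placeholder), only the two inner GROUP BY clauses become GROUP EACH BY:

    select t1.device_uuid, t1.session_uuid, t1.nth, t1.Diamonds_Launch, t2.Diamonds_Close
    from (
      select device_uuid, session_uuid, nth,
             sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Launch
      from [project_id].[table_id]
      where name = 'App Launch' and attributes.Name = 'Inventory - Diamonds'
      group each by device_uuid, session_uuid, nth  -- was: group by
    ) as t1
    join each (
      select device_uuid, session_uuid, nth,
             sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Close
      from [project_id].[table_id]
      where name = 'App Close' and attributes.Name = 'Inventory - Diamonds'
      group each by device_uuid, session_uuid, nth  -- was: group by
    ) as t2
    on t1.device_uuid = t2.device_uuid and t1.session_uuid = t2.session_uuid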

(Yes, I realize that this is unpleasant and non-standard. The BigQuery team is currently working on making things like this "just work.")


This can be combined into a single query:

    SELECT device_uuid, session_uuid, nth,
           SUM(IF(name = 'App Launch', INTEGER([project_id].[table_id].attributes.Value), 0)) AS Diamonds_Launch,
           SUM(IF(name = 'App Close', INTEGER([project_id].[table_id].attributes.Value), 0)) AS Diamonds_Close
    FROM [project_id].[table_id]
    WHERE attributes.Name = 'Inventory - Diamonds'
    GROUP BY device_uuid, session_uuid, nth

You should also use GROUP EACH BY for large tables.


Source: https://habr.com/ru/post/982960/

