Select only one row for each "ID" in a nested repeating pattern in BigQuery

I have a table in BigQuery with an ID field and a re-write field, along with some other fields, such as data collection time.

Each table has several rows, and I want to somehow select / combine one row for each identifier. Almost all solutions, such as selecting the latter, selecting the first and aggregating rows with a duplicate identifier on the same row, are acceptable in my use case, but I could not get them to work.

To be more precise, my table has an identifier field, which in BigQuery terms: {name: ID, type: STRING, mode: NULLABLE}and a repeated field: {name: data, type: RECORD, mode: REPEATED}along with some other (equal) fields. In my table for each identifier there are several rows, each of which has a repeating field datafor itself. In my query result, I want to have a table with the same scheme in which each identifier appears only once, and the corresponding field datais a field datafrom one of the identifier entries in the original table. (or ideally combine with all his meetings)

Here is a list of solutions that do not work here:

First : use

row_number() OVER (PARTITION BY ID ORDER BY collection_time) as rn ... where  rn=1

Reason: BigQuery aligns the results when used partition by, even if the Unflatten Results parameter is used.

: :

: - .

: group by ID nest/first .

: nest . SELECT ID, nest(data.a), nest(data.b) from:

ID     data.a      data.b
--------------------------
1      1a1          null
       1a2          1b2
--------------------------
1      2a1          2b1
       null         2b2

ID      data.a       data.b
----------------------------
1        1a1         1b2
         1a2         2b1
         2a1         2b2
+4
2

, SQL ( " SQL" " " ). GROUP BY ARRAY_CONCAT_AGG, :

SELECT id, ARRAY_CONCAT_AGG(data) AS data
FROM MyTable
GROUP BY id;
+1

SQL

SELECT id, ARRAY_AGG(STRUCT(a, b)) AS data
FROM (
  SELECT id, a, ROW_NUMBER() OVER() AS num 
  FROM YourTable, UNNEST(data) WHERE NOT a IS NULL 
) FULL OUTER JOIN (
  SELECT id, b, ROW_NUMBER() OVER() AS num 
  FROM YourTable, UNNEST(data) WHERE NOT b IS NULL 
)  
USING(id, num) 
GROUP BY id  

exactly, ( NULL):

ID      data.a       data.b
----------------------------
1        1a1         1b2
         1a2         2b1
         2a1         2b2

( ) a/b - ( Standard SQL)

SELECT id, ARRAY_CONCAT_AGG(data) AS data
FROM YourTable
GROUP BY id

ID      data.a       data.b
----------------------------
1        1a1         null
         1a2         1b2
         2a1         2b1
         null        2b2

, ( YourTable → `project.dataset.table`)

WITH YourTable AS (
  SELECT 1 AS id, ARRAY<STRUCT<a STRING, b STRING>>[('1a1', NULL),('1a2','1b2')] AS data UNION ALL
  SELECT 1 AS id, ARRAY<STRUCT<a STRING, b STRING>>[('2a1', '2b1'),(NULL,'2b2')] AS data 
)
+1

Source: https://habr.com/ru/post/1652412/


All Articles