I have a table in BigQuery with an ID field and a re-write field, along with some other fields, such as data collection time.
Each table has several rows, and I want to somehow select / combine one row for each identifier. Almost all solutions, such as selecting the latter, selecting the first and aggregating rows with a duplicate identifier on the same row, are acceptable in my use case, but I could not get them to work.
To be more precise, my table has an identifier field, which in BigQuery terms:
{name: ID, type: STRING, mode: NULLABLE}and a repeated field: {name: data, type: RECORD, mode: REPEATED}along with some other (equal) fields. In my table for each identifier there are several rows, each of which has a repeating field datafor itself. In my query result, I want to have a table with the same scheme in which each identifier appears only once, and the corresponding field datais a field datafrom one of the identifier entries in the original table. (or ideally combine with all his meetings)
Here is a list of solutions that do not work here:
First : use
row_number() OVER (PARTITION BY ID ORDER BY collection_time) as rn ... where rn=1
Reason: BigQuery aligns the results when used partition by, even if the Unflatten Results parameter is used.
: :
: - .
: group by ID nest/first .
: nest . SELECT ID, nest(data.a), nest(data.b) from:
ID data.a data.b
--------------------------
1 1a1 null
1a2 1b2
--------------------------
1 2a1 2b1
null 2b2
ID data.a data.b
----------------------------
1 1a1 1b2
1a2 2b1
2a1 2b2