Query BigQuery nested / duplicate fields

I am new to BigQuery (and sql) and am trying to set up a table in which each row contains a message column (row) and metadata (repeated). I want to use a metadata column so that I can query for messages matching certain metadata parameters.

For example, my message line:

Hi honey, I'm home!

and the metadata column is arbitrary key / value pairs that describe the message:

{"category": "personal", "message_type": "salutation"}

This will allow me to ask for a large request:

return me all messages that have category of personal (or message_type of personal , etc.)

To repeat, the metadata keys will potentially differ for each inserted line of the message (for example, a new line may have a favorite_color metadata key instead of category ).

Here is the diagram I came up with to support the metadata flexibility I desire:

 [ {"name": "message", "type": "string", "mode": "required"} {"name": "metadata", "type": "record", "mode": "repeated", "fields": [ {"name": "key", "type": "string", "mode": "required"}, {"name": "value", "type": "string", "mode": "required"}]} ] 

The schema assumes that each record / metadata row contains a column named key and a column named value . It seems to be working fine.

Interestingly, Bigquery presents my message line as two lines (which, I think, is called flattening?):

 message | metadata_key | metadata_value ------------------------------------------------------------ Hi honey, I'm home! category personal Hi honey, I'm home! message_type salutation 

I figured out how to request a single metadata value, for example category = personal :

 SELECT * FROM [table.test] WHERE metadata.key="category" AND metadata.value="personal" 

I get one line back:

 message | metadata_key | metadata_value ------------------------------------------------------------ Hi honey, I'm home! category personal 

It's great! However, I do not know how to build a more complex query that matches several metadata parameters, for example:

Get all posts that have a category of personal and message_type salutation

This following query returns nothing:

 SELECT * FROM [table.test] WHERE metadata.key="category" AND metadata.value="personal" AND metadata.key="message_type" AND metadata.value="salutation" 

I assume this is because bigquery splits my single line of messages into two lines (one for each line of nested metadata). This obviously means that my query will never be able to match one row, because I'm trying to match two different rows of metadata (i.e. I ask that metadata.key be equal to two different values ​​at the same time, and the metadata value must be equal to different values ​​at the same time). Therefore, I’m somewhat deadlocked on how to build your query correctly.

Ideally, I would like BigQuery to return a single string (the source line of the example I started with), since it satisfies the two metadata requests of my query.

I assume I need a combination of GROUP BY, FLATTEN, WITHIN, unflatten, etc. The combination of features / structure is somewhat complicated for this beginner sql / bigquery. Any help appreciated :)

+5
source share
1 answer

Try below - only one option
It sums up how many "matches you have in each line, and if everything (in this case - 2) displays it. Pretty simple

 SELECT *, SUM((metadata.key="category" AND metadata.value="personal") OR (metadata.key="message_type" AND metadata.value="salutation") ) WITHIN RECORD AS check, FROM [table.test] HAVING check = 2 

A cleaner version of this will be lower

 SELECT * FROM [table.test] OMIT RECORD IF SUM((metadata.key="category" AND metadata.value="personal") OR (metadata.key="message_type" AND metadata.value="salutation") ) != 2 
+5
source

Source: https://habr.com/ru/post/1246027/


All Articles