I am new to BigQuery (and sql) and am trying to set up a table in which each row contains a message column (row) and metadata (repeated). I want to use a metadata column so that I can query for messages matching certain metadata parameters.
For example, my message line:
Hi honey, I'm home!
and the metadata column is arbitrary key / value pairs that describe the message:
{"category": "personal", "message_type": "salutation"}
This will allow me to ask for a large request:
return me all messages that have category of personal (or message_type of personal , etc.)
To repeat, the metadata keys will potentially differ for each inserted line of the message (for example, a new line may have a favorite_color metadata key instead of category ).
Here is the diagram I came up with to support the metadata flexibility I desire:
[ {"name": "message", "type": "string", "mode": "required"} {"name": "metadata", "type": "record", "mode": "repeated", "fields": [ {"name": "key", "type": "string", "mode": "required"}, {"name": "value", "type": "string", "mode": "required"}]} ]
The schema assumes that each record / metadata row contains a column named key and a column named value . It seems to be working fine.
Interestingly, Bigquery presents my message line as two lines (which, I think, is called flattening?):
message | metadata_key | metadata_value
I figured out how to request a single metadata value, for example category = personal :
SELECT * FROM [table.test] WHERE metadata.key="category" AND metadata.value="personal"
I get one line back:
message | metadata_key | metadata_value
It's great! However, I do not know how to build a more complex query that matches several metadata parameters, for example:
Get all posts that have a category of personal and message_type salutation
This following query returns nothing:
SELECT * FROM [table.test] WHERE metadata.key="category" AND metadata.value="personal" AND metadata.key="message_type" AND metadata.value="salutation"
I assume this is because bigquery splits my single line of messages into two lines (one for each line of nested metadata). This obviously means that my query will never be able to match one row, because I'm trying to match two different rows of metadata (i.e. I ask that metadata.key be equal to two different values ββat the same time, and the metadata value must be equal to different values ββat the same time). Therefore, Iβm somewhat deadlocked on how to build your query correctly.
Ideally, I would like BigQuery to return a single string (the source line of the example I started with), since it satisfies the two metadata requests of my query.
I assume I need a combination of GROUP BY, FLATTEN, WITHIN, unflatten, etc. The combination of features / structure is somewhat complicated for this beginner sql / bigquery. Any help appreciated :)