I have a file with one JSON object per line. Here is an example (pretty-printed for readability):
    {
        "product": {
            "id": "abcdef",
            "price": 19.99,
            "specs": {
                "voltage": "110v",
                "color": "white"
            }
        },
        "user": "Daniel Severo"
    }
I want to create a parquet file with columns such as:
product.id, product.price, product.specs.voltage, product.specs.color, user
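To make the target layout concrete, here is a minimal sketch of the flattening described above, in plain Python. The helper name `flatten` and its `sep` parameter are hypothetical, chosen for illustration only; they are not part of pandas, dask, or fastparquet.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        # Build the dotted column name, e.g. "product" + "specs" -> "product.specs"
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# One line of the input file, as in the example above
line = (
    '{"product": {"id": "abcdef", "price": 19.99, '
    '"specs": {"voltage": "110v", "color": "white"}}, '
    '"user": "Daniel Severo"}'
)
row = flatten(json.loads(line))
print(list(row))
# ['product.id', 'product.price', 'product.specs.voltage', 'product.specs.color', 'user']
```

Note that this only produces flat, dotted column names; it does not use parquet's nested encoding. `pandas.json_normalize` performs a similar flattening on a list of such records and may be a simpler starting point.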
I know that parquet supports nested encoding via the Dremel algorithm, but I have not been able to use it from Python (not sure why).
I am a heavy user of pandas and dask, so the pipeline I'm trying to build is json data -> dask -> parquet -> pandas. That said, if someone has a simple example of creating and reading these nested encodings in parquet using Python, I think that would be good enough :D
EDIT
So, after digging in PR, I found this: https://github.com/dask/fastparquet/pull/177
But how do I use this with dask/fastparquet for the nested product field?