Does Google BigQuery support the Parquet file format?

I was wondering whether Google BigQuery supports the Parquet file format, or if there are plans to support it.

I know that it currently supports CSV and JSON formats.

+5
4 answers

Update: as of March 1, 2018, support for loading Parquet files is available.

With version 2.0.24 of the BigQuery CLI, there is a --source_format PARQUET option, which is described in the output of bq --help.

If I try to use it in my project, I get an error message. Based on the related BigQuery ticket, Parquet load support currently seems to be invite-only.

% bq load --source_format PARQUET test.test3 data.avro.parquet schema.json
Upload complete.
Waiting on bqjob_r5b8a2b16d964eef7_0000015b0690a06a_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'msgqegcp:bqjob_r5b8a2b16d964eef7_0000015b0690a06a_1': Loading of parquet file format is not enabled

My motivation: a Parquet file is about half the size of the equivalent Avro file. I wanted to try something new and load data efficiently (in that order).
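
For reference, now that the feature is generally available, a Parquet load from Cloud Storage looks roughly like this (a minimal sketch; the dataset, table, and bucket names are placeholders, and no schema file is needed because Parquet files carry their own schema):

# mydataset, mytable, and mybucket are hypothetical names
% bq load --source_format=PARQUET mydataset.mytable gs://mybucket/data.parquet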

+6

BigQuery does not currently support the Parquet file format. However, we are interested in learning more about your use case: are you interested in importing, exporting, or both? How are you going to use it? Understanding the scenarios will help the BigQuery team plan accordingly.

+1

If you want to share a file format between BigQuery and Hadoop, you can use newline-delimited JSON records.

BigQuery supports them for import and export.

Hadoop supports this as well. A web search turns up plenty of recipes for making it work; here's one: Handling JSON using Java MapReduce
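
For illustration, a newline-delimited JSON round trip with the bq CLI might look like the following (a minimal sketch; the dataset, table, bucket, and schema-file names are placeholders):

# load newline-delimited JSON from Cloud Storage, using a local schema file
% bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable gs://mybucket/data.json ./schema.json
# export the table back out in the same format
% bq extract --destination_format=NEWLINE_DELIMITED_JSON mydataset.mytable gs://mybucket/export.json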

+1

When you are dealing with hundreds of millions of rows and need to move the data to an on-premises Hadoop cluster, exporting from BigQuery as JSON is simply not a feasible option, and Avro is not much better. The only efficient option today is to move gzipped data, which unfortunately cannot be read natively in Hadoop. Parquet is the only efficient format for this use case; we have no other effective option.
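
For comparison, the gzipped export described above can be produced like this (a sketch with placeholder names; the wildcard shards the output, which BigQuery requires for exports over 1 GB):

% bq extract --destination_format=CSV --compression=GZIP mydataset.mytable 'gs://mybucket/export-*.csv.gz'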

0
