Can we load a Parquet file directly into Hive?

I know we can load a Parquet file using Spark SQL and using Impala, but I wonder if we can do the same with Hive. I have read a lot of articles, but I am still confused.

Simply put, I have a Parquet file - say, users.parquet. Now I am stuck on how to load / insert / import the data from users.parquet into Hive (obviously into a table).

Please advise or point me in the right direction if I am missing something obvious.

Creating a hive table using parquet file metadata

https://phdata.io/examples-using-textfile-and-parquet-with-hive-and-impala/

4 answers

Get the schema of the Parquet file using parquet-tools; for more information, check the link http://kitesdk.org/docs/0.17.1/labs/4-using-parquet-tools-solution.html

Then build a table using the schema at the top of the file; for more information, see Create Hive table to read parquet files from parquet/avro schema.
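To make that concrete, here is a minimal sketch of the workflow, assuming parquet-tools is installed and using hypothetical paths and columns (your schema will differ):

     # 1) Dump the schema embedded in the Parquet file.
     parquet-tools schema /tmp/users.parquet

     # 2) Translate that schema into Hive DDL by hand and create an external
     #    table pointing at the HDFS directory that holds the Parquet file(s).
     hdfs dfs -mkdir -p /data/users
     hdfs dfs -put /tmp/users.parquet /data/users/
     hive -e "
       CREATE EXTERNAL TABLE users (
         id    BIGINT,
         name  STRING,
         email STRING
       )
       STORED AS PARQUET
       LOCATION '/data/users';
     "

After that, a quick select * from users limit 5; in Hive should read rows straight from the Parquet file.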


Getting the schema is crucial, since you will need to create a table with the matching schema in Hive first and then point it at the Parquet files.

I had a similar problem when the data was in one virtual machine and I had to move it to another. Here is my walkthrough:

  • Find out about the original Parquet files (location and schema): describe formatted users; and show create table users; The latter will immediately give you the schema and also point you to the HDFS location, hdfs://hostname:port/apps/hive/warehouse/users

  • Find out about the table's partitioning: show partitions users;

  • Copy the table's Parquet files from HDFS to a local directory

     hdfs dfs -copyToLocal /apps/hive/warehouse/users 
  • Move them to another cluster / virtual machine or wherever you want them to go

  • Create the users table in your target Hive using the same schema

     CREATE TABLE users ( name string, ... ) PARTITIONED BY (...) STORED AS PARQUET; 
  • Now move the Parquet files to the appropriate folder (if necessary, find out about the location of the table you just created)

     hdfs dfs -ls /apps/hive/warehouse/users/
     hdfs dfs -copyFromLocal ../temp/* /apps/hive/warehouse/
  • For each partition, you need to point Hive at the corresponding subdirectory: alter table users add partition (sign_up_date='19991231') location '/apps/hive/warehouse/users/sign_up_date=19991231'; (you can do this with a bash script; a sketch follows this list)
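A rough sketch of such a bash script, assuming the sign_up_date=YYYYMMDD partition layout used in this walkthrough (table name and warehouse path are the example's, adjust them to yours):

     # Register every partition directory found under the table's warehouse path.
     TABLE_DIR=/apps/hive/warehouse/users
     for part in $(hdfs dfs -ls "$TABLE_DIR" | grep 'sign_up_date=' | awk -F/ '{print $NF}'); do
       value=${part#sign_up_date=}
       hive -e "ALTER TABLE users ADD IF NOT EXISTS PARTITION (sign_up_date='${value}')
                LOCATION '${TABLE_DIR}/${part}';"
     done

On recent Hive versions you can often skip the loop entirely and let Hive discover directories that follow the key=value naming convention with msck repair table users;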

It worked for me, hope it helps.


I don't know if this is a bit "hacky", but I use Zeppelin (it ships with Ambari). You can simply do the following in combination with spark2:

     %spark2
     import org.apache.spark.sql.SaveMode

     var df = spark.read.parquet("hdfs:///my_parquet_files/*.parquet")
     df.write.mode(SaveMode.Overwrite).saveAsTable("imported_table")

The advantage of this method is that you can also import many Parquet files, even if they have different schemas.
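If you are not using Zeppelin, roughly the same idea can be run from a shell with the spark-sql CLI; this is only a sketch, assuming Spark is built with Hive support and reusing the hdfs:///my_parquet_files/ path from the snippet above:

     # Create a Hive table directly from the Parquet files (path and table name are illustrative).
     spark-sql -e 'CREATE TABLE imported_table USING PARQUET AS SELECT * FROM parquet.`hdfs:///my_parquet_files/`'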


You can try this... Export/Import works for all file types in Hive, including Parquet. This is the general concept; you can tweak it a little based on your requirements, for example loading from a local machine (or) another cluster.

Note: when performing the steps individually, you can hard-code the values instead of using the $ variables, and you can also pass the "HDFS path", "schema" and "table name" as parameters when running it from a script. That way you can export / import an unlimited number of tables just by passing parameters.

  • Step 1: hive -S -e "export table $schema_file1.$tbl_file1 to '$HDFS_DATA_PATH/$tbl_file1';" # -- Run it from HDFS.
  • Step 2: # -- The export contains both data and metadata. Zip it and scp it to the target cluster.
  • Step 3: hive -S -e "import table $schema_file1.$tbl_file1 from '$HDFS_DATA_PATH/$tbl_file1';" # -- The first import throws an error, since the table does not exist, but it automatically creates the table.
  • Step 4: hive -S -e "import table $schema_file1.$tbl_file1 from '$HDFS_DATA_PATH/$tbl_file1';" # -- The second import imports the data without any error, since the table is now available.
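A rough sketch of how such a parameterized script might look (it lists the source-cluster and target-cluster steps in one file purely for illustration; in practice Steps 1-2 run on the source and Steps 3-4 on the target):

     #!/bin/bash
     # Usage: ./export_import.sh <schema> <table> <hdfs_data_path>
     schema_file1=$1
     tbl_file1=$2
     HDFS_DATA_PATH=$3

     # Source cluster: export the table (data + metadata) into HDFS.
     hive -S -e "export table $schema_file1.$tbl_file1 to '$HDFS_DATA_PATH/$tbl_file1';"

     # ...copy $HDFS_DATA_PATH/$tbl_file1 over to the target cluster (zip + scp, distcp, ...)...

     # Target cluster: run the import twice; per the steps above, the first run
     # creates the table (and may error out), the second one loads the data cleanly.
     hive -S -e "import table $schema_file1.$tbl_file1 from '$HDFS_DATA_PATH/$tbl_file1';" || true
     hive -S -e "import table $schema_file1.$tbl_file1 from '$HDFS_DATA_PATH/$tbl_file1';"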

thanks

Kumar

