How to insert data into a parquet table in Hive

I have a simple text table (with a separator ",") with the following format:

orderID INT, CustID INT, OrderTotal FLOAT, OrderNumItems INT, OrderDesc STRING
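
In DDL terms the source table is something like this (a sketch, assuming it is the small_orders table referenced below):

CREATE TABLE small_orders (orderID INT, CustID INT, OrderTotal FLOAT,
OrderNumItems INT, OrderDesc STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;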

I want to insert this data into a Parquet table, which I created using:

CREATE TABLE parquet_test (orderID INT, CustID INT, OrderTotal FLOAT, 
OrderNumItems INT, OrderDesc STRING) 
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' stored as 
INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' 
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

When I try to insert the data using

insert overwrite table parquet_small_orders select * from small_orders;

it fails. Any thoughts?

4 answers

What error messages do you get on the Hive server side?

I had a similar problem. In the Hive server log I saw heap memory errors.

I was able to fix it in my Hadoop installation by setting higher values in mapred-site.xml:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value> 
</property>

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value> 
</property>

<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value> 
</property>

<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value> 
</property>
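
If you cannot (or would rather not) change the cluster-wide defaults, the same values can be set per session from the Hive CLI before running the insert; a sketch using the values above:

hive> SET mapreduce.map.memory.mb=1536;
hive> SET mapreduce.map.java.opts=-Xmx1024M;
hive> SET mapreduce.reduce.memory.mb=3072;
hive> SET mapreduce.reduce.java.opts=-Xmx2560M;
hive> insert overwrite table parquet_small_orders select * from small_orders;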

Have you completed the following steps?

  • Download parquet-hive-bundle-1.5.0.jar
  • Add the following property to hive-site.xml:

    <property>
       <name>hive.jar.directory</name>
       <value>/home/hduser/hive/lib/parquet-hive-bundle-1.5.0.jar</value>
       <description>
           This is the location hive in tez mode will look for to find a site wide installed hive instance. If not set, the directory under hive.user.install.directory corresponding to current user name will be used.
       </description>
    </property>
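
Alternatively, if you would rather not edit hive-site.xml, the jar can be added for just the current session; a sketch, assuming the same path as above:

hive> ADD JAR /home/hduser/hive/lib/parquet-hive-bundle-1.5.0.jar;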
    

It works for me; I just tested the same steps with a small .csv file, transcript below. Can you share the exact error you are getting?

Matt

hive> create table te3 (x int, y int)                                        
    > row format delimited                                                   
    > FIELDS TERMINATED BY ','       
    > STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH '/home/cloudera/test/' OVERWRITE INTO TABLE te3;
Copying data from file:/home/cloudera/test
Copying file: file:/home/cloudera/test/testfile.csv
Loading data to table default.te3
Table default.te3 stats: [numFiles=1, numRows=0, totalSize=12, rawDataSize=0]
OK
Time taken: 1.377 seconds
hive> select * from te3;                                                     
OK
1   2
3   4
5   6
Time taken: 0.566 seconds, Fetched: 3 row(s)
hive> create table ptest (a INT, b INT)
    > ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' stored as 
    > INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' 
    > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
OK
Time taken: 0.413 seconds
hive> insert overwrite table ptest select * from te3;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1423179894648_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1423179894648_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1423179894648_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-02-09 14:08:16,308 Stage-1 map = 0%,  reduce = 0%
2015-02-09 14:08:45,342 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.55 sec
MapReduce Total cumulative CPU time: 1 seconds 550 msec
Ended Job = job_1423179894648_0001
Stage-Stage-1: Map: 1   Cumulative CPU: 1.99 sec   HDFS Read: 234 HDFS Write: 377 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 990 msec
OK
Time taken: 68.96 seconds
hive> select * from ptest;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
1   2
3   4
5   6
Time taken: 0.06 seconds, Fetched: 3 row(s)
hive> 
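
To confirm that the insert actually produced Parquet files, you can list the table's directory from inside the Hive CLI; a sketch, assuming the default warehouse location:

hive> dfs -ls /user/hive/warehouse/ptest;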

Matthieu Lieber, a couple of remarks:

  • Since Hive 0.13, Parquet is supported natively, so you can simply declare the table with STORED AS PARQUET instead of spelling out the SerDe and the input/output formats (see the sketch after this list).

  • Please include the actual error messages. "It fails" is too vague to act on and makes debugging difficult. The way you load the data looks fine and should work; the log would show what kind of problem this actually is.
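
A minimal sketch of the native-Parquet DDL mentioned in the first bullet, reusing the column list from the question:

CREATE TABLE parquet_test (orderID INT, CustID INT, OrderTotal FLOAT,
OrderNumItems INT, OrderDesc STRING)
STORED AS PARQUET;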

If this is still an open issue, you can refer to the Cloudera documentation for the basics of using Parquet with Hive.

Thanks!
