Hive - external (dynamically) partitioned table

I have a table in MySQL named nas_comps.

    select comp_code, count(leg_id)
    from nas_comps_01012011_31012011 n
    group by comp_code;

    comp_code  count(leg_id)
    'J'        20640
    'Y'        39680

First I imported the data into HDFS (Hadoop version 1.0.2) using Sqoop:

    sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
      --username hadoopranch \
      --password hadoopranch \
      --query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" \
      -m 1 \
      --target-dir /pros/olap2/dataimports/nas_comps

Then I created an external partitioned Hive table:

    /* shows the partitions on 'describe' but not 'show partitions' */
    create external table nas_comps(
      DS_NAME string, DEP_DATE string, CRR_CODE string, FLIGHT_NO string,
      ORGN string, DSTN string,
      PHYSICAL_CAP int, ADJUSTED_CAP int, CLOSED_CAP int
    )
    PARTITIONED BY (LEG_ID int, month INT, COMP_CODE string)
    location '/pros/olap2/dataimports/nas_comps'

Partition columns are displayed when described:

    hive> describe extended nas_comps;
    OK
    ds_name       string
    dep_date      string
    crr_code      string
    flight_no     string
    orgn          string
    dstn          string
    physical_cap  int
    adjusted_cap  int
    closed_cap    int
    leg_id        int
    month         int
    comp_code     string

    Detailed Table Information Table(tableName:nas_comps, dbName:pros_olap2_optim, owner:hadoopranch, createTime:1374849456, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string, comment:null), FieldSchema(name:dep_date, type:string, comment:null), FieldSchema(name:crr_code, type:string, comment:null), FieldSchema(name:flight_no, type:string, comment:null), FieldSchema(name:orgn, type:string, comment:null), FieldSchema(name:dstn, type:string, comment:null), FieldSchema(name:physical_cap, type:int, comment:null), FieldSchema(name:adjusted_cap, type:int, comment:null), FieldSchema(name:closed_cap, type:int, comment:null), FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string, comment:null)], location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters: {serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys: [FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string, comment:null)], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1374849456}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)

But I'm not sure if partitions are created because:

    hive> show partitions nas_comps;
    OK
    Time taken: 0.599 seconds

    select count(1) from nas_comps;

returns 0 records

How to create an external Hive table with dynamic partitions?

2 answers

Hive will not create partitions for you this way.
Just create a table partitioned by the desired partition key, then run INSERT OVERWRITE TABLE from the external table into the new partitioned table (after setting hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict).
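
As a rough sketch of that approach for the table in the question: the names nas_comps_staging and nas_comps_part are hypothetical, the column order of the Sqoop output is an assumption, and the sketch partitions by comp_code only (and leaves out month, which may or may not exist in the source data) to stay short. Additional partition columns would be listed the same way, last in the SELECT.

```sql
-- Sketch only. nas_comps_staging and nas_comps_part are hypothetical names,
-- and the column order of the Sqoop output is an assumption.

-- Unpartitioned external table over the files Sqoop wrote.
-- Sqoop's text import separates fields with commas by default.
CREATE EXTERNAL TABLE nas_comps_staging (
  ds_name STRING, dep_date STRING, crr_code STRING, flight_no STRING,
  orgn STRING, dstn STRING,
  physical_cap INT, adjusted_cap INT, closed_cap INT,
  leg_id INT, comp_code STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/pros/olap2/dataimports/nas_comps';

-- Managed table partitioned by comp_code; the partition column is not
-- repeated in the regular column list.
CREATE TABLE nas_comps_part (
  ds_name STRING, dep_date STRING, crr_code STRING, flight_no STRING,
  orgn STRING, dstn STRING,
  physical_cap INT, adjusted_cap INT, closed_cap INT,
  leg_id INT
)
PARTITIONED BY (comp_code STRING);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE nas_comps_part PARTITION (comp_code)
SELECT ds_name, dep_date, crr_code, flight_no, orgn, dstn,
       physical_cap, adjusted_cap, closed_cap, leg_id, comp_code
FROM nas_comps_staging;

-- Lists the partitions Hive created during the insert.
SHOW PARTITIONS nas_comps_part;
```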

If you need to keep the partitioned table external, you have to create the directories manually (one directory per partition, named PARTITION_KEY=VALUE), then run MSCK REPAIR TABLE table_name;
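
A minimal sketch of the external route, assuming the data has already been split into one directory per comp_code value. The table name nas_comps_ext, its location, and the column order are hypothetical.

```sql
-- Sketch only. The table name nas_comps_ext and its location are hypothetical.
-- Assumed directory layout, created by hand before running the statements:
--   /pros/olap2/dataimports/nas_comps_ext/comp_code=J/<data files>
--   /pros/olap2/dataimports/nas_comps_ext/comp_code=Y/<data files>
-- Hive takes comp_code from the directory name, not from the file contents.

CREATE EXTERNAL TABLE nas_comps_ext (
  ds_name STRING, dep_date STRING, crr_code STRING, flight_no STRING,
  orgn STRING, dstn STRING,
  physical_cap INT, adjusted_cap INT, closed_cap INT,
  leg_id INT
)
PARTITIONED BY (comp_code STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/pros/olap2/dataimports/nas_comps_ext';

-- Register every key=value directory found under the table location.
MSCK REPAIR TABLE nas_comps_ext;

-- Lists the partitions that are now registered in the metastore.
SHOW PARTITIONS nas_comps_ext;
```

Until MSCK REPAIR TABLE (or an explicit ALTER TABLE ... ADD PARTITION) has run, the metastore knows nothing about the directories, which is why show partitions in the question comes back empty.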


Dynamic partitioning

A partition is added dynamically when data is inserted into the Hive table.

  • Only the INSERT statement is supported.
  • The LOAD DATA statement is not supported.
  • You must enable the dynamic partition settings before inserting data into the Hive table: hive.exec.dynamic.partition.mode=nonstrict (the default is strict) and hive.exec.dynamic.partition=true (the default is false before Hive 0.9.0).
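
For contrast, a LOAD DATA statement only moves files into place, so the partition value has to be spelled out as a static partition. A minimal sketch, where the HDFS path is hypothetical:

```sql
-- Sketch only. The HDFS path is hypothetical.
-- LOAD DATA does not inspect the rows, so the target partition
-- must be named explicitly (a static partition).
LOAD DATA INPATH '/tmp/loaded_20151217'
INTO TABLE table_name PARTITION (loaded_date=20151217);
```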

Dynamic partition query

    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.dynamic.partition=true;

    INSERT INTO table_name PARTITION (loaded_date)
    select * from table_name1
    where loaded_date = 20151217

Here loaded_date = 20151217 is the partition column and its value.

Limitations:

  • Dynamic partitioning only works with the statement above.
  • It creates the partitions dynamically from the values of the loaded_date column selected from table_name1; the partition column must come last in the SELECT list.

If your case does not fit the criteria above:

First create the partitioned table, then add the partitions explicitly, like this:

 ALTER TABLE table_name ADD PARTITION (DS_NAME='partname1',DATE='partname2'); 

or use Link to create a dynamic partition.
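
For an external table, a static partition is typically registered together with the directory that holds its files. A minimal sketch, assuming a loaded_date partition column as in the query above; the path is hypothetical:

```sql
-- Sketch only. The directory path is hypothetical.
-- Registers one partition and points it at an existing HDFS directory.
ALTER TABLE table_name ADD IF NOT EXISTS
  PARTITION (loaded_date=20151217)
  LOCATION '/data/table_name/loaded_date=20151217';
```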


Source: https://habr.com/ru/post/950338/

