Partition column in Hive

I need to partition a table in Hive on a column that is also part of the table.

For instance,

Table: Employee

Columns: employeeId, employeeName, employeeSalary

I need to partition the table by employeeSalary, so I wrote the following query:

  CREATE TABLE employee (employeeId INT, employeeName STRING, employeeSalary INT) PARTITIONED BY (ds INT); 

I used the name "ds" here only because Hive did not allow me to use the same name, employeeSalary.

Is what I am doing right? Also, when inserting values into the table, I have to use a comma-separated file. Each line of the file looks like this: 2019, John, 2000

If I have to partition by salary, then my first partition will hold all the rows with a salary of 2000. So the query will be

 LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (ds=2000); 

Then, after the 100 records with a salary of 2000, I have the next 500 records with a salary of 4000. So I ran the query again:

 LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (ds=4000); 

Please let me know if I am doing this right.

3 answers

Here's how to create a Hive table partitioned on the specified column.

 CREATE TABLE employee (employeeId INT, employeeName STRING) PARTITIONED BY (employeeSalary INT); 

The partition column is listed in the PARTITIONED BY clause, not in the regular column list.
In the Hive shell, you can run describe employee; to display all the columns in the table. With your CREATE TABLE you should see 4 columns, not the 3 columns you are trying to get.
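
As a quick check, something like the following could be run in the Hive shell, assuming the table was created with the CREATE TABLE statement from this answer. The comments describe what I would expect to see, not verbatim output.

```sql
-- Should list employeeId and employeeName, plus employeeSalary,
-- which Hive reports as a partition column.
DESCRIBE employee;

-- Lists the partitions that exist so far (none until data is loaded).
SHOW PARTITIONS employee;
```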

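For your load command, you will need to specify the partition you are writing to. (I am not very familiar with this; it is mainly based on http://wiki.apache.org/hadoop/Hive/LanguageManual/DML#Syntax.) A PARTITION clause takes one value per partition column, so each LOAD DATA statement targets a single partition.

So something like

 LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (employeeSalary=2000); 

with a separate statement, and a separate input file, for employeeSalary=4000.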
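
A minimal sketch of loading both salary groups, assuming the rows have been split beforehand into one file per salary. The file names salary_2000.txt and salary_4000.txt are hypothetical. Each file would hold only the non-partition columns (employeeId, employeeName), because LOAD DATA moves the file into the partition directory without parsing or filtering it. It also assumes the table's row format matches the delimiter used in the files.

```sql
-- Hypothetical file holding only the rows whose salary is 2000.
LOAD DATA LOCAL INPATH './examples/files/salary_2000.txt'
OVERWRITE INTO TABLE employee PARTITION (employeeSalary=2000);

-- Hypothetical file holding only the rows whose salary is 4000.
LOAD DATA LOCAL INPATH './examples/files/salary_4000.txt'
OVERWRITE INTO TABLE employee PARTITION (employeeSalary=4000);
```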

Here's how a partitioned table works in Hive:

1) The value of the partition column is not stored in the actual data files in the warehouse; it is recorded in the Hive metastore (and appears in the partition's directory name).

2) So you should not have the partition column's data inside the data files in the warehouse directory.

These should be the steps for your problem.

1)

CREATE TABLE employee (employeeId INT, employeeName STRING ) PARTITIONED BY (employeeSalary INT) stored as <your choice of format>;

This will create an entry in the Hive metastore saying that you created a table with two columns, employeeId INT and employeeName STRING, and one partition column, employeeSalary INT.

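2) Create a temporary staging table, say emp_temp. Since your input is comma-separated, declare the delimiter as well.

 CREATE TABLE emp_temp (employeeId INT, employeeName STRING, employeeSalary INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; 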

I assume that your input files are in text format.

3) Copy all the files into the warehouse folder of the emp_temp table, or run the following query (I assume that all your data files are in the ./examples/files folder).

 LOAD DATA LOCAL INPATH './examples/files/*.txt' OVERWRITE INTO TABLE emp_temp; 

4) Now run the following HQL (this will dynamically create the partitions for you).

 INSERT OVERWRITE TABLE employee PARTITION (employeeSalary) SELECT employeeId, employeeName, employeeSalary FROM emp_temp; 
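
Depending on the Hive version and configuration, the dynamic-partition insert in step 4 may be rejected unless dynamic partitioning is enabled and the mode is non-strict. Here is a sketch of the session settings that are commonly set first. Whether they are needed depends on your installation's defaults.

```sql
-- Allow Hive to create partitions from the values found in the data.
SET hive.exec.dynamic.partition=true;
-- Strict mode requires at least one static partition; this insert has none.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE employee PARTITION (employeeSalary)
SELECT employeeId, employeeName, employeeSalary FROM emp_temp;

-- Check which partitions were created.
SHOW PARTITIONS employee;
```

Hive maps the trailing SELECT columns to the partition columns by position, not by name, which is why employeeSalary is listed last.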

Thanks Aditya


I think you should first load all the data into one table, and then use the Hive multi-insert extension (multiple inserts from a single source):

 FROM from_statement
 INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
 [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
 [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;

 FROM from_statement
 INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
 [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
 [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

Then, if you want, you can write

 FROM big_data_table
 INSERT OVERWRITE TABLE table1 PARTITION (ds=2000) SELECT * WHERE employeeId > 0 AND employeeId < 101
 INSERT OVERWRITE TABLE table2 PARTITION (ds=4000) SELECT * WHERE employeeId >= 101 AND employeeId <= 600;

Source: https://habr.com/ru/post/910301/

