Partition column in Hive

Question

I need to partition a Hive table on a column that is also part of the table.

For instance,

Table: Employee

Columns: employeeId, employeeName, employeeSalary

I need to partition the table on employeeSalary, so I wrote the following query:

  CREATE TABLE employee (employeeId INT, employeeName STRING, employeeSalary INT) PARTITIONED BY (ds INT);

I used the name "ds" here because Hive did not allow me to reuse the name employeeSalary.

Is what I am doing right? Also, to insert values into the table I have to use a comma-separated file. The file consists of lines like: 2019, John, 2000

If I partition by salary, my first partition will hold all the rows with a salary of 2000, so the query will be

 LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (ds=2000);

After the 100 records with a salary of 2000, the next 500 records have a salary of 4000, so I ran the query again:

 LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (ds=4000);

Can someone please tell me if I am doing this right?

hive

Soumya mishra Mar 15 '11 at 19:32

3 answers

Nija · Answer 1 · 2011-03-15T21:10:04+0000

Here's how to create a Hive table partitioned on the specified column:

 CREATE TABLE employee (employeeId INT, employeeName STRING) PARTITIONED BY (employeeSalary INT);

The partition column is listed only in the PARTITIONED BY clause, not in the regular column list.
In the Hive shell, you can run describe employee; to display all the columns of the table. With your CREATE TABLE you will see 4 columns (the extra one is ds), not the 3 columns you are trying to get.

For your load command, you need to specify the partition you are writing into. (I am not very familiar with this; I'm mainly going by http://wiki.apache.org/hadoop/Hive/LanguageManual/DML#Syntax)

So something like the following, with one LOAD per partition (and a separate LOAD, from a file holding only those rows, for employeeSalary=4000):

 LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (employeeSalary=2000);
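Note that LOAD DATA does not filter or transform rows: it only moves the file into the directory of the named partition. So each file you load should already contain only the rows for that salary. A minimal sketch of the whole flow, assuming two hypothetical pre-split files (salary_2000.txt and salary_4000.txt, each holding only employeeId,employeeName rows) and a comma-delimited table definition to match the input:

```sql
-- Partitioned table; ROW FORMAT matches the comma-separated input files.
CREATE TABLE employee (employeeId INT, employeeName STRING)
PARTITIONED BY (employeeSalary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- One LOAD per partition. Both file names are hypothetical; each file
-- should contain only the rows that belong to that salary.
LOAD DATA LOCAL INPATH './examples/files/salary_2000.txt'
OVERWRITE INTO TABLE employee PARTITION (employeeSalary=2000);

LOAD DATA LOCAL INPATH './examples/files/salary_4000.txt'
OVERWRITE INTO TABLE employee PARTITION (employeeSalary=4000);

-- Lists the partitions that now exist for the table.
SHOW PARTITIONS employee;
```

If the same mixed file is loaded into both partitions, as in the question, every row ends up in both partitions.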

Aditya agarwal · Answer 2 · 2018-02-12T13:17:57+0000

Here's how a partitioned table works in Hive:

1) The value of the partition column is not stored in the actual data files in the warehouse; it is stored in the Hive metastore (and in the name of the partition's directory).

2) So you should not have the partition column's data in the data files in the warehouse storage directory.
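Even though the salary value lives in the metastore and the partition's directory name rather than in the data files, Hive still exposes it as an ordinary column that can be selected and filtered. A short sketch, assuming a table named employee partitioned on employeeSalary as in step 1 below:

```sql
-- employeeSalary is not in the data files, but it can be queried like a column.
-- Filtering on it lets Hive read only the matching partition directory.
SELECT employeeId, employeeName, employeeSalary
FROM employee
WHERE employeeSalary = 2000;

-- Row count per partition value.
SELECT employeeSalary, COUNT(*)
FROM employee
GROUP BY employeeSalary;
```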

These are the steps for your problem:

1)

CREATE TABLE employee (employeeId INT, employeeName STRING ) PARTITIONED BY (employeeSalary INT) stored as <your choice of format>;

This creates an entry in the Hive metastore for a table with two columns, employeeId INT and employeeName STRING, and one partition column, employeeSalary INT.

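2) Create a temporary staging table, emp_temp, to hold the raw data. Because the input is comma-separated, the table needs a matching ROW FORMAT (Hive's default field delimiter is Ctrl-A):

 CREATE TABLE emp_temp (employeeId INT, employeeName STRING, employeeSalary INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;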

I assume that your input files are in text format.

3) Copy all the files to the warehouse folder of the emp_temp table, or run the following query (I assume that all the data files are in the folder ./examples/files):

 LOAD DATA LOCAL INPATH './examples/files/*.txt' OVERWRITE INTO TABLE emp_temp;

4) Now run the following HQL (this creates the partitions dynamically for you):

  INSERT OVERWRITE TABLE employee PARTITION (employeeSalary) SELECT employeeId, employeeName, employeeSalary FROM emp_temp;
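For step 4 to run, dynamic partitioning usually has to be enabled first. In many Hive versions hive.exec.dynamic.partition.mode defaults to strict, which requires at least one static partition column, and here the only partition column is dynamic. Hive also takes dynamic partition values from the last column(s) of the SELECT, which is why employeeSalary comes last. A sketch of step 4 with those settings, assuming the employee and emp_temp tables from steps 1 and 2:

```sql
-- Enable dynamic partitioning (already true by default in newer Hive versions).
SET hive.exec.dynamic.partition=true;
-- Allow every partition column to be dynamic; strict mode needs a static one.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column (employeeSalary) must be the last SELECT column.
INSERT OVERWRITE TABLE employee PARTITION (employeeSalary)
SELECT employeeId, employeeName, employeeSalary
FROM emp_temp;

-- Should show one partition per distinct salary in emp_temp.
SHOW PARTITIONS employee;
```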

Thanks Aditya

user1099024 · Answer 3 · 2012-08-01T02:36:15+0000

I think you should first load all the data into one table and then use the Hive extension (multiple inserts):

 FROM from_statement
 INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
 [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
 [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;

 FROM from_statement
 INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
 [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
 [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

Then you can do something like:

 from big_data_table
 insert overwrite table table1 partition (ds=2000) select * where employeeId > 0 && employeeId < 101
 insert overwrite table table2 partition (ds=4000) select * where employeeId >= 101 && employeeId <= 600;
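Applied to the question's layout, the same multi-insert pattern can fill both salary partitions of a single employee table in one pass, filtering on the salary value instead of on employeeId ranges. A sketch assuming employee is partitioned on employeeSalary (as in the first answer) and a hypothetical unpartitioned staging table employee_staging (employeeId, employeeName, employeeSalary) holds all the raw rows:

```sql
-- employee_staging is hypothetical; it holds every row, salary included.
-- With a static partition value, the SELECT lists only the non-partition columns.
FROM employee_staging
INSERT OVERWRITE TABLE employee PARTITION (employeeSalary=2000)
  SELECT employeeId, employeeName WHERE employeeSalary = 2000
INSERT OVERWRITE TABLE employee PARTITION (employeeSalary=4000)
  SELECT employeeId, employeeName WHERE employeeSalary = 4000;
```

The source table is scanned once, and each INSERT clause writes its own partition.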
