HiveQL loop

I am trying to combine two datasets, say A and B. Dataset A has a flag variable that takes two values. Instead of combining all the data at once, I am trying to combine the two datasets one flag value at a time.

The merge code is as follows:

 create table new_data as select a.*, b.y from A as a left join B as b on a.x = b.x;

Since I run Hive code through the CLI, I invoke it with the following command:

 hive -f new_data.hql 

The looping part of the code that I call to merge the data based on the flag variable is as follows:

 for flag in 1 2; do
   hive -hivevar flag=$flag -f new_data.hql
 done

I put the above code in another .hql file and invoke it with:

 hive -f loop_data.hql 

But it throws an error:

cannot recognize input near 'for' '' in '

Can someone tell me where I am going wrong?

Thanks!

1 answer
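  • Put the loop logic in a shell script, not in an .hql file. The hive CLI parses an .hql file as HiveQL, which has no for loop, so the shell syntax is rejected.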

File Name: loop_data.sh

 for flag in 1 2; do
   hive -hivevar flag=$flag -f new_data.hql
 done

And execute the script like:

 sh loop_data.sh 
  • Your new_data.hql script creates a table, so running it once per flag would fail on the second pass because the table already exists. Split the DDL and the DML into two separate scripts, like this:

DDL: create_new_data.hql

 create table new_data as select a.*, b.y from A as a left join B as b on a.x = b.x where 1 = 0;

DML: insert_new_data.hql

 insert into new_data select a.*, b.y from A as a left join B as b on a.x = b.x where a.flag = ${hiveconf:flag};

And update the shell script like:

File Name: loop_new_data.sh

 # Create table
 hive -f create_new_data.hql

 # Insert data
 for flag in 1 2; do
   hive -hiveconf flag=$flag -f insert_new_data.hql
 done

And execute it like:

 sh loop_new_data.sh 
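
If you want the loop to stop as soon as one Hive run fails, something along these lines might work. This is only a sketch: HIVE_CMD and run_hive_for_flags are names I made up for illustration (they are not part of Hive), and I am assuming the script and flag values shown above.

```shell
#!/bin/sh
# Sketch only: HIVE_CMD and run_hive_for_flags are illustrative names,
# not part of Hive. HIVE_CMD defaults to the real `hive` CLI.
HIVE_CMD="${HIVE_CMD:-hive}"

# Run one HiveQL script once per flag value, stopping at the first failure.
# $1 is the script; the remaining arguments are the flag values.
run_hive_for_flags() {
    script="$1"
    shift
    for flag in "$@"; do
        # -hiveconf exposes the value to the script as ${hiveconf:flag}
        if ! $HIVE_CMD -hiveconf "flag=$flag" -f "$script"; then
            echo "hive failed for flag=$flag" >&2
            return 1
        fi
    done
}
```

With that function defined in the script, the table creation and the inserts could then be chained so the inserts only run if the create succeeded:

 $HIVE_CMD -f create_new_data.hql && run_hive_for_flags insert_new_data.hql 1 2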

Let me know if you want more information.


Source: https://habr.com/ru/post/1243842/

