How to make Hive run MapReduce jobs concurrently?

I'm new to Hive, and I ran into a problem.

I have a table in Hive:

    create table td(id int, time string, ip string, v1 bigint, v2 int, v3 int, v4 int, v5 bigint, v6 int)
    PARTITIONED BY(dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' lines TERMINATED BY '\n';

And I run this query:

    from td
    INSERT OVERWRITE DIRECTORY '/tmp/total.out'
        select count(v1)
    INSERT OVERWRITE DIRECTORY '/tmp/totaldistinct.out'
        select count(distinct v1)
    INSERT OVERWRITE DIRECTORY '/tmp/distinctuin.out'
        select distinct v1
    INSERT OVERWRITE DIRECTORY '/tmp/v4.out'
        select v4, count(v1), count(distinct v1) group by v4
    INSERT OVERWRITE DIRECTORY '/tmp/v3v4.out'
        select v3, v4, count(v1), count(distinct v1) group by v3, v4
    INSERT OVERWRITE DIRECTORY '/tmp/v426.out'
        select count(v1), count(distinct v1) where v4=2 or v4=6
    INSERT OVERWRITE DIRECTORY '/tmp/v3v426.out'
        select v3, count(v1), count(distinct v1) where v4=2 or v4=6 group by v3
    INSERT OVERWRITE DIRECTORY '/tmp/v415.out'
        select count(v1), count(distinct v1) where v4=1 or v4=5
    INSERT OVERWRITE DIRECTORY '/tmp/v3v415.out'
        select v3, count(v1), count(distinct v1) where v4=1 or v4=5 group by v3

It works, and the result is what I want.

But there is one problem: Hive generates 9 MapReduce jobs and starts them one by one.

I ran EXPLAIN on this query and got the following output:

    STAGE DEPENDENCIES:
      Stage-9 is a root stage
      Stage-0 depends on stages: Stage-9
      Stage-10 depends on stages: Stage-9
      Stage-1 depends on stages: Stage-10
      Stage-11 depends on stages: Stage-9
      Stage-2 depends on stages: Stage-11
      Stage-12 depends on stages: Stage-9
      Stage-3 depends on stages: Stage-12
      Stage-13 depends on stages: Stage-9
      Stage-4 depends on stages: Stage-13
      Stage-14 depends on stages: Stage-9
      Stage-5 depends on stages: Stage-14
      Stage-15 depends on stages: Stage-9
      Stage-6 depends on stages: Stage-15
      Stage-16 depends on stages: Stage-9
      Stage-7 depends on stages: Stage-16
      Stage-17 depends on stages: Stage-9
      Stage-8 depends on stages: Stage-17

It seems that stages 9-17 correspond to MapReduce jobs 0-8,
but according to the EXPLAIN output above, stages 10-17 depend only on stage 9.
So my question is: why can't jobs 1-8 run at the same time?

Or how can I make jobs 1-8 run concurrently?

Many thanks for your help!

1 answer

In hive-default.xml there is a property called "hive.exec.parallel" that enables running jobs in parallel. Its default value is "false". You can change it to "true" to get this ability. You can use another property, "hive.exec.parallel.thread.number", to control the maximum number of jobs that can be executed in parallel.
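
As a minimal sketch, the two properties can also be set for the current session from the Hive CLI before running the multi-insert query, instead of editing the configuration file. The value 8 below is only an illustrative choice, not a recommendation; how many jobs actually run at once also depends on the capacity available on the cluster.

```sql
-- Allow independent stages of one query to be submitted at the same time
set hive.exec.parallel=true;

-- Upper bound on the number of jobs run in parallel (8 is an example value)
set hive.exec.parallel.thread.number=8;

-- Then run the multi-insert query ("from td INSERT OVERWRITE DIRECTORY ...") in the same session
```

With these settings, stages that depend only on an already finished stage (here, Stage-10 through Stage-17 after Stage-9) become eligible to run together.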

More details: https://issues.apache.org/jira/browse/HIVE-549


Source: https://habr.com/ru/post/906013/

