I'm new to Hive and I ran into a problem.
I have a table in Hive:
create table td(id int, time string, ip string, v1 bigint, v2 int, v3 int, v4 int, v5 bigint, v6 int)
PARTITIONED BY(dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
lines TERMINATED BY '\n';
And I run the following query:
from td
INSERT OVERWRITE DIRECTORY '/tmp/total.out'
  select count(v1)
INSERT OVERWRITE DIRECTORY '/tmp/totaldistinct.out'
  select count(distinct v1)
INSERT OVERWRITE DIRECTORY '/tmp/distinctuin.out'
  select distinct v1
INSERT OVERWRITE DIRECTORY '/tmp/v4.out'
  select v4, count(v1), count(distinct v1) group by v4
INSERT OVERWRITE DIRECTORY '/tmp/v3v4.out'
  select v3, v4, count(v1), count(distinct v1) group by v3, v4
INSERT OVERWRITE DIRECTORY '/tmp/v426.out'
  select count(v1), count(distinct v1) where v4=2 or v4=6
INSERT OVERWRITE DIRECTORY '/tmp/v3v426.out'
  select v3, count(v1), count(distinct v1) where v4=2 or v4=6 group by v3
INSERT OVERWRITE DIRECTORY '/tmp/v415.out'
  select count(v1), count(distinct v1) where v4=1 or v4=5
INSERT OVERWRITE DIRECTORY '/tmp/v3v415.out'
  select v3, count(v1), count(distinct v1) where v4=1 or v4=5 group by v3
It works, and the result is what I want.
But there is one problem: Hive generates 9 MapReduce jobs and runs them one by one.
I ran EXPLAIN on this query and got the following output:
STAGE DEPENDENCIES:
  Stage-9 is a root stage
  Stage-0 depends on stages: Stage-9
  Stage-10 depends on stages: Stage-9
  Stage-1 depends on stages: Stage-10
  Stage-11 depends on stages: Stage-9
  Stage-2 depends on stages: Stage-11
  Stage-12 depends on stages: Stage-9
  Stage-3 depends on stages: Stage-12
  Stage-13 depends on stages: Stage-9
  Stage-4 depends on stages: Stage-13
  Stage-14 depends on stages: Stage-9
  Stage-5 depends on stages: Stage-14
  Stage-15 depends on stages: Stage-9
  Stage-6 depends on stages: Stage-15
  Stage-16 depends on stages: Stage-9
  Stage-7 depends on stages: Stage-16
  Stage-17 depends on stages: Stage-9
  Stage-8 depends on stages: Stage-17
It seems that stages 9-17 correspond to MapReduce jobs 0-8.
But according to the EXPLAIN output above, stages 10-17 depend only on stage 9, not on each other.
So my question is: why can't jobs 1-8 run at the same time?
Or how can I make jobs 1-8 run at the same time?
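For reference, Hive has a session setting, hive.exec.parallel (false by default), that controls whether stages with no dependency on each other may be launched concurrently, and a companion setting, hive.exec.parallel.thread.number, that caps how many run at once. A minimal sketch, assuming these settings apply to the multi-insert query above; whether they actually help here is not verified, and the value 8 is only illustrative:

```sql
-- Assumption: let Hive submit independent stages in parallel (default: false)
set hive.exec.parallel=true;
-- Assumption: upper bound on jobs running at once; 8 is an illustrative value
set hive.exec.parallel.thread.number=8;
```

Even with this enabled, the jobs would still compete for the cluster's map and reduce capacity, so any wall-clock gain depends on how many free slots are available.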
Many thanks for your help!