Hive query stuck at 99%

I am inserting records using a left join in Hive. With limit 1 the query works, but when it runs over all records it gets stuck at 99%.

The query below works:

    Insert overwrite table tablename select a.id , b.name from a left join b on a.id = b.id limit 1;

But this one does not:

    Insert overwrite table tablename select table1.id , table2.name from table1 left join table2 on table1.id = table2.id;

I increased the number of reducers, but it still doesn't work.

+4
4 answers

If your query is stuck at 99%, check the following:

  • Data skew: if your data is skewed, a single reducer may end up doing all the work.
  • Duplicate keys on both sides: if many join keys are duplicated on both sides, the output can explode and the query may appear stuck.
  • , , , SMB, .
  • , .
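
The first two checks above can be run directly against the tables from the question. This is only a sketch: it assumes the join key is `id` in both `table1` and `table2`, as in the failing query, and the `limit 20` cutoff is arbitrary.

```sql
-- Rows per join key in each table; a few very large counts point to skew.
select id, count(*) as cnt
from table1
group by id
order by cnt desc
limit 20;

select id, count(*) as cnt
from table2
group by id
order by cnt desc
limit 20;

-- Estimated size of the left join output, computed from per-key counts
-- instead of the raw rows. Keys missing on the right still yield one row
-- per left row, hence coalesce(r.cnt, 1).
select sum(l.cnt * coalesce(r.cnt, 1)) as estimated_output_rows
from (select id, count(*) as cnt from table1 group by id) l
left join (select id, count(*) as cnt from table2 group by id) r
  on l.id = r.id;
```

If the estimate is far larger than either table, duplicate keys are multiplying the output; if a handful of keys dominate the per-key counts, the reducer that receives them will lag behind the rest.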
0

Hive, , .

set hive.exec.parallel=true;
set mapred.compress.map.output=true;
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;

, , , . . https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization
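
The page linked above covers Hive's skewed join optimization. As a minimal sketch, the runtime variant is controlled by the two properties below; the threshold shown is the documented default and would need tuning for real data. Depending on the Hive version, this optimization may not be applied to outer joins such as the left join in the question, so it is worth checking the plan before relying on it.

```sql
-- Let Hive handle heavily skewed join keys in a follow-up job at runtime.
set hive.optimize.skewjoin=true;
-- A key is treated as skewed once it has more rows than this threshold.
set hive.skewjoin.key=100000;

-- Inspect the plan to see whether the join is actually planned differently.
explain
select table1.id, table2.name
from table1
left join table2 on table1.id = table2.id;
```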

, 1 2. . (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)

+3

Hive , . 99% .

, , . , .

  • hive.auto.convert.join = false
  • mapred.compress.map.output =
  • hive.exec.parallel =
+2

hive> set mapreduce.map.memory.mb=9000;
hive> set mapreduce.map.java.opts=-Xmx7200m;
hive> set mapreduce.reduce.memory.mb=9000;
hive> set mapreduce.reduce.java.opts=-Xmx7200m;
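
In the settings above, each JVM heap (-Xmx7200m) is 80% of its container size (9000 MB), which leaves headroom for non-heap memory. As a small sketch, running `set` with a property name and no value prints what the session currently uses, which is a quick way to confirm the overrides took effect:

```sql
-- Print the current value of each property (no "=value" part).
set mapreduce.map.memory.mb;
set mapreduce.map.java.opts;
set mapreduce.reduce.memory.mb;
set mapreduce.reduce.java.opts;
```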

0

Source: https://habr.com/ru/post/1598797/

