How to configure Hive to answer a query from metadata?

If I run the query below on a table partitioned by a specific column, I want to make sure that Hive does not perform a full table scan and instead computes the result from the metadata alone. Is there any way to enable this?

Select max(partitioned_col) from hive_table ;

Right now, when I run this query, it starts MapReduce tasks, and I am sure it scans the data, even though it could easily determine the value from the metadata alone.

1 answer

Compute table statistics every time the data changes:

ANALYZE TABLE hive_table PARTITION(partitioned_col) COMPUTE STATISTICS FOR COLUMNS;

Enable CBO and automatic statistics collection:

set hive.cbo.enable=true;
set hive.stats.autogather=true;

Then enable the use of statistics when planning and answering queries:

set hive.compute.query.using.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.stats.fetch.column.stats=true;
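Because set commands only apply to the current session, the settings and the query have to run together. A minimal sketch, assuming the table and column names from the question (hive_table, partitioned_col) and the Hive CLI on the PATH; build_stats_session is a hypothetical helper that only prints the HiveQL so it can be inspected before being handed to hive -e.

```shell
#!/usr/bin/env bash
# Sketch only: print one HiveQL session that enables stats-based answering,
# refreshes column statistics, and runs the max() query.
# build_stats_session is a hypothetical helper; names come from the question.
build_stats_session() {
  local table="$1" part_col="$2"
  cat <<EOF
set hive.cbo.enable=true;
set hive.stats.autogather=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.stats.fetch.column.stats=true;
ANALYZE TABLE ${table} PARTITION(${part_col}) COMPUTE STATISTICS FOR COLUMNS;
SELECT max(${part_col}) FROM ${table};
EOF
}

# Print the session; to execute it, pass the output to the Hive CLI:
#   hive -e "$(build_stats_session hive_table partitioned_col)"
build_stats_session hive_table partitioned_col
```

To check whether statistics are actually used, you can run EXPLAIN on the SELECT in the same session: if the plan still contains a map/reduce stage with a table scan, Hive is not answering from metadata for this query.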

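If none of this helps, you can find the last partition quickly by parsing the maximum partition key out of the table location with a shell script. The command below lists all paths under the table folder, sorts them in reverse order, takes the first one, extracts the partition subfolder name and keeps the value after the = sign. All you need to do is set the TABLE_DIR variable and put the number of the partition subfolder in the path: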

last_partition=$(hadoop fs -ls $TABLE_DIR/* | awk '{ print $8 }' | sort -r | head -n1 | cut -d / -f [number of partition subfolder in the path here] | cut -d = -f 2)
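To see what each stage of this pipeline does without a cluster, here is a sketch that runs the same awk/sort/head/cut chain on a canned listing. It assumes the usual hadoop fs -ls column layout (path in the 8th column), a made-up table directory /warehouse/hive_table with a single partition level, and partition values that sort correctly as strings (for example yyyy-mm-dd dates). The helper name, paths and sample lines are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract the latest partition value from `hadoop fs -ls`-style output.
# Hypothetical helper: reads the listing on stdin; $1 is the 1-based index of
# the partition directory among the "/"-separated path components.
last_partition_from_listing() {
  local part_field="$1"
  awk '{ print $8 }' |          # 8th column of `hadoop fs -ls` is the path
    sort -r |                   # reverse lexical order: newest value first
    head -n1 |                  # keep only the top path
    cut -d / -f "$part_field" | # e.g. partitioned_col=2024-01-03
    cut -d = -f 2               # keep only the value after "="
}

# Made-up listing; "Found N items" lines have no 8th column, so awk prints an
# empty line for them, which sorts last in reverse order.
sample_listing='Found 1 items
-rw-r--r--   3 hive hive       1024 2024-01-01 10:00 /warehouse/hive_table/partitioned_col=2024-01-01/000000_0
Found 1 items
-rw-r--r--   3 hive hive       2048 2024-01-03 10:00 /warehouse/hive_table/partitioned_col=2024-01-03/000000_0
Found 1 items
-rw-r--r--   3 hive hive        512 2024-01-02 10:00 /warehouse/hive_table/partitioned_col=2024-01-02/000000_0'

# The leading "/" makes field 1 empty, so partitioned_col=... is field 4 here.
last_partition=$(printf '%s\n' "$sample_listing" | last_partition_from_listing 4)
echo "$last_partition"   # prints 2024-01-03
```

Note that the reverse sort is lexical, so this only finds the maximum when partition values are zero-padded or otherwise sort correctly as text.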

Then pass the $last_partition variable to your script:

  hive -hiveconf last_partition="$last_partition" -f your_script.hql
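Inside the script, a value passed with -hiveconf is read back through ${hiveconf:...} substitution. A minimal sketch, assuming the script only needs to filter on the latest partition; the file name matches the command above, while the SELECT * query is a placeholder. The quoted heredoc delimiter keeps bash from expanding ${hiveconf:last_partition}, so Hive receives it literally.

```shell
#!/usr/bin/env bash
# Sketch: write a hypothetical your_script.hql that uses the value passed via
# -hiveconf. The quoted 'EOF' stops bash from expanding ${hiveconf:...}.
cat > your_script.hql <<'EOF'
-- Placeholder query: read only the latest partition.
SELECT *
FROM hive_table
WHERE partitioned_col = '${hiveconf:last_partition}';
EOF

# Run it only if the Hive CLI is installed; last_partition is assumed to be
# set by the hadoop fs -ls pipeline above.
if command -v hive >/dev/null 2>&1; then
  hive -hiveconf last_partition="${last_partition:?run the partition lookup first}" -f your_script.hql
fi
```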

Source: https://habr.com/ru/post/1668332/

