How is a CTE (Common Table Expression) evaluated in Hive?

My question is about performance and how a CTE is evaluated at runtime.

I plan to reuse the code by defining a base projection and then defining multiple CTEs on top of that base projection with various filters.

Does this cause any performance issues? More specifically, will the base projection be evaluated each time it is referenced?

For instance:

WITH CTE_PERSON AS (
  SELECT * FROM PersonTable
),

CTE_PERSON_WITH_AGE AS (
  SELECT * FROM CTE_PERSON WHERE age > 24
),

CTE_PERSON_WITH_AGE_AND_GENDER AS (
  SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender = 'm'
),

CTE_PERSON_WITH_NAME AS (
  SELECT * FROM CTE_PERSON WHERE name = 'abc'
)

Which of the following happens?

  • All records from PersonTable are loaded into memory each time, and the filters are applied afterwards, or
  • Only the rows that pass the filters are loaded into memory.
1 answer

Single scan.

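As the execution plan below shows:
 - There is a single stage (Stage-0)
 - There is a single TableScan
 - The three filters are merged into one predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)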


create table t (i int,j int,k int);

explain 
with    t1 as (select i,j,k from t  where i=1)
       ,t2 as (select i,j,k from t1 where j=2)
       ,t3 as (select i,j,k from t2 where k=3) 

select * from t3
;

Explain
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: t
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Filter Operator
            predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: 1 (type: int), 2 (type: int), 3 (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              ListSink
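
The same check can be run against the CTEs from the question, where CTE_PERSON is referenced by two branches rather than one chain. This is only a sketch: PersonTable and its age, gender, and name columns come from the question, and the final SELECT that combines both branches is an assumption added to make the statement complete. No output is shown because the plan depends on the Hive version and optimizer settings; the things to look for are how many TableScan operators appear and where the Filter Operator predicates sit.

```sql
-- Sketch: inspect the plan for the question's CTEs.
EXPLAIN
WITH CTE_PERSON AS (
  SELECT * FROM PersonTable
),
CTE_PERSON_WITH_AGE AS (
  SELECT * FROM CTE_PERSON WHERE age > 24
),
CTE_PERSON_WITH_AGE_AND_GENDER AS (
  SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender = 'm'
),
CTE_PERSON_WITH_NAME AS (
  SELECT * FROM CTE_PERSON WHERE name = 'abc'
)
-- Assumed final query: the question does not show one.
SELECT * FROM CTE_PERSON_WITH_AGE_AND_GENDER
UNION ALL
SELECT * FROM CTE_PERSON_WITH_NAME;
```

If the filters show up directly under each TableScan, as they do in the plan above, the rows are filtered as they are read rather than loaded in full first.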

Source: https://habr.com/ru/post/1670929/

