Subquery in a `where` clause with a comparison operator

Say I have a large table, partitioned on the field dt. I want to query this table for data after a specific date. For instance:

select * from mytab where dt >= 20140701;

The tricky part is that the date is not a constant but comes from a subquery. So basically I want something like this:

select * from mytab where dt >= (select min(dt) from activedates);

However, Hive cannot do this; it gives me a ParseException on the subquery (from the documentation I assume this is not yet supported).

So, how do I limit my query based on a dynamic subquery?

Please note that performance is key here, so the faster the better, even if it looks ugly.

Also note that we have not yet switched to Hive 0.13, so solutions without IN subqueries are preferred.

1 answer

Hive decides which partitions to prune when it builds the execution plan, so it has to know the value of min(dt) before execution starts.

Currently, the only way to do something like this is to split the query into two parts. The first one is select min(dt) from activedates, and its result is stored in a variable.
The second query then looks like this: select * from mytab where dt >= ${hiveconf:var}.

Now for the slightly tricky part.
You can run the first query and store its output in a shell variable like this:

a=`hive -S -e "select min(dt) from activedates"`

Then run the second query as follows:

hive -hiveconf var="$a" -e 'select * from mytab where dt >= ${hiveconf:var}'
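One pitfall with this step: if the HiveQL string is wrapped in double quotes, bash tries to expand ${hiveconf:var} itself before Hive ever sees it. Bash reads it as a substring expansion of an (unset) shell variable named hiveconf, so the placeholder silently turns into an empty string and Hive receives "dt >=" with nothing after it. The small bash sketch below, which runs without Hive, compares the two quoting styles:

```shell
#!/usr/bin/env bash
# Compare how bash treats Hive's ${hiveconf:var} placeholder under
# double quotes versus single quotes.
unset hiveconf var

# Double quotes: bash interprets ${hiveconf:var} as its own substring
# expansion of the unset shell variable "hiveconf", so the placeholder vanishes.
double_quoted="select * from mytab where dt >= ${hiveconf:var}"

# Single quotes: bash leaves the text untouched, so Hive receives the
# placeholder and substitutes the value passed with -hiveconf var=...
single_quoted='select * from mytab where dt >= ${hiveconf:var}'

printf '%s\n' "$double_quoted"
printf '%s\n' "$single_quoted"
```

Escaping the dollar sign inside double quotes (\${hiveconf:var}) also keeps bash from expanding it.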

Or even just:

hive -e "select * from mytab where dt >=$a"
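Both steps can be combined into a small bash script. This is a minimal sketch, assuming bash and that dt is stored as an 8-digit yyyymmdd value like 20140701, as in the question; the function names get_min_dt and query_from_min_dt are hypothetical. It adds one safeguard: if activedates is empty, min(dt) comes back as NULL, and it is safer to stop than to run the second query with a meaningless filter.

```shell
#!/usr/bin/env bash
# Sketch of the two-step approach: compute the lower bound first, then pass it
# to the main query so Hive sees a constant when it plans partition pruning.
# Table/column names come from the question; function names are hypothetical.

# Step 1: fetch min(dt) in silent mode (-S) so only the value is printed.
get_min_dt() {
  hive -S -e 'select min(dt) from activedates'
}

# Step 2: validate the value, then run the main query with it bound to var.
query_from_min_dt() {
  local min_dt
  min_dt="$(get_min_dt)" || return 1
  # Remove any whitespace the CLI may print around the value.
  min_dt="$(printf '%s' "$min_dt" | tr -d '[:space:]')"
  # Assumption: dt is an 8-digit yyyymmdd number. Abort on NULL or anything else.
  if ! [[ "$min_dt" =~ ^[0-9]{8}$ ]]; then
    echo "unexpected min(dt): '$min_dt'" >&2
    return 1
  fi
  # Single quotes keep bash from expanding ${hiveconf:var}; Hive substitutes it.
  hive -hiveconf var="$min_dt" -e 'select * from mytab where dt >= ${hiveconf:var}'
}
```

The same check is worth doing before the direct-substitution variant, since there the shell variable is spliced straight into the query text.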



Source: https://habr.com/ru/post/1547581/

