Hive does not support IN or EXISTS. How do I write the following query?

I have two tables A and B that have a column id. I want to get identifiers from A that are not in B. The obvious way is:

SELECT id FROM A WHERE id NOT IN (SELECT id FROM B) 

Unfortunately, Hive does not support IN, EXISTS, or subqueries. Is there a way to achieve the above using joins?

I thought about the following:

 SELECT A.id FROM A,B WHERE A.id<>B.id 

But it looks like this will return all of A, since for any id in A there is almost always some id in B that is not equal to it.
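
To make the problem concrete, here is a small worked case. The table contents are hypothetical and chosen only for illustration; the row order of the result is not guaranteed.

```sql
-- Hypothetical contents: A.id = {1, 2}, B.id = {2, 3}
-- The cross product has four (A.id, B.id) pairs:
--   (1, 2), (1, 3), (2, 2), (2, 3)
SELECT A.id FROM A, B WHERE A.id <> B.id;
-- The filter drops only (2, 2), so three pairs remain:
--   (1, 2), (1, 3), (2, 3)
-- Result: 1, 1, 2
-- id 2 is returned even though it is in B, and id 1 is duplicated.
```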

3 answers

You can do the same with LEFT OUTER JOIN in Hive:

 SELECT A.id FROM A LEFT OUTER JOIN B ON (B.id = A.id) WHERE B.id IS null 
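
As a sketch of why this works, the same query can be traced on small sample data. The table contents below are hypothetical, not taken from the question.

```sql
-- Hypothetical contents: A.id = {1, 2}, B.id = {2, 3}
-- Rows produced by the LEFT OUTER JOIN, before the WHERE filter:
--   A.id = 1, B.id = NULL   (no matching row in B)
--   A.id = 2, B.id = 2      (matching row in B)
SELECT A.id
FROM A
LEFT OUTER JOIN B ON (B.id = A.id)
WHERE B.id IS NULL;   -- keeps only the unmatched row, so the result is 1
```

Every row of A survives the join, and the rows with no partner in B are exactly those where B.id comes back as NULL.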

If you ever want to do IN like this:

 SELECT id FROM A WHERE id IN (SELECT id FROM B) 

Hive has LEFT SEMI JOIN for this:

 SELECT a.key, a.val FROM a LEFT SEMI JOIN b on (a.key = b.key) 

Hive seems to support IN, NOT IN, EXISTS, and NOT EXISTS as of version 0.13.

 select count(*) from flight a where not exists(select b.tailnum from plane b where b.tailnum = a.tailnum); 

Subqueries in EXISTS and NOT EXISTS must have correlated predicates (for example, b.tailnum = a.tailnum in the query above). For more information, see the Hive Wiki page "Subqueries in the WHERE Clause".
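
Applied to the tables from the question, the query might look like the following. This is a sketch that assumes Hive 0.13 or later; the correlated predicate is B.id = A.id.

```sql
-- Assumes Hive 0.13+ (subqueries in the WHERE clause).
-- Returns the ids from A that have no matching id in B.
SELECT A.id
FROM A
WHERE NOT EXISTS (SELECT B.id FROM B WHERE B.id = A.id);
```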


Source: https://habr.com/ru/post/946049/

