Hive - check whether the array in each row of the table contains any relevant data in a column in another table

I have two tables with columns as shown below.

Posts:
user STRING,
tag_list ARRAY<STRING>

Tags:
tag STRING

Here are some examples of data in these tables.

Posts:
user1   help, pig
user2   bigdata, hadoop, query, hiveql
user1   hive, hiveql, help

Tags:
hadoop
hiveql

If I want to filter only the rows in the Posts table that contain the tags listed in the tag table, as shown in the results below, how do I do this in a hive request? I am not sure how to join these two tables using array_contains.

Expected Query Result:
user2   bigdata, hadoop, query, hiveql
user1   hive, hiveql, help

thanks

+4
source share
1 answer

I managed to get the expected result with this query.

select distinct b.user, b.tag_list from 
(select tag from tags) a
join
(select user, tag_list, exp from posts
lateral view explode(tag_list) exploded_table as exp) b
on (a.tag = b.exp);
+5
source

Source: https://habr.com/ru/post/1523945/


All Articles