Is it possible to concatenate a row field after a GROUP BY in Hive?

I am evaluating Hive and need to concatenate a row field after a GROUP BY. I found a function called "concat_ws", but it looks like I have to explicitly list all the values that will be concatenated. I am wondering whether I can do something like the following with concat_ws in Hive. Here is an example: I have a table called "my_table" with two fields, country and city. I want only one entry per country, and each entry should have two fields, country and cities:

select country, concat_ws(city, "|") as cities from my_table group by country 

Is this possible in Hive? I am using Hive 0.11 from CDH5 right now.

1 answer

In database management, an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement, such as a set, a bag or a list.

Source: Aggregate function - Wikipedia

The full list of Hive's aggregate functions is on the following page:
Built-in Aggregate Functions (UDAF, user-defined aggregate functions)

So the only built-in option for Hive 0.11 is the following (Hive 0.13 and higher also have collect_list):
array collect_set(col)

This function will satisfy your request if there are no duplicate city records per country, or if duplicates do not matter, because it returns a set of objects with duplicate elements eliminated. Otherwise, create your own UDAF or aggregate outside of Hive.
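As a minimal sketch of how collect_set could be combined with the concat_ws function from the question: it assumes the my_table(country, city) layout described there and relies on the concat_ws(separator, array<string>) form, which takes the separator as its first argument. The order of the cities inside each joined string should not be relied on.

```sql
-- Sketch only: my_table(country, city) is the table from the question.
-- collect_set(city) gathers the distinct cities of each group into an array<string>.
-- concat_ws takes the separator first, then the array to join.
SELECT country,
       concat_ws('|', collect_set(city)) AS cities
FROM my_table
GROUP BY country;
```

On Hive 0.13 and higher, replacing collect_set with collect_list in the same query would keep duplicate cities instead of eliminating them.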

Links for writing UDAF:


Source: https://habr.com/ru/post/986435/

