SQL UNION equivalent in Twitter Scalding

I need to join two pipes that have the same set of fields, for example (id, group_name, name), in the same way that SQL UNION works. How can this be done in Twitter Scalding?

+4
3 answers

Use ++ to concatenate the pipes, then use project to get rid of the id field.

If this answer is too short, let me know and I will try to expand it.
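As a rough sketch of the ++ approach with the fields-based API: SQL UNION drops duplicate rows, while ++ on its own keeps them (closer to UNION ALL), so the example below adds a unique step. The job name, the argument names (input1, input2, output) and the Tsv input format are assumptions made for illustration, not part of the question.

```scala
import com.twitter.scalding._

// Hypothetical job: argument names and the Tsv layout are assumptions.
class UnionJob(args: Args) extends Job(args) {
  // Both inputs share the same three fields, as in the question.
  val schema = new cascading.tuple.Fields("id", "group_name", "name")

  val pipe1 = Tsv(args("input1"), schema).read
  val pipe2 = Tsv(args("input2"), schema).read

  // ++ merges the rows of both pipes (duplicates are kept, like UNION ALL).
  (pipe1 ++ pipe2)
    // unique groups on all three fields and keeps one row per distinct
    // combination, which matches plain UNION semantics.
    .unique(schema)
    .write(Tsv(args("output")))
}
```

If duplicates should be kept, dropping the unique call leaves only the merge.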

+5

To join two pipes on three fields, first determine which pipe holds the smaller data set:

largerPipe1.joinWithSmaller(('id1, 'groupName1, 'name1) -> ('id2, 'groupName2, 'name2), smallerPipe2) 

Note that the field names do not have to be the same; they only have to be listed in the same order. The result will contain only the field names from largerPipe1.

Note, per the comment below: the ++ concatenation operation simply appends the data of one pipe to the other. It is not a join.

0

def ++[U >: T](other: TypedPipe[U]): TypedPipe[U]

Merge two TypedPipes (no order is guaranteed). This is only realized when a group (or join) is performed.
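For the typed API, a minimal sketch along the same lines might look like this. It assumes a Scalding version in which TypedPipe.from and TypedTsv are available; the job name, the Row alias and the argument names are hypothetical. The distinct call is added here only to remove duplicates the way SQL UNION does; ++ by itself keeps them.

```scala
import com.twitter.scalding._

// Hypothetical job: names and input format are assumptions for illustration.
class TypedUnionJob(args: Args) extends Job(args) {
  // (id, group_name, name)
  type Row = (Long, String, String)

  val left: TypedPipe[Row]  = TypedPipe.from(TypedTsv[Row](args("input1")))
  val right: TypedPipe[Row] = TypedPipe.from(TypedTsv[Row](args("input2")))

  // ++ merges both pipes without any ordering guarantee (like UNION ALL).
  (left ++ right)
    // distinct removes duplicate rows; it relies on an implicit Ordering[Row].
    .distinct
    .write(TypedTsv[Row](args("output")))
}
```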

0

Source: https://habr.com/ru/post/1441278/