A common data processing pattern is grouping by a set of columns, applying a filter, then flattening again. For instance:
my_data_grouped = group my_data by some_column;
my_data_grouped = filter my_data_grouped by <some expression>;
my_data = foreach my_data_grouped generate flatten(my_data);
The problem is that if my_data starts with a schema like (c1, c2, c3), after this operation it will have a schema like (my_data::c1, my_data::c2, my_data::c3). Is there a way to easily remove the "my_data::" prefix if the column names are unique?
I know I can do something like this:
my_data = foreach my_data generate c1 as c1, c2 as c2, c3 as c3;
However, this becomes inconvenient and hard to maintain for datasets with a large number of columns, and it is not possible for datasets whose set of columns varies.