Merge rows with equal and unequal data

I am trying to combine some dirty data.

I have one data frame:

    df <- data.frame(
      name    = c("A", "A", "B", "B", "C", "C"),
      number  = c(1, 1, 2, 2, 3, 3),
      product = c("fixed", "variable", "aggregate", "variable", "fixed", "fixed"),
      vol     = c(1, 9, 2, 6, 4, 7)
    )

This is the result I want to get:

    result <- data.frame(
      name        = c("A", "B", "C"),
      number      = c(1, 2, 3),
      new_product = c("fixed variable", "aggregate variable", "fixed"),
      vol         = c(10, 8, 11)
    )

My problem: I need to collapse all rows that share the same name into a single row, summing vol. If the product values within a group are not all the same, I need to combine the distinct values into one string, as in new_product in the result above.

I tried dplyr, but I cannot get new_product to merge in any meaningful way, because I don't know how to combine a column with itself within a group.

    df %>%
      group_by(name) %>%
      summarize(name = name,
                number = number,
                newproduct = paste(product, product))  # ????

Any help is much appreciated!

4 answers

Here is how I would approach this using data.table, although I'm not sure how you defined number:

    library(data.table)

    # Collapse the distinct products and sum vol within each name
    result <- setDT(df)[, .(new_product = toString(unique(product)), vol = sum(vol)), by = name]
    # Number the resulting rows 1, 2, 3, ...
    result[, number := .I]
    result
    #    name         new_product vol number
    # 1:    A     fixed, variable  10      1
    # 2:    B aggregate, variable   8      2
    # 3:    C               fixed  11      3

Note: you can use paste(unique(product), collapse = " ") instead of toString if you prefer space-separated output without commas.
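To make the difference between the two concrete, here is a small base R sketch. The vector `products` is a made-up example, not part of the question's data:

```r
# Hypothetical input vector, used only for illustration
products <- c("fixed", "variable", "fixed")

# toString() joins the values with ", "
comma_sep <- toString(unique(products))

# paste() with collapse lets you choose the separator yourself
space_sep <- paste(unique(products), collapse = " ")

print(comma_sep)
print(space_sep)
```

Here `comma_sep` is "fixed, variable" and `space_sep` is "fixed variable"; the latter matches the format of new_product in the question.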

Or, similarly, with dplyr:

    library(dplyr)

    df %>%
      group_by(name) %>%
      summarise(new_product = toString(unique(product)), vol = sum(vol)) %>%
      mutate(number = row_number())
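If number is meant to be carried over from the original data rather than regenerated, one possible variant is to group by both name and number. This is only a sketch and assumes number is constant within each name, as it is in the question's data; `out` is just an illustrative variable name, and the `.groups` argument requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  name    = c("A", "A", "B", "B", "C", "C"),
  number  = c(1, 1, 2, 2, 3, 3),
  product = c("fixed", "variable", "aggregate", "variable", "fixed", "fixed"),
  vol     = c(1, 9, 2, 6, 4, 7)
)

# Assumption: number never varies within a name, so grouping by both
# columns yields one row per name and keeps the original number.
out <- df %>%
  group_by(name, number) %>%
  summarise(new_product = paste(unique(product), collapse = " "),
            vol = sum(vol),
            .groups = "drop")

print(out)
```

If number can vary within a name, this would produce several rows per name, so the row_number() approach would be the safer choice in that case.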

Here are two more ways, using only base R:

    df <- data.frame(
      name    = c("A", "A", "B", "B", "C", "C"),
      number  = rep(1:3, times = 2, each = 1),
      product = c("fixed", "variable", "aggregate", "variable", "fixed", "fixed"),
      vol     = c(1, 9, 2, 6, 4, 7)
    )
  • This one uses only ave on the original data frame and then removes the duplicated rows:

    within(df, {
      # Collapse the distinct products within each name
      new_product <- ave(seq_along(name), name,
                         FUN = function(x) toString(unique(df[x, 'product'])))
      # Sum vol within each name
      vol <- ave(vol, name, FUN = sum)
      product <- NULL
    })[!duplicated(df$name), ]
    #   name number vol         new_product
    # 1    A      1  10     fixed, variable
    # 3    B      3   8 aggregate, variable
    # 5    C      2  11               fixed
  • This one builds new_product with aggregate, matches it back to the original data frame, and then uses aggregate again to sum vol over each group:

    (tmp <- aggregate(product ~ name, df, function(x) paste0(unique(x), collapse = ' ')))
    #   name            product
    # 1    A     fixed variable
    # 2    B aggregate variable
    # 3    C              fixed

    df$new_product <- tmp[match(df$name, tmp$name), 'product']
    res <- aggregate(vol ~ name + new_product, df, sum)
    within(res[order(res$name), ], {
      number <- 1:nrow(res)
    })
    #   name        new_product vol number
    # 3    A     fixed variable  10      1
    # 1    B aggregate variable   8      2
    # 2    C              fixed  11      3

Other people have already answered, but here is my solution:

    df %>%
      group_by(name) %>%
      summarise(new_product = paste(unique(product), collapse = " "),
                vol = sum(vol)) %>%
      mutate(number = row_number()) %>%
      select(name, number, new_product, vol)

Base R with some currying (via the functional package):

    library(functional)

    # Take the unique values first, then paste them together with commas
    aggregStrFunc = Compose(unique, Curry(paste, collapse=','))
    setNames(cbind(
      aggregate(df$vol, by=list(name=df$name), sum),
      aggregate(df$product, by=list(df$name), aggregStrFunc)[-1]
    ), c('Name', 'Vol', 'New_Product'))
    #   Name Vol        New_Product
    # 1    A  10     fixed,variable
    # 2    B   8 aggregate,variable
    # 3    C  11              fixed
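For readers without the functional package, the composed function is roughly equivalent to an ordinary anonymous function. The following is a sketch of the same idea in plain base R; the names `collapse_unique`, `vol_by_name`, `prod_by_name`, and `out` are illustrative and do not come from the answer above.

```r
# Data frame from the question
df <- data.frame(
  name    = c("A", "A", "B", "B", "C", "C"),
  number  = c(1, 1, 2, 2, 3, 3),
  product = c("fixed", "variable", "aggregate", "variable", "fixed", "fixed"),
  vol     = c(1, 9, 2, 6, 4, 7)
)

# Plain-function stand-in for Compose(unique, Curry(paste, collapse = ','))
collapse_unique <- function(x) paste(unique(x), collapse = ",")

# Aggregate each column separately, then join the two results on name
vol_by_name  <- aggregate(vol ~ name, df, sum)
prod_by_name <- aggregate(product ~ name, df, collapse_unique)
out <- merge(vol_by_name, prod_by_name, by = "name")
names(out)[names(out) == "product"] <- "new_product"

print(out)
```

Using merge instead of cbind joins on name explicitly, so the result does not rely on both aggregate calls returning their rows in the same order.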

Source: https://habr.com/ru/post/985604/

