Increment by one for each duplicate value in R

I am trying to find the right way in R to find duplicate values and add 1 to each subsequent duplicate, grouped by id. For instance:

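    library(data.table)
    library(dplyr)  # lag() below is dplyr::lag, which pads the first element with NA

    data <- data.table(id = c('1','1','1','1','1','2','2','2'),
                       value = c(95,100,101,101,101,20,35,38))

    # Attempt: compare each value with the previous one and add 1 on a match
    data$new_value <- ifelse(data$value == lag(data$value, 1),
                             lag(data$value, 1) + 1,
                             data$value)

    # What I actually want
    data$desired_value <- c(95,100,101,102,103,20,35,38)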

Produces:

       id value new_value desired_value
    1:  1    95        NA            95
    2:  1   100       100           100
    3:  1   101       101           101
    4:  1   101       102           102
    5:  1   101       102           103
    6:  2    20        20            20
    7:  2    35        35            35
    8:  2    38        38            38

I tried to do this with ifelse, but it does not work recursively: it only changes the row immediately after a duplicate, not every later duplicate. Also, the lag function causes me to lose the first value in value.

I saw examples for character variables using make.names or make.unique, but couldn't find a solution for duplicate numeric values.

Background: I am doing a survival analysis, and some stop times in my data are identical, so I need to make them unique by adding 1 (stop times are in seconds).

3 answers

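Here is an attempt. You essentially group by id and value and add 0:(length(value)-1), written as 0:(.N-1) in data.table, so the first occurrence is unchanged and each later duplicate is shifted by one more. So: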

    data[, onemore := value + (0:(.N-1)), by=.(id, value)]
    #   id value new_value desired_value onemore
    #1:  1    95        96            95      95
    #2:  1   100       101           100     100
    #3:  1   101       102           101     101
    #4:  1   101       102           102     102
    #5:  1   101       102           103     103
    #6:  2    20        21            20      20
    #7:  2    35        36            35      35
    #8:  2    38        39            38      38
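For anyone who wants to run this from scratch, here is a minimal sketch on the example data from the question. It assumes the data.table package is installed. The names dt and onemore_rowid are only illustrative. rowid() is data.table's helper for a running count within groups (available since version 1.9.8, as far as I know) and is shown as an alternative to 0:(.N-1).

```r
library(data.table)

# Example data from the question
dt <- data.table(
  id    = c("1", "1", "1", "1", "1", "2", "2", "2"),
  value = c(95, 100, 101, 101, 101, 20, 35, 38)
)

# Grouped offset: 0 for the first row of each (id, value) group, then 1, 2, ...
dt[, onemore := value + (0:(.N - 1)), by = .(id, value)]

# Alternative (assumes data.table >= 1.9.8): rowid() counts 1, 2, 3, ...
# within each combination of id and value, so subtract 1 to start at 0
dt[, onemore_rowid := value + rowid(id, value) - 1]

print(dt)
```

Both columns should match desired_value from the question. Note that this relies on duplicates being identified by exact equality of value within each id.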

In base R, we can use ave, where we take the first value of each group and add the row's position within that group, minus one.

    data$value1 <- ave(data$value, data$id, data$value,
                       FUN = function(x) x[1] + seq_along(x) - 1)
    #   id value new_value desired_value value1
    #1:  1    95        96            95     95
    #2:  1   100       101           100    100
    #3:  1   101       102           101    101
    #4:  1   101       102           102    102
    #5:  1   101       102           103    103
    #6:  2    20        21            20     20
    #7:  2    35        36            35     35
    #8:  2    38        39            38     38
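To see what ave contributes here, it may help to split the computation into two steps. This is a rough sketch using only base R on a plain data.frame built from the question's example. The names df, occurrence and value_unique are made up for illustration.

```r
# Example data from the question, as a plain data.frame
df <- data.frame(
  id    = c("1", "1", "1", "1", "1", "2", "2", "2"),
  value = c(95, 100, 101, 101, 101, 20, 35, 38)
)

# Running count within each (id, value) group:
# 1 for the first occurrence, 2 for the second, and so on
occurrence <- ave(df$value, df$id, df$value, FUN = seq_along)

# Shift each duplicate by the number of identical values that came before it
df$value_unique <- df$value + occurrence - 1

print(df)
```

Here occurrence is 1 for every row except the second and third 101 in id 1, which get 2 and 3, so only those two rows change.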

Here is one option with the tidyverse:

    library(dplyr)

    data %>%
      group_by(id, value) %>%
      mutate(onemore = value + row_number() - 1)
    #  id    value onemore
    #  <chr> <dbl>   <dbl>
    #1 1        95      95
    #2 1       100     100
    #3 1       101     101
    #4 1       101     102
    #5 1       101     103
    #6 2        20      20
    #7 2        35      35
    #8 2        38      38

Or we can use base R without an anonymous function call:

    data$onemore <- with(data, value + ave(value, id, value, FUN = seq_along) - 1)
    data$onemore
    #[1]  95 100 101 102 103  20  35  38

Source: https://habr.com/ru/post/1266282/

