Cumulative count of values in R

I hope you are doing well. I would like to know how to calculate a cumulative count over a data set under certain conditions. A simplified version of my data set looks like this:

  t id  
 A 22
 A 22
 R 22
 A 41
 A 98
 A 98
 A 98
 R 98
 A 46
 A 46
 R 46
 A 46
 A 46
 A 46
 R 46
 A 46
 A 12
 R 54
 A 66
 R 13 
 A 13
 A 13
 A 13
 A 13
 R 13
 A 13

I would like to create a new data set that holds, for each row, the cumulative number of times its "id" has appeared so far, with the count restarting whenever t = R. For example:

  t id count
 A 22 1
 A 22 2
 R 22 0
 A 41 1
 A 98 1
 A 98 2
 A 98 3
 R 98 0
 A 46 1
 A 46 2
 R 46 0
 A 46 1
 A 46 2
 A 46 3
 R 46 0
 A 46 1
 A 12 1
 R 54 0
 A 66 1
 R 13 0
 A 13 1
 A 13 2
 A 13 3
 A 13 4
 R 13 0
 A 13 1

Any ideas on how to do this? Thanks in advance.

1 answer

Using rle:

 out <- transform(df, count = sequence(rle(do.call(paste, df))$lengths))
 out$count[out$t == "R"] <- 0
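To see why this works, it may help to run the pieces one at a time. The following is only a sketch on the first eight rows of the question's data, rebuilt by hand; the intermediate names key, runs and count are mine and are not part of the answer's code.

```r
# First eight rows of the question's data, rebuilt by hand
df <- data.frame(
  t  = c("A", "A", "R", "A", "A", "A", "A", "R"),
  id = c(22, 22, 22, 41, 98, 98, 98, 98),
  stringsAsFactors = FALSE
)

# One string per row, built from all columns, e.g. "A 22"
key <- do.call(paste, df)

# Lengths of the runs of consecutive identical keys
runs <- rle(key)
runs$lengths                      # 2 1 1 3 1

# Count 1..n inside each run, then zero out the "R" rows
count <- sequence(runs$lengths)   # 1 2 1 1 1 2 3 1
count[df$t == "R"] <- 0
count                             # 1 2 0 1 1 2 3 0
```

Each "R" row forms its own run because its key differs from the neighbouring "A" rows, which is what makes the count restart after it.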

If your data.frame has more columns than these two and you want to check only these two, replace df inside do.call(paste, df) with df[, 1:2] or df[, c("t", "id")].
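As a small illustration of the column restriction, here is a sketch with one extra column. The column value and its numbers are made up for the example; only t and id come from the question.

```r
df <- data.frame(
  t     = c("A", "A", "R", "A"),
  id    = c(22, 22, 22, 41),
  value = c(0.5, 1.7, 2.1, 3.3),  # hypothetical extra column
  stringsAsFactors = FALSE
)

# Build the run key from t and id only, so differences in value are ignored
key <- do.call(paste, df[, c("t", "id")])

out <- transform(df, count = sequence(rle(key)$lengths))
out$count[out$t == "R"] <- 0
out$count                         # 1 2 0 1
```

Pasting the full df here would make every row unique, because value differs on each row, and every "A" row would get a count of 1.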

If you find do.call(paste, df) dangerous (as @flodel comments), you can replace it with:

 as.character(interaction(df)) 

Personally, I do not find anything dangerous or clumsy in this approach, provided you choose a separator that cannot occur in your data (which means knowing your data well). If you disagree, the second solution below may help.
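The separator concern can be shown with a contrived case. This is a sketch with a made-up data frame df2 whose values contain spaces, which is not the situation in the question's data.

```r
# Two different rows whose values contain the default separator (a space)
df2 <- data.frame(
  x = c("a b", "a"),
  y = c("c", "b c"),
  stringsAsFactors = FALSE
)

# Default sep = " ": both rows collapse to the same key "a b c"
default_key <- do.call(paste, df2)

# A separator that does not occur in the data keeps the rows distinct
safe_key <- do.call(paste, c(df2, sep = "|"))   # "a b|c" "a|b c"
```

With the colliding keys, rle would treat the two rows as one run, so the count would continue instead of restarting.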


Update:

For those who would rather not use do.call(paste, df) or as.character(interaction(df)) (see the comments between me, @flodel and @HongOoi), here is another base R solution:

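 idx <- which(df$t == "R")  # positions of the rows that reset the count
 ww <- NULL
 if (length(idx) > 0) {
   # block sizes: rows up to and including each "R", then the rows after the last "R"
   ww <- c(min(idx), diff(idx), nrow(df) - max(idx))
   df <- transform(df, count = ave(id, rep(seq_along(ww), ww),
                   FUN = function(y) sequence(rle(y)$lengths)))
   df$count[idx] <- 0
 } else {
   # no "R" rows: count runs of identical ids over the whole column
   df$count <- sequence(rle(df$id)$lengths)
 }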
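The block grouping can also be built with cumsum, which needs no special case when there are no "R" rows. This is a variation of mine rather than the code above, sketched on the question's full data (rebuilt by hand) and assuming id is numeric or character, not a factor. The names is_r, grp and expected are mine.

```r
# The question's 26 rows, rebuilt by hand
df <- data.frame(
  t  = c("A", "A", "R", "A", "A", "A", "A", "R", "A", "A", "R", "A", "A",
         "A", "R", "A", "A", "R", "A", "R", "A", "A", "A", "A", "R", "A"),
  id = c(rep(22, 3), 41, rep(98, 4), rep(46, 8), 12, 54, 66, rep(13, 7)),
  stringsAsFactors = FALSE
)

is_r <- df$t == "R"

# A new block starts on the row after each "R"
grp <- cumsum(c(0, head(is_r, -1)))

# Inside each block, count along runs of identical ids, then zero the "R" rows
df$count <- ave(df$id, grp, FUN = function(y) sequence(rle(y)$lengths))
df$count[is_r] <- 0

# Desired output copied from the question
expected <- c(1, 2, 0, 1, 1, 2, 3, 0, 1, 2, 0, 1, 2,
              3, 0, 1, 1, 0, 1, 0, 1, 2, 3, 4, 0, 1)
all(df$count == expected)
```

Each "R" row stays at the end of its own block, so the rows after it start counting from 1 again even when the id is unchanged.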

Source: https://habr.com/ru/post/1487573/

