Group Numbering

Suppose we have the following data frame:

    ID Shoot hit
     1    10   2
     1     9   3
     1     8   1
     2    10   8
     2     8   8
     2    11  10
     2     7   2
     3     9   2
     4     6   6
     4     6   5
     .     .

I would like to add a column that numbers the rows within each group, here within each ID, for example:

    ID Shoot hit number.in.group
     1    10   2               1
     1     9   3               2
     1     8   1               3
     2    10   8               1
     2     8   8               2
     2    11  10               3
     2     7   2               4
     3     9   2               1
     4     6   6               1
     4     6   5               2
     .     .

I could do this easily using a loop. Something like this will work:

    df$number.in.group = rep(1, nrow(df))
    for (i in 2:nrow(df)) {
      if (df$ID[i] == df$ID[i-1]) {
        df$number.in.group[i] = df$number.in.group[i-1] + 1
      }
    }

My question is: is there any function or more elegant way to do this other than using a loop?

8 answers

Using dplyr

    dat <- data.frame(ID = rep(1:3, c(2, 3, 5)), val = rnorm(10))
    library(dplyr)
    dat %>%
      group_by(ID) %>%
      mutate(number.in.group = 1:n())

If you need a one-liner, something like

 df$number.in.group = unlist(lapply(table(df$ID),seq.int)) 

You can just use rle and sequence:

    dat <- read.table(text = "ID Shoot hit
    1 10 2
    1 9 3
    1 8 1
    2 10 8
    2 8 8
    2 11 10
    2 7 2
    3 9 2
    4 6 6
    4 6 5", sep = "", header = TRUE)

    > sequence(rle(dat$ID)$lengths)
     [1] 1 2 3 1 2 3 4 1 1 2

In fact, I think that sequence is designed for exactly that purpose.
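To make the two steps visible, here is a minimal sketch on a made-up vector (`ids` is hypothetical and simply mirrors the ID column from the question). rle reports the length of each run of identical values, and sequence concatenates 1..n for each of those lengths.

```r
# Hypothetical ID column, already grouped into consecutive runs
ids <- c(1, 1, 1, 2, 2, 2, 2, 3, 4, 4)

# rle() finds runs of identical consecutive values
runs <- rle(ids)
runs$lengths   # 3 4 1 2

# sequence(n) concatenates seq_len(n[1]), seq_len(n[2]), ...
number_in_group <- sequence(runs$lengths)
number_in_group   # 1 2 3 1 2 3 4 1 1 2
```

Because rle counts consecutive runs, the numbering restarts whenever the value changes, so this approach assumes that rows with the same ID sit next to each other.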

    > dat$number.in.group <- ave(dat$ID, dat$ID, FUN = seq_along)
    > dat
       ID Shoot hit number.in.group
    1   1    10   2               1
    2   1     9   3               2
    3   1     8   1               3
    4   2    10   8               1
    5   2     8   8               2
    6   2    11  10               3
    7   2     7   2               4
    8   3     9   2               1
    9   4     6   6               1
    10  4     6   5               2

There are probably more efficient ways, but you can use tapply on the IDs and pass in a function that returns a sequence.

    # Example data
    dat <- data.frame(ID = rep(1:3, c(2, 3, 5)), val = rnorm(10))

    # Using tapply with a function that returns a sequence
    dat$number.in.group <- unlist(tapply(dat$ID, dat$ID, function(x) { seq(length(x)) }))
    dat

which gives

    > dat
       ID          val number.in.group
    1   1 -0.454652118               1
    2   1 -2.391824247               2
    3   2  0.530832021               1
    4   2 -1.671043812               2
    5   2 -0.045261549               3
    6   3  2.311162484               1
    7   3 -0.525635803               2
    8   3  0.008588811               3
    9   3  0.078942033               4
    10  3  0.324156111               5

Another one-liner, using rle:

    df$number.in.group <- unlist(lapply(rle(df$ID)$lengths, seq_len))

Here is another solution, using plyr:

    library(plyr)
    ddply(dat, .(ID), transform, num_in_grp = seq_along(hit))

I compared the answers, and IShouldBuyABoat's is the most promising. I found that the ave function can be applied even if the dataset is not sorted by the grouping variable.

Consider a data set:

    dane <- data.frame(g1 = c(-1, -2, -2, -2, -3, -3, -3, -3, -3),
                       g2 = c('reg', 'pl', 'reg', 'woj', 'woj', 'reg', 'woj', 'woj', 'woj'))

Joran's answer, applied to my example:

    > sequence(rle(as.character(dane$g2))$lengths)
    [1] 1 1 1 1 2 1 1 2 3

Simon Urbanek's proposal and its result:

    > unlist(lapply(table(dane$g2), seq.int))
      pl reg1 reg2 reg3 woj1 woj2 woj3 woj4 woj5
       1    1    2    3    1    2    3    4    5

IShouldBuyABoat's code gives the correct answer:

    > as.numeric(ave(as.character(dane$g1), as.character(dane$g1), FUN = seq_along))
    [1] 1 1 2 3 1 2 3 4 5
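Note that the call above groups by g1, whose equal values happen to sit next to each other. As a minimal sketch of the unsorted case, the same idea can be applied to g2, where the groups are interleaved. Passing `seq_along(dane$g2)` as the first argument is my own choice here, used only so that ave works on an integer vector whatever the type of g2 is.

```r
dane <- data.frame(
  g1 = c(-1, -2, -2, -2, -3, -3, -3, -3, -3),
  g2 = c('reg', 'pl', 'reg', 'woj', 'woj', 'reg', 'woj', 'woj', 'woj')
)

# ave() splits the first argument by the grouping variable, applies FUN to
# each piece, and puts the results back in the original row positions.
# seq_along(dane$g2) is just a placeholder vector of the right length.
num_in_g2 <- ave(seq_along(dane$g2), dane$g2, FUN = seq_along)
num_in_g2   # 1 1 2 1 2 3 3 4 5
```

Each row gets its running count within its own g2 group (for example, the third 'reg' in row 6 gets 3), with no need to sort the data first.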

Source: https://habr.com/ru/post/1392835/

