Add Rank Column

I have some data:

    test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa"))

And so on. All items have the same length, and they are already sorted before I receive them.

I need to create a new column of ranks: "First", "Second", "Third"; anything after that can be left empty, and it needs to handle ties. So, in the case above, I would like the following output:

    A       B
    aaabbb  First
    aaaabb  Second
    aaaabb  Second
    aaaaab  Third
    bbbaaa
    bbbbaa

I looked at rank() and at some other posts that used it, but I couldn't get it to do what I wanted.
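For reference, here is a quick sketch of why `rank()` on its own does not give this. It ranks the strings by their alphabetical sort order rather than by their position in the data, and by default it gives tied values their average rank. The vector `A` below assumes the six values shown in the desired output.

```r
# The six values from the desired output, in the order they arrive
A <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

# Default ties.method = "average": ranks follow alphabetical order, ties are averaged
r_avg <- rank(A)
print(r_avg)   # 4.0 2.5 2.5 1.0 5.0 6.0

# ties.method = "min" removes the fractions, but the ranks are still alphabetical
r_min <- rank(A, ties.method = "min")
print(r_min)   # 4 2 2 1 5 6
```

Neither result matches the position-based 1, 2, 2, 3 the question asks for, which is why the answers below work from the order of appearance instead.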

2 answers

How about this:

    test$B <- match(test$A, unique(test$A)[1:3])
    test
    #        A  B
    # 1 aaabbb  1
    # 2 aaaabb  2
    # 3 aaaabb  2
    # 4 aaaaab  3
    # 5 bbbaaa NA
    # 6 bbbbaa NA

This is one of many ways to do it. It may not be the best, but it comes to mind easily and is quite intuitive. You can use unique because your data arrive pre-sorted: unique keeps the values in the order in which they first appear.
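The question asks for the labels "First", "Second" and "Third" rather than numbers, with anything after that left empty. A minimal sketch, assuming an empty string is an acceptable "empty" value (NA would also work), uses the `match()` result as an index into a label vector; `rank_labels` and `idx` are illustrative names, not part of the original answer.

```r
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa"),
                   stringsAsFactors = FALSE)

rank_labels <- c("First", "Second", "Third")
# Position of each value among the first three distinct values; NA for everything else
idx <- match(test$A, unique(test$A)[1:3])
# Indexing with NA yields NA, which is then replaced by an empty string
test$B <- rank_labels[idx]
test$B[is.na(test$B)] <- ""
test
```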

Since your data are sorted, another function worth considering is rle, although in this example it is a bit clunky:

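    # as.character() makes this work whether A is stored as a factor or as character
    # (rle() rejects factors, and as.integer() on a character column gives NAs)
    rnk <- rle(as.character(test$A))$lengths
    rnk
    # [1] 1 2 1 1 1
    test$B <- c(rep(1:3, times = rnk[1:3]), rep(NA, sum(rnk[-(1:3)])))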

rle computes the lengths of runs of equal values in a vector (it also returns the values themselves, which we don't need here), so this works only because your data are already sorted.
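A variation on the same idea, offered as a sketch rather than as the answerer's code: instead of rebuilding the vector with `rep()`, overwrite the run values returned by `rle()` with run numbers (keeping only the first three) and expand them back with `inverse.rle()`. The names `runs`, `run_id` and `B` are illustrative.

```r
A <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

runs <- rle(A)            # works directly on a character vector (not on a factor)
runs$lengths              # 1 2 1 1 1
# Replace each run's value by its run number; runs after the third become NA
run_id <- seq_along(runs$values)
runs$values <- ifelse(run_id <= 3, run_id, NA)
# Expand the runs back to full length: 1 2 2 3 NA NA
B <- inverse.rle(runs)
```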

And if you don't need blanks after the third-ranked item, this is even simpler (and more readable):

    test$B <- rep(seq_along(rnk), times = rnk)

This seems like a good application for factors:

    test$B <- as.numeric(factor(test$A, levels = unique(test$A)))
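The `levels = unique(test$A)` argument is what makes this work. A short sketch comparing it with the default, which sorts the levels alphabetically and therefore numbers the values in sort order rather than in the order they arrive (`by_sort` and `by_appearance` are illustrative names):

```r
A <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

# Default levels are the sorted unique values, so codes follow alphabetical order
by_sort <- as.integer(factor(A))                            # 3 2 2 1 4 5
# Levels in order of first appearance give a dense rank by position
by_appearance <- as.integer(factor(A, levels = unique(A)))  # 1 2 2 3 4 5
```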

cumsum also comes to mind: start at 1 and add 1 each time the value changes:

    test$B <- cumsum(c(TRUE, tail(test$A, -1) != head(test$A, -1)))
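The factor and cumsum versions agree on sorted input like this, but they are not interchangeable in general. cumsum counts changes between neighbours, so a value that reappears later gets a new number, whereas the factor (or match) version reuses its first number. A sketch with a hypothetical unsorted vector `x`; the helper names `cumsum_rank` and `factor_rank` are illustrative only.

```r
# Hypothetical helpers wrapping the two one-liners above
cumsum_rank <- function(v) cumsum(c(TRUE, tail(v, -1) != head(v, -1)))
factor_rank <- function(v) as.integer(factor(v, levels = unique(v)))

A <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")
x <- c("p", "q", "p")    # hypothetical unsorted input

cumsum_rank(A)   # 1 2 2 3 4 5
factor_rank(A)   # 1 2 2 3 4 5
cumsum_rank(x)   # 1 2 3  ("p" starts a new run)
factor_rank(x)   # 1 2 1  ("p" keeps its first number)
```

Either choice is fine for the data in the question, since it arrives sorted; the difference only matters if equal values can be separated.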

(As @Simon said, there are many ways to do this.)


Source: https://habr.com/ru/post/947255/

