Add Rank Column

Question


I have some data:

 test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa"))

The real data have more rows. All items are the same length and are already sorted when I get them.

I need to create a new column of ranks, “First”, “Second”, “Third”; anything after that can be left empty, and tied values must get the same rank. So for the data above I would like the following output:

  A       B
  aaabbb  First
  aaaabb  Second
  aaaabb  Second
  aaaaab  Third
  bbbaaa
  bbbbaa

I looked at rank() and some other posts that used it, but I couldn’t get it to do what I was looking for.
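For comparison, here is a short sketch of why rank() alone does not fit (the vector `x` is just a copy of column A, introduced for illustration). rank() orders by the values themselves, alphabetically here, not by row position, and with `ties.method = "min"` it skips a number after each tie instead of moving on to the next integer:

```r
# Copy of column A from the question (rows pre-sorted by the asker's own criterion)
x <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

# Ranks follow alphabetical order, and the tie at rank 2 makes rank 3 disappear
r <- rank(x, ties.method = "min")
print(r)
# [1] 4 2 2 1 5 6
```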



pak Jun 13 '13 at 22:22


2 answers

This seems like a good application for factors:

 test$B <- as.numeric(factor(test$A, levels = unique(test$A)))
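A minimal sketch of what the factor approach relies on, assuming the six-row data from the question: with `levels = unique(...)` the levels keep first-appearance order, so the integer codes are dense ranks (ties share a code, no gaps). The last step, which blanks out everything past third place with NA, is an addition to cover the “can be left empty” part of the question:

```r
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa"),
                   stringsAsFactors = FALSE)

# Levels in first-appearance order, so factor codes are 1, 2, 2, 3, ...
f <- factor(test$A, levels = unique(test$A))
test$B <- as.integer(f)
test$B
# [1] 1 2 2 3 4 5

# Keep only the top three ranks; later rows become NA
test$B[test$B > 3] <- NA
test$B
# [1]  1  2  2  3 NA NA
```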

cumsum also comes to mind: add 1 each time the value changes from the previous row:

 test$B <- cumsum(c(TRUE, tail(test$A, -1) != head(test$A, -1)))
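Broken into steps (a sketch on a plain character vector `x`, introduced here for illustration), the cumsum idea builds a logical “did the value change?” vector and sums it. Like the other answers, it only works because equal values are adjacent in the sorted data:

```r
x <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

# TRUE where a row differs from the one before it; the first row always starts a rank
changed <- c(TRUE, tail(x, -1) != head(x, -1))
changed
# [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

# Each TRUE counts as +1, so ties (FALSE) keep the previous rank
ranks <- cumsum(changed)
ranks
# [1] 1 2 2 3 4 5
```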

(As @Simon said, there are many ways to do this ...)


flodel Jun 13 '13 at 10:51


Simon O'Hanlon · Accepted Answer · 2013-06-13T22:36:00+0000

How about this:

 test$B <- match(test$A, unique(test$A)[1:3])
 test
 #        A  B
 # 1 aaabbb  1
 # 2 aaaabb  2
 # 3 aaaabb  2
 # 4 aaaaab  3
 # 5 bbbaaa NA
 # 6 bbbbaa NA

This is one of many ways to do it. It may not be the best, but it comes to mind easily and is quite intuitive. Using unique works here only because your data arrive pre-sorted.
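The question asks for the words “First”, “Second”, “Third” rather than numbers. A minimal sketch, assuming the `match()` ranks shown above, that maps ranks 1 to 3 onto those labels and leaves later rows empty; `rank_num` and `rank_labels` are illustrative names, not part of the original answer:

```r
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa"),
                   stringsAsFactors = FALSE)

# Numeric ranks 1-3 via match(); rows past third place get NA
rank_num <- match(test$A, unique(test$A)[1:3])

# Indexing the label vector with NA returns NA
rank_labels <- c("First", "Second", "Third")
test$B <- rank_labels[rank_num]

# Replace NA with an empty string to match the layout in the question
test$B[is.na(test$B)] <- ""
print(test)
```

Indexing a lookup vector by the numeric rank keeps the labels in one place, so changing the wording (or adding a “Fourth”) only touches `rank_labels` and the `[1:3]` cut-off.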

Since your data are sorted, rle is another function worth considering, although in this example it is a little clumsy:

 rnk <- rle(as.character(test$A))$lengths
 rnk
 # [1] 1 2 1 1 1
 test$B <- c(rep(1:3, times = rnk[1:3]), rep(NA, sum(rnk[-(1:3)])))

rle computes the lengths (and the values, which we don’t need here) of runs of equal values in a vector, so this works only because your data are already sorted and equal values sit next to each other.
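A small sketch of both points: what rle() returns on the sorted data, and what goes wrong when tied values are not adjacent (the `unsorted` vector is a made-up example, not from the question). rle only merges neighbouring equal values, so the two "aaaabb" entries below end up in separate runs with different ranks:

```r
sorted   <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")
unsorted <- c("aaaabb", "aaabbb", "aaaabb")   # hypothetical: tied values not adjacent

# rle() returns run lengths and the value of each run
r <- rle(sorted)
r$lengths
# [1] 1 2 1 1 1
r$values

# On unsorted input each "aaaabb" forms its own run, so the tie is lost
u <- rle(unsorted)
u_ranks <- rep(seq_along(u$lengths), times = u$lengths)
u_ranks
# [1] 1 2 3
```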

And if you don't need blanks (NA) after the third-ranked item, this is even simpler (and more readable):

 test$B <- rep(seq_along(rnk), times = rnk)
