How about this:
    test$B <- match(test$A, unique(test$A)[1:3])
    test
           A  B
    1 aaabbb  1
    2 aaaabb  2
    3 aaaabb  2
    4 aaaaab  3
    5 bbbaaa NA
    6 bbbbaa NA
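This is one of many ways to do it. It may not be the best, but it comes to mind easily and is quite intuitive. It works because your data is already sorted: unique() returns values in order of first appearance, so unique(test$A)[1:3] holds the three top-ranked values. match() then gives each row the position of its value in that vector, and rows whose value is not among the first three get NA.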
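A self-contained sketch of the match()/unique() approach, assuming the column is called A and holds the six sample values from the output above (the data frame construction is only for illustration). The cutoff variable `k` is hypothetical and plays the role of the `[1:3]` index.

```r
# Sample data reconstructed from the printed output (assumed, for illustration)
test <- data.frame(
  A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa"),
  stringsAsFactors = FALSE
)

k <- 3  # hypothetical: how many distinct values receive a rank

# unique() keeps the order of first appearance, so on sorted data
# its first k elements are the k top-ranked values
top <- unique(test$A)[seq_len(k)]

# match() returns each value's position in `top`, or NA if it is not there
test$B <- match(test$A, top)
test
```

Changing `k` changes how many distinct values get a rank; everything else becomes NA.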
Since your data is sorted, rle() is another suitable function, although in this example it is a little clunky:
    rnk <- rle(as.character(test$A))$lengths  # as.character(): rle() rejects factors
    rnk
    [1] 1 2 1 1 1
rle() computes the lengths (and the values, which we don't need here) of runs of equal consecutive values in a vector, so this works only because equal values in your data are already adjacent.
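To see why adjacency matters, here is a small sketch comparing rle() on the sorted sample values with a hypothetical unsorted vector in which "aaabbb" reappears later. In the unsorted case the repeated value starts a new run, so it would be given a new rank.

```r
sorted   <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")
unsorted <- c("aaabbb", "aaaabb", "aaaaab", "aaabbb")  # hypothetical example

r1 <- rle(sorted)
r1$lengths  # 1 2 1 1 1: the two "aaaabb" rows form a single run
r1$values   # one entry per run

r2 <- rle(unsorted)
r2$lengths  # 1 1 1 1: the two "aaabbb" are not adjacent, so they are two runs
```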
And if you don't need blank (NA) ranks after the third-ranked item, so that ranks simply continue 4, 5, ..., this is even simpler (and more readable):
    test$B <- rep(seq_along(rnk), times = rnk)
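If you do want NA after the third rank with the rle() approach, one option is to blank out the higher ranks afterwards. This is a sketch assuming the same sorted sample data (reconstructed here so the block runs on its own) and a character column A.

```r
test <- data.frame(
  A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa"),
  stringsAsFactors = FALSE
)

# Run lengths of equal consecutive values (data assumed sorted)
rnk <- rle(test$A)$lengths

# Repeat rank i once for every element of run i
test$B <- rep(seq_along(rnk), times = rnk)  # 1 2 2 3 4 5

# Keep only the top three ranks, matching the match() version
test$B[test$B > 3] <- NA
test
```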