Generating random numbers according to block length in an R data frame

I want to simulate the measurement order n times and see how it affects my research topic. To do this, I am trying to generate random integers in a new column of a data frame. I have a large data frame, and I would like to add a column to it that contains random numbers based on the number of observations in each block.

Sample data (each row is an observation):

 df <- data.frame(A=c(1,1,1,2,2,3,3,3,3),
                  B=c("x","b","c","g","h","g","g","u","l"),
                  C=c(1,2,4,1,5,7,1,2,5))

   A B C
 1 1 x 1
 2 1 b 2
 3 1 c 4
 4 2 g 1
 5 2 h 5
 6 3 g 7
 7 3 g 1
 8 3 u 2
 9 3 l 5

What I would like to do is add column D and generate random integers according to the length of each block. Blocks are defined in column A.

The result should look something like this:

 df <- data.frame(A=c(1,1,1,2,2,3,3,3,3),
                  B=c("x","b","c","g","h","g","g","u","l"),
                  C=c(1,2,4,1,5,7,1,2,5),
                  D=c(2,1,3,2,1,4,3,1,2))
 > df
   A B C D
 1 1 x 1 2
 2 1 b 2 1
 3 1 c 4 3
 4 2 g 1 2
 5 2 h 5 1
 6 3 g 7 4
 7 3 g 1 3
 8 3 u 2 1
 9 3 l 5 2

I tried using R's sample() function to generate the random numbers, but my problem is splitting the data by block and adding the result as a new column. Any help is appreciated.

3 answers

This is really easy with ddply from the plyr package.

 library(plyr)
 ddply(df, .(A), transform, D = sample(length(A)))

Longer manual version:

Use split to split the data frame by its first column.

 split_df <- split(df, df$A) 

Then call sample for each member of the list.

 split_df <- lapply(split_df, function(df) {
   df$D <- sample(nrow(df))
   df
 })

Then recombine the pieces with

 df <- do.call(rbind, split_df) 
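
One thing to be aware of: do.call(rbind, ...) returns the rows grouped by block, with row names prefixed by the block label (for example 1.1). If the original row order matters, base R's unsplit can reassemble the pieces in place instead. The following is only a sketch of that variant; the helper name add_block_order is made up for illustration and is not part of the answer above.

```r
# Sample data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

# Hypothetical helper: add a random within-block order as column D
add_block_order <- function(data, block) {
  pieces <- split(data, block)
  pieces <- lapply(pieces, function(piece) {
    piece$D <- sample(nrow(piece))  # random permutation of 1..block size
    piece
  })
  unsplit(pieces, block)  # put the rows back in their original positions
}

set.seed(1)  # only to make the example reproducible
res <- add_block_order(df, df$A)
```

Here res has the same rows in the same order as df, plus the new column D.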

This is easy to do with ave

 df$D <- ave( df$A, df$A, FUN = function(x) sample(length(x)) ) 

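(You could replace length() with max() or something similar, but length() works even when the values in A do not match the sizes of their blocks. In the sample data they do not match, since block 3 has four rows, so length() is the safe choice here.)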
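
Since the question is about simulating the measurement order n times, the ave call can be repeated with replicate. This is a rough sketch under that assumption; the names n_sim, orders and sims are invented for the example, and set.seed is there only so the run can be reproduced.

```r
# Sample data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

n_sim <- 5    # hypothetical number of simulated measurement orders
set.seed(42)  # only for reproducibility

# Each call to ave() draws a fresh random order within every block of A;
# replicate() collects the results as the columns of a matrix.
orders <- replicate(n_sim,
                    ave(df$A, df$A, FUN = function(x) sample(length(x))))
colnames(orders) <- paste0("D", seq_len(n_sim))

# One row per observation, one D column per simulated order
sims <- cbind(df, orders)
```

Each column D1, D2, ... of sims then holds one independent random order per block, which can be analysed one simulation at a time.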


One easy way:

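 df$D <- 0
 counts <- table(df$A)  # number of rows in each block
 for (i in seq_along(counts)) {
   # fill the rows of block i with a random permutation of 1..block size
   df$D[df$A == names(counts)[i]] <- sample(counts[i])
 }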
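
Whichever of the approaches above is used, it may be worth checking that D really is a permutation of 1..k within each block of size k. A small sketch of such a check follows; the function name is_block_permutation is hypothetical, not something from base R or plyr.

```r
# Hypothetical check: TRUE if, within every block, d is a permutation
# of 1..(number of rows in that block)
is_block_permutation <- function(d, block) {
  all(tapply(d, block, function(v) all(sort(v) == seq_along(v))))
}

A <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)

# The example column D from the question passes the check
ok <- is_block_permutation(c(2, 1, 3, 2, 1, 4, 3, 1, 2), A)   # TRUE

# A column with a repeated value inside block 1 does not
bad <- is_block_permutation(c(1, 1, 3, 2, 1, 4, 3, 1, 2), A)  # FALSE
```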

Source: https://habr.com/ru/post/1389645/

