Generating random numbers by the length of data blocks in an R data frame

Question


I am trying to simulate the measurement order n times to see how it affects my research topic. To do this, I want to generate random integers in a new column of a data frame. I have a large data frame, and I would like to add a column holding random integers based on the number of observations in each block.

Sample data (each row is an observation):

 df <- data.frame(A=c(1,1,1,2,2,3,3,3,3),
                  B=c("x","b","c","g","h","g","g","u","l"),
                  C=c(1,2,4,1,5,7,1,2,5))

   A B C
 1 1 x 1
 2 1 b 2
 3 1 c 4
 4 2 g 1
 5 2 h 5
 6 3 g 7
 7 3 g 1
 8 3 u 2
 9 3 l 5

What I would like to do is add a column D and fill it with random integers according to the length of each block. Blocks are defined by column A.

The result should look something like this:

 df <- data.frame(A=c(1,1,1,2,2,3,3,3,3),
                  B=c("x","b","c","g","h","g","g","u","l"),
                  C=c(1,2,4,1,5,7,1,2,5),
                  D=c(2,1,3,2,1,4,3,1,2))
 > df
   A B C D
 1 1 x 1 2
 2 1 b 2 1
 3 1 c 4 3
 4 2 g 1 2
 5 2 h 5 1
 6 3 g 7 4
 7 3 g 1 3
 8 3 u 2 1
 9 3 l 5 2

I tried using R's sample() function to generate the random numbers, but my problem is splitting the data by block and adding the new column. Any help is appreciated.

+4

split random r simulation

Markus Korhonen Jan 6 '12 at 12:38


3 answers

This is easy to do with ave

 df$D <- ave( df$A, df$A, FUN = function(x) sample(length(x)) )

(You could replace length() with max() or similar, but length() works even when the values in A are not numbers that match the length of their blocks.)
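As a quick check of the ave approach, here is a minimal sketch on the question's sample data. The fixed seed and the helper is_perm are additions for illustration only and are not part of the answer.

```r
# Sample data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

set.seed(42)  # assumption: fixed seed only so the run is repeatable

# Within each block of A, draw a random permutation of 1..block size
df$D <- ave(df$A, df$A, FUN = function(x) sample(length(x)))

# Hypothetical helper: TRUE if v is a permutation of 1..length(v)
is_perm <- function(v) all(sort(v) == seq_along(v))

# One logical value per block; all should be TRUE
block_ok <- tapply(df$D, df$A, is_perm)
print(block_ok)
```

Because ave returns its result in the original row order, the data frame does not need to be sorted by A first.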

+4

John Jan 6 '12 at 13:23


One easy way:

 df$D = 0
 counts = table(df$A)
 for (i in 1:length(counts)){
   df$D[df$A == names(counts)[i]] = sample(counts[i])
 }
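The loop above relies on sample() treating a single number n as the range 1..n. A variant that makes this explicit with sample.int() might look like the sketch below; the variable names block and rows are chosen here for illustration.

```r
# Sample data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

# Pre-allocate the new column
df$D <- integer(nrow(df))

for (block in unique(df$A)) {
  # Row positions belonging to the current block
  rows <- which(df$A == block)
  # Random permutation of 1..(number of rows in this block)
  df$D[rows] <- sample.int(length(rows))
}
```

Looping over unique(df$A) avoids converting block labels to strings through names(counts), which is a small simplification rather than a change in the result.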

+1

David Robinson Jan 6 '12 at 13:24


Richie Cotton · Accepted Answer · 2012-01-06T12:58:27+0000

This is really easy with ddply from the plyr package.

 library(plyr)
 ddply(df, .(A), transform, D = sample(length(A)))

Longer manual version:

Use split to split the data frame by its first column.

 split_df <- split(df, df$A)

Then call sample for each member of the list.

 split_df <- lapply(split_df, function(df) {
   df$D <- sample(nrow(df))
   df
 })

Then recombine with

 df <- do.call(rbind, split_df)
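Since the question asks for the order to be simulated n times, the manual steps above could be wrapped in a function and repeated. The sketch below is one possible arrangement, not the answer's own code: the function name add_random_order, the seed, and n_sim are hypothetical, and it swaps do.call(rbind, ...) for base R's unsplit() so that rows return to their original positions.

```r
# Sample data from the question
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

# Hypothetical helper: add one random measurement order per block of A
add_random_order <- function(data) {
  pieces <- split(data, data$A)
  pieces <- lapply(pieces, function(piece) {
    piece$D <- sample(nrow(piece))  # permutation of 1..block size
    piece
  })
  # unsplit() reverses split(), restoring the original row order
  unsplit(pieces, data$A)
}

set.seed(1)  # assumption: fixed seed only so the run is repeatable
n_sim <- 5   # assumption: number of simulated orders

# One column of D per simulation; rows line up with the rows of df
orders <- replicate(n_sim, add_random_order(df)$D)
dim(orders)
```

Each column of orders is then one simulated measurement order that can be analysed against the other columns of df.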
