Sampling by coefficient in R

Question


I have a dataset of 1000 rows with the following structure:

   device geslacht leeftijd type1 type2
1     mob        0       53     C     3
2     tab        1       64     G     7
3      pc        1       50     G     7
4     tab        0       75     C     3
5     mob        1       54     G     7
6      pc        1       58     H     8
7      pc        1       57     A     1
8      pc        0       68     E     5
9      pc        0       66     G     7
10    mob        0       45     C     3
11    tab        1       77     E     5
12    mob        1       16     A     1

I would like to draw a sample of 80 rows, consisting of 10 rows with type1 = A, 10 rows with type1 = B, etc. Can anyone help me?

+6

r dataframe sampling

karmabob May 07 '15 at 9:44


3 answers

Here is how I would do this using data.table:

 library(data.table)
 indx <- setDT(df)[, .I[sample(.N, 10, replace = TRUE)], by = type1]$V1
 df[indx]
 #     device geslacht leeftijd type1 type2
 #  1:    mob        0       45     C     3
 #  2:    mob        0       53     C     3
 #  3:    tab        0       75     C     3
 #  4:    mob        0       53     C     3
 #  5:    tab        0       75     C     3
 #  6:    mob        0       45     C     3
 #  7:    tab        0       75     C     3
 #  8:    mob        0       53     C     3
 #  9:    mob        0       53     C     3
 # 10:    mob        0       53     C     3
 # 11:    mob        1       54     G     7
 #...

Or a simpler version would be

 setDT(df)[, .SD[sample(.N, 10, replace = TRUE)], by = type1]

Basically, we sample (with replacement, since you have fewer than 10 rows in each group) from the row indices within each type1 group, and then subset the data by those indices.

Similarly, with dplyr you can do

 library(dplyr)
 df %>%
   group_by(type1) %>%
   sample_n(10, replace = TRUE)
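The idea described above (sample row indices within each type1 group, then subset by them) can also be sketched without any packages. This is only an illustration under a few assumptions: the data frame is rebuilt from the 12 example rows in the question, the seed is fixed purely for repeatability, and replacement is switched on only for groups that have fewer rows than requested, which is a small variation on the always-with-replacement calls above.

```r
# Example rows copied from the question (12 of the 1000 rows).
df <- data.frame(
  device   = c("mob", "tab", "pc", "tab", "mob", "pc",
               "pc", "pc", "pc", "mob", "tab", "mob"),
  geslacht = c(0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1),
  leeftijd = c(53, 64, 50, 75, 54, 58, 57, 68, 66, 45, 77, 16),
  type1    = c("C", "G", "G", "C", "G", "H", "A", "E", "G", "C", "E", "A"),
  type2    = c(3, 7, 7, 3, 7, 8, 1, 5, 7, 3, 5, 1),
  stringsAsFactors = FALSE
)

n_per_group <- 10  # rows wanted per type1 value
set.seed(1)        # assumption: fixed seed, only to make the sketch repeatable

# Row indices of df, split by type1 group.
idx_by_group <- split(seq_len(nrow(df)), df$type1)

# Within each group, draw n_per_group indices. Replacement is used only
# when the group has fewer rows than requested.
picked <- lapply(idx_by_group, function(idx) {
  idx[sample.int(length(idx), n_per_group,
                 replace = length(idx) < n_per_group)]
})

# Subset the data by the sampled indices and count rows per group.
result <- df[unlist(picked), ]
table(result$type1)
```

With the full 1000-row data, groups that contain at least 10 rows would then be sampled without duplicates, while smaller groups would still yield 10 rows each.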

+9

David Arenburg May 07, '15 at 9:50


Another option in base R:

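 # sample.int() on positions avoids sample()'s 1:x behaviour for single-row groups
 df[as.vector(sapply(unique(df$type1), function(x) {
   rows <- which(df$type1 == x)
   rows[sample.int(length(rows), 10, replace = TRUE)]
 })), ]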

+5

Cath May 07 '15 at 11:01


zx8754 · Accepted Answer · 2015-05-07T09:59:22+0000

Base R Solution:

 do.call(rbind,
         lapply(split(df, df$type1),
                function(i) i[sample(1:nrow(i), size = 10, replace = TRUE), ]))
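As a rough sketch, the same split/lapply/rbind pattern could be wrapped in a small helper. The function name `sample_per_group`, its arguments, and the `toy` data frame below are made up for illustration and are not part of the answer. The helper also resets the row names, since `rbind` on a named list builds them from the group names and the original row names.

```r
# Hypothetical helper (name and arguments are illustrative only).
sample_per_group <- function(data, group, n) {
  pieces <- lapply(split(data, data[[group]]), function(g) {
    # Sample n row positions within the group, with replacement.
    g[sample(seq_len(nrow(g)), size = n, replace = TRUE), , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the group-prefixed row names created by rbind
  out
}

# Small made-up data frame, only to exercise the helper.
toy <- data.frame(
  type1 = rep(c("A", "B", "C"), times = c(2, 5, 3)),
  value = 1:10
)

set.seed(42)  # assumption: fixed seed, only for repeatability
res <- sample_per_group(toy, "type1", n = 10)
table(res$type1)  # number of sampled rows per group
```

Calling `table()` on the grouping column is a quick way to confirm that every group contributes the same number of rows.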

EDIT:
