How to create a stratified sample in R

How do I create a stratified sample in R using the "sampling" package? There are 355,000 observations in my dataset. Below is the code I wrote. It runs up to the last line, where I always get the following message: "Error in sort.list(y) : 'x' must be atomic for 'sort.list'. Have you called 'sort' on a list?"

Please do not point me to older posts on Stack Overflow. I have read them but could not get them to work. Thanks.

    ## lpdata file has 355,000 observations
    # Exclude Puerto Rico, Virgin Islands and Guam
    sub.lpdata<-subset(lpdata,"STATE" != 'PR' | "STATE" != 'VI' | "STATE" != 'GU')
    ## Create a 10% sample, stratified by STATE
    sort.lpdata<-sub.lpdata[order(sub.lpdata$STATE),]
    tab.state<-data.frame(table(sort.lpdata$STATE))
    size.strata<-as.vector(round(ceiling(tab.state$Freq)*0.1))
    s<-strata(sort.lpdata,stratanames=sort.lpdata$STATE,size=size.strata,method="srswor")}
2 answers

I don't know the strata function, but a little base R code can do what you want:

    d <- expand.grid(id = 1:35000, stratum = letters[1:10])
    p <- 0.1
    dsample <- data.frame()

    system.time(
      for (i in levels(d$stratum)) {
        dsub <- subset(d, d$stratum == i)
        B <- ceiling(nrow(dsub) * p)
        dsub <- dsub[sample(1:nrow(dsub), B), ]
        dsample <- rbind(dsample, dsub)
      }
    )

    # size per stratum in resulting df is 10 % of original size:
    table(dsample$stratum)

HTH, Kay

P.S. The CPU time on my ancient laptop is 0.09 seconds!
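
The same per-stratum sampling can also be sketched without growing a data frame inside a loop. The helper name `sample_by_stratum` and its arguments are made up for this illustration and are not part of the answer above. It is a minimal base R sketch that assumes the stratum is a single column and that the same fraction `p` is wanted in every stratum.

```r
# Hypothetical helper (not from the answer above): draw a fraction p of the
# rows within each stratum, without replacement.
sample_by_stratum <- function(data, stratum, p) {
  # Row indices grouped by the values of the stratum column
  idx_by_group <- split(seq_len(nrow(data)), data[[stratum]], drop = TRUE)
  picked <- lapply(idx_by_group, function(idx) {
    n <- ceiling(length(idx) * p)
    # Sample positions within idx, so a one-row stratum is handled correctly
    idx[sample.int(length(idx), n)]
  })
  data[unlist(picked, use.names = FALSE), , drop = FALSE]
}

set.seed(1)  # only to make the illustration reproducible
d <- expand.grid(id = 1:35000, stratum = letters[1:10])
dsample <- sample_by_stratum(d, "stratum", 0.1)

# Rows drawn per stratum
table(dsample$stratum)
```

Collecting row indices first and subsetting once avoids the repeated `rbind()` calls, which tends to matter more as the number of strata grows.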


I had to do something similar last year. If this is something you do a lot, you can use a function like the one below. It takes the data frame, the grouping (stratum) variable given as a column number, and the sample size. You can save the function as something like "stratified.R" and load it when you need it. See http://news.mrdwab.com/2011/05/20/stratified-random-sampling-in-r-from-a-data-frame/

    stratified = function(df, group, size) {
      #  USE: * Specify your data frame and grouping variable (as column
      #         number) as the first two arguments.
      #       * Decide on your sample size. For a sample proportional to the
      #         population, enter "size" as a decimal. For an equal number
      #         of samples from each group, enter "size" as a whole number.
      #
      #  Example 1: Sample 10% of each group from a data frame named "z",
      #             where the grouping variable is the fourth variable, use:
      #
      #                 > stratified(z, 4, .1)
      #
      #  Example 2: Sample 5 observations from each group from a data frame
      #             named "z"; grouping variable is the third variable:
      #
      #                 > stratified(z, 3, 5)
      #
      require(sampling)
      temp = df[order(df[group]),]
      if (size < 1) {
        size = ceiling(table(temp[group]) * size)
      } else if (size >= 1) {
        size = rep(size, times=length(table(temp[group])))
      }
      strat = strata(temp, stratanames = names(temp[group]),
                     size = size, method = "srswor")
      (dsample = getdata(temp, strat))
    }
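
Two details in the question's own code are worth checking against the function above. First, `"STATE" != 'PR'` compares two string constants, so it is always TRUE and nothing is excluded. Even with the column referenced correctly, joining the three `!=` tests with `|` would still keep every row. Second, the function above passes a column name to `stratanames`, whereas the question passes the column's values, which is a likely cause of the error. The sketch below uses a made-up toy data frame in place of `lpdata`. The `strata()` call is guarded because it assumes the sampling package is installed.

```r
# Toy stand-in for the question's lpdata (values are made up for illustration)
lpdata <- data.frame(
  ID    = 1:60,
  STATE = rep(c("CA", "NY", "PR", "TX", "VI", "GU"), each = 10)
)

# Refer to the column unquoted and drop the three territories with %in%
sub.lpdata <- subset(lpdata, !(STATE %in% c("PR", "VI", "GU")))

# Sort by STATE and compute a 10% sample size per stratum,
# in the same order as the strata appear in the sorted data
sort.lpdata <- sub.lpdata[order(sub.lpdata$STATE), ]
size.strata <- as.vector(ceiling(table(sort.lpdata$STATE) * 0.1))

# Pass the *name* of the stratification column, not its values
if (requireNamespace("sampling", quietly = TRUE)) {
  s <- sampling::strata(sort.lpdata, stratanames = "STATE",
                        size = size.strata, method = "srswor")
  sampled <- sampling::getdata(sort.lpdata, s)
  print(nrow(sampled))
}
```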

Source: https://habr.com/ru/post/1401383/

