R loop to draw several samples from one data set

I am trying to write a simple loop in R. I have a large dataset, and I want to draw several smaller samples from it and export each one to Excel:

I thought this would work, but it does not:

    idorg <- c(1,2,3,4,5)
    x <- c(14,20,21,16,17)
    y <- c(31,21,20,50,13)
    dataset <- cbind (idorg,x,y)
    for (i in 1:4) {
      attempt[i] <- dataset[sample(1:nrow(dataset), 3, replace=FALSE),]
      write.table(attempt[i], "C:/Users/me/Desktop/WWD/Excel/dataset[i].xls", sep='\t')
    }

In Stata I would need to save and restore the data when looping like this. Is that also necessary in R?


You have the following issues:

  • The variable attempt is never declared, so attempt[i] cannot be assigned. Either create it before the loop as a list and fill it with attempt[[i]] <- ... (if you want to keep all the samples), or simply use attempt as a temporary variable.
  • The file name is taken literally; you need paste() or sprintf() to insert the value of the variable i into the file name.
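For the first option (keeping every sample), a minimal sketch is to pre-allocate a list with one slot per sample and fill it with [[i]]; a single matrix cannot hold four separate 3-row samples by index. The sketch also shows sprintf() building one file name per iteration. The names n_samples and file_names and the "dataset_%d.csv" pattern are illustrative assumptions, not part of the original code.

```r
# Rebuild the small example data set from the question
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

n_samples <- 4
# Pre-allocate a list with one (empty) slot per sample
attempt <- vector("list", n_samples)

for (i in seq_len(n_samples)) {
  # [[i]] stores the whole 3-row matrix as a single list element
  attempt[[i]] <- dataset[sample(seq_len(nrow(dataset)), 3, replace = FALSE), ]
}

# sprintf() (or paste0()) puts the value of i into each file name;
# the "dataset_%d.csv" pattern is only an illustration
file_names <- sprintf("dataset_%d.csv", seq_len(n_samples))
print(file_names)
```

Using seq_len(nrow(dataset)) instead of 1:nrow(dataset) avoids surprises if the data set ever has zero rows.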

Here is the working version of the code:

    idorg <- c(1,2,3,4,5)
    x <- c(14,20,21,16,17)
    y <- c(31,21,20,50,13)
    dataset <- cbind (idorg,x,y)
    for (i in 1:4) {
      attempt <- dataset[sample(1:nrow(dataset), 3, replace=FALSE),]
      write.table(attempt, sprintf("C:/Users/me/Desktop/WWD/Excel/dataset[%d].xls", i), sep='\t')
    }

I am not sure Excel will read such a tab-delimited file with an .xls extension correctly; I would write a comma-separated table instead and save it as .csv.
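A sketch of the same loop writing comma-separated files with write.csv(), which Excel can open directly. As an assumption for the sketch, the files go to tempdir() so it runs anywhere; replace out_dir with your own folder (for example "C:/Users/me/Desktop/WWD/Excel"). The out_dir and check names are hypothetical.

```r
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

# Assumption: write to a temporary directory; swap in your own folder
out_dir <- tempdir()

for (i in 1:4) {
  attempt <- dataset[sample(seq_len(nrow(dataset)), 3, replace = FALSE), ]
  # write.csv() writes a comma-separated file with a header row;
  # row.names = FALSE drops the extra row-number column
  write.csv(attempt, file.path(out_dir, sprintf("dataset_%d.csv", i)),
            row.names = FALSE)
}

# Read one file back to check what was written
check <- read.csv(file.path(out_dir, "dataset_1.csv"))
print(check)
```

file.path() joins the folder and file name with "/", which R also accepts on Windows.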


Unlike in Stata, you do not need to save and restore data for this kind of operation in R.
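The reason is that subsetting in R returns a new object and leaves the source untouched, so any number of samples can coexist with the full data set. A small check, where the copy named original is kept only for the comparison:

```r
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)
original <- dataset  # kept only to compare afterwards

for (i in 1:4) {
  # Subsetting creates a new object; 'dataset' itself is never modified
  attempt <- dataset[sample(seq_len(nrow(dataset)), 3, replace = FALSE), ]
}

identical(dataset, original)  # TRUE: all 5 rows are still there
```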

I think January's answer solves your problem, but I would like to share an alternative: using lapply() to get a list of all the sample datasets:

    set.seed(1)  # So you can reproduce these results
    temp <- setNames(lapply(1:4, function(x) {
      x <- dataset[sample(1:nrow(dataset), 3, replace = FALSE), ]
      x
    }), paste0("attempt.", 1:4))

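This creates a list named temp containing four samples. Because dataset was built with cbind(), each sample is a matrix rather than a data.frame. (The output below comes from an older R version; since R 3.6.0, sample() uses a different default algorithm, so the same seed may give different rows.)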

    temp
    # $attempt.1
    #      idorg  x  y
    # [1,]     2 20 21
    # [2,]     5 17 13
    # [3,]     4 16 50
    #
    # $attempt.2
    #      idorg  x  y
    # [1,]     5 17 13
    # [2,]     1 14 31
    # [3,]     3 21 20
    #
    # $attempt.3
    #      idorg  x  y
    # [1,]     5 17 13
    # [2,]     3 21 20
    # [3,]     2 20 21
    #
    # $attempt.4
    #      idorg  x  y
    # [1,]     1 14 31
    # [2,]     5 17 13
    # [3,]     4 16 50
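Because dataset comes from cbind() on numeric vectors, the elements of temp are numeric matrices (note the [1,] row labels in the printout). A quick sketch to confirm this, plus one way to get data frames instead by converting the source once before sampling; df and temp_df are hypothetical names.

```r
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

set.seed(1)
temp <- setNames(
  lapply(1:4, function(i) dataset[sample(seq_len(nrow(dataset)), 3), ]),
  paste0("attempt.", 1:4)
)

# Each list element is a matrix, not a data frame
sapply(temp, is.matrix)      # all TRUE
sapply(temp, is.data.frame)  # all FALSE

# If data frames are preferred, convert the source once before sampling
df <- as.data.frame(dataset)
temp_df <- lapply(1:4, function(i) df[sample(seq_len(nrow(df)), 3), ])
```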

Lists are very convenient in R. You can now use lapply() for other useful things: for example, lapply(temp, rowSums) gives the row sums of each sample. Or, if you want to write each sample to its own CSV file (readable by Excel), you can do something like this:

 lapply(names(temp), function(x) write.csv(temp[[x]], file = paste0(x, ".csv"))) 
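To make the lapply(temp, rowSums) example concrete, the sketch below uses two hand-picked "samples" (an assumption, so the result is predictable instead of random). Note that rowSums() also adds the idorg column; the second call restricts the sum to x and y.

```r
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

# Two fixed "samples" with rows chosen by hand
temp <- list(attempt.1 = dataset[c(2, 5, 4), ],
             attempt.2 = dataset[c(5, 1, 3), ])

# One vector of row sums per sample (idorg included)
row_totals <- lapply(temp, rowSums)
row_totals$attempt.1  # 43 35 70

# Sum only the x and y columns
xy_totals <- lapply(temp, function(m) rowSums(m[, c("x", "y")]))
xy_totals$attempt.1   # 41 30 66
```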

Source: https://habr.com/ru/post/1439978/

