Separating column index data

This is a variant of an earlier question.

    df <- data.frame(matrix(rnorm(9*9), ncol=9))
    names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")

I want to split the data frame by the index that appears in each column name after the underscore "_". (An index can be any string of characters or digits, of any length; these are just arbitrary examples.)

 indx <- gsub(".*_", "", names(df)) 

and name the resulting data frames accordingly. In the end, I would like to get three data frames:

  • df_1
  • df_p
  • df_o1

Thanks!

2 answers

You can split the column names by indx, subset the data into a list using lapply and [, set the names of the list elements with setNames, and use list2env if you need them as separate data frames. (The last step is not recommended, since most operations can be performed within the list, and later, if you want, each element can be saved with write.table via lapply.)

    list2env(
      setNames(
        lapply(split(colnames(df), indx), function(x) df[x]),
        paste('df', sort(unique(indx)), sep="_")),
      envir=.GlobalEnv)

    head(df_1,2)
    #         c_1        d_1        e_1
    #1  1.0085829 -0.7219199  0.3502958
    #2 -0.9069805 -0.7043354 -1.1974415

    head(df_o1,2)
    #       1_o1      2_o1       3_o1
    #1 0.7924930  0.434396  1.7388130
    #2 0.9202404 -2.079311 -0.6567794

    head(df_p,2)
    #          a_p       b_p        c_p
    #1 -0.12392272 -1.183582  0.8176486
    #2  0.06330595 -0.659597 -0.6350215
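The list-based route that this answer recommends could look roughly like the following. It is only a sketch: it assumes df and indx as defined in the question, and the names df_list, dims, out_dir, and the ".txt" file names are hypothetical, chosen for illustration.

```r
# Rebuild the example data from the question.
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
indx <- gsub(".*_", "", names(df))

# Keep the pieces in one named list instead of separate global objects.
# split() names the groups after the index values, so reuse those names.
df_list <- lapply(split(names(df), indx), function(x) df[x])
names(df_list) <- paste("df", names(df_list), sep = "_")

# Any operation can then be applied to every piece at once.
dims <- sapply(df_list, dim)

# Save each piece to its own file (out_dir is a hypothetical location).
out_dir <- tempdir()
invisible(lapply(names(df_list), function(nm) {
  write.table(df_list[[nm]],
              file = file.path(out_dir, paste0(nm, ".txt")),
              row.names = FALSE)
}))
```

Taking the list names from names(df_list) rather than from a separately sorted vector keeps each name tied to the group it came from.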

Or use Map. This is similar to the approach above: split the column names by indx, use [ to extract the columns, and do the rest as before.

    list2env(
      setNames(
        Map(`[`, list(df), split(colnames(df), indx)),
        paste('df', unique(sort(indx)), sep="_")),
      envir=.GlobalEnv)

Update

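To keep the groups in their order of first appearance instead of sorted order, convert indx to a factor with explicit levels before splitting: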

    indx1 <- factor(indx, levels=unique(indx))
    split(colnames(df), indx1)
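The effect of the explicit factor levels can be checked on the column names alone. The sketch below assumes the column names from the question; nms, by_sorted, by_first, and obj_names are hypothetical names used only for illustration.

```r
nms  <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
indx <- gsub(".*_", "", nms)

# Default: split() turns indx into a factor whose levels are sorted.
by_sorted <- split(nms, indx)

# With explicit levels, the groups follow their order of first appearance.
indx1 <- factor(indx, levels = unique(indx))
by_first <- split(nms, indx1)

names(by_first)   # "1" "p" "o1"

# Build the object names from the same levels so they stay aligned
# with the groups (a sorted name vector would no longer match).
obj_names <- paste("df", levels(indx1), sep = "_")
```

If indx1 is used together with setNames as in the code above, the names should probably come from levels(indx1) rather than from sort(unique(indx)), otherwise names and groups can end up paired incorrectly.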

You can try the following:

    invisible(sapply(unique(indx), function(x)
      assign(paste("df",x,sep="_"),
             df[,grepl(paste0("_",x,"$"),colnames(df))],
             envir=.GlobalEnv)))
    # For each unique element of indx, the code assigns (in the global environment)
    # the columns matching that index to a new data.frame named after the index.
    # invisible() keeps the data.frames from being printed on screen.

    > ls()
    [1] "df"    "df_1"  "df_o1" "df_p"  "indx"
    > df_1[1:3,]
            c_1        d_1        e_1
    1 1.8033188  0.5578494  2.2458750
    2 1.0095556 -0.4042410 -0.9274981
    3 0.7122638  1.4677821  0.7770603
    > df_o1[1:3,]
             1_o1        2_o1       3_o1
    1 -2.05854176 -0.92394923 -0.4932116
    2 -0.05743123 -0.24143979  1.9060076
    3  0.68055653 -0.70908036  1.4514368
    > df_p[1:3,]
             a_p        b_p       c_p
    1 -0.2106823 -0.1170719 2.3205184
    2 -0.1826542 -0.5138504 1.9341230
    3 -1.0551739 -0.2990706 0.5054421
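Because the pattern passed to grepl is built from the index itself, an index containing regex metacharacters (for example "." or "+") might match the wrong columns. One possible variation, sketched here under the assumption that df and indx are defined as in the question, compares against indx directly instead of using a regular expression. It is an alternative for illustration, not the answer's own method.

```r
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
indx <- gsub(".*_", "", names(df))

for (x in unique(indx)) {
  # Select columns by exact comparison rather than by a regex built from x.
  # drop = FALSE keeps a single-column group as a data frame.
  assign(paste("df", x, sep = "_"),
         df[, indx == x, drop = FALSE],
         envir = .GlobalEnv)
}
```

The for loop also avoids the need for invisible(), since it does not return the assigned values.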

Source: https://habr.com/ru/post/1209212/

