How to apply the same transformation to groups of variables in a data frame?

I have a data frame with a lot of variables whose names include tags.

    mydf <- data.frame(
      var_x = 1:5,
      var_y = runif(5),
      var_z = runif(5),
      other_x = 10:14,
      other_p = runif(5),
      other_r = runif(5)
    )
    mydf
      var_x     var_y      var_z other_x   other_p   other_r
    1     1 0.2700212 0.05893272      10 0.6212327 0.6177092
    2     2 0.1284033 0.27333098      11 0.6933060 0.7520978
    3     3 0.7313771 0.69352560      12 0.3154764 0.8479646
    4     4 0.2400357 0.25151053      13 0.2057361 0.5138406
    5     5 0.1797793 0.78550584      14 0.6671606 0.5801830

I would like to divide the var_* variables by var_x and the other_* variables by other_x. How can I do this easily?

I tried using mutate_each from dplyr. The following works if there is only one group to scale. How can I automate this for each tag?

    library(dplyr)
    scale_var <- mydf$var_x
    mydf %>%
      mutate_each(funs(./scale_var), matches("^var"))

I tried to write my own function as follows.

    mymutate <- function(data, type) {
      scale_var <- mydf[[paste0(type, "_x")]]
      data %>%
        mutate_each(
          funs(./scale_var),
          matches(paste0("^", type))
        )
    }

But when I ran it on just one type, mymutate(mydf, type = "var"), it threw an error that I really don't understand: Error in paste0("^", type) : object 'type' not found


UPDATE

I would like to use only the new variables, so it doesn't matter that the method also divides the x variables.

I have many tags like var and other , so I don’t want to write them in each case. This is why I tried to build my own function to use it later with lapply .

UPDATE2

These are the variables of my data frame.

     [1] "location_50_all_1"                  "location_50_both_sides_important_1"
     [3] "location_50_left_important_1"       "location_50_other_important_1"
     [5] "location_50_right_important_1"      "ownership_all_1"
     [7] "ownership_both_sides_important_1"   "ownership_left_important_1"
     [9] "ownership_other_important_1"        "ownership_right_important_1"
    [11] "person_all_1"                       "person_both_sides_important_1"
    [13] "person_left_important_1"            "person_other_important_1"
    [15] "person_right_important_1"           "union_all_1"
    [17] "union_both_sides_important_1"       "union_left_important_1"
    [19] "union_other_important_1"            "union_right_important_1"
    [21] "total_left_important"               "total_right_important"
    [23] "total_both_sides_important"         "total_other_important"
    [25] "total_firm_officials"               "left"
    [27] "right"                              "connected"

I would like to divide the location_50* variables by location_50_all_1, and likewise for location_200*, ownership*, person* and union*.

UPDATE3

Here is the answer to the question of why 'type' was not found.

4 answers

This modified version of mymutate will work even if the data frame is not well structured (i.e. the number of columns that need to be scaled is not the same for each case).

    library(dplyr)
    library(magrittr)  # provides the %<>% assignment pipe used below

    # mydf
    #   var_x     var_y     var_z other_x     other_p    other_r
    # 1     1 0.1913353 0.4706113      10 0.003120607 0.17808048
    # 2     2 0.1620725 0.6228830      11 0.844399758 0.01361841
    # 3     3 0.5148884 0.3671178      12 0.996055741 0.33513972
    # 4     4 0.8086168 0.3265216      13 0.984819261 0.96802056
    # 5     5 0.9902217 0.9087540      14 0.951119864 0.82479090

    mymutate <- function(data, type) {
      # Keep the original denominator column so it can be restored afterwards
      scale_var <- data[[paste0(type, "_x")]]
      # Select only this tag's columns and divide all of them by the denominator
      data %<>%
        select(matches(paste0("^", type))) %>%
        mutate_each(funs(./scale_var))
      # Put the undivided x column back
      data[[paste0(type, "_x")]] <- scale_var
      data
    }

    types <- c("var", "other")
    lapply(types, mymutate, data = mydf) %>% bind_cols(.)

    #   var_x      var_y      var_z other_x      other_p     other_r
    # 1     1 0.19133528 0.47061133      10 0.0003120607 0.017808048
    # 2     2 0.08103626 0.31144148      11 0.0767636144 0.001238037
    # 3     3 0.17162946 0.12237259      12 0.0830046451 0.027928310
    # 4     4 0.20215421 0.08163039      13 0.0757553278 0.074463120
    # 5     5 0.19804435 0.18175081      14 0.0679371332 0.058913635
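mutate_each() and funs() were deprecated in later dplyr releases, so the function above may not run on a current installation. The following is a minimal sketch of the same idea written with across(), assuming dplyr 1.0.0 or newer. The name scale_group is hypothetical and only mirrors mymutate; it is not part of any package.

```r
library(dplyr)

mydf <- data.frame(
  var_x = 1:5, var_y = runif(5), var_z = runif(5),
  other_x = 10:14, other_p = runif(5), other_r = runif(5)
)

# scale_group() is a hypothetical name mirroring mymutate() above.
scale_group <- function(data, type) {
  x_col <- paste0(type, "_x")
  scale_var <- data[[x_col]]
  data %>%
    # keep only the columns that carry this tag
    select(starts_with(paste0(type, "_"))) %>%
    # divide every selected column except the denominator itself
    mutate(across(-all_of(x_col), ~ .x / scale_var))
}

types <- c("var", "other")
result <- bind_cols(lapply(types, scale_group, data = mydf))
```

Because the x column is excluded from across(), it does not have to be restored afterwards.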

This may be helpful. If you have three columns for each variable name (for example, three columns with "var" and three columns with "other"), I would use lapply() . Then bind the columns to return to the original data format, if necessary.

    # mydf
    #   var_x     var_y     var_z other_x    other_p    other_r
    # 1     1 0.8393539 0.2685360      10 0.82749405 0.77923222
    # 2     2 0.8966534 0.6157903      11 0.30657267 0.97301619
    # 3     3 0.7426782 0.6982445      12 0.75195632 0.03107233
    # 4     4 0.9448537 0.3711827      13 0.68455120 0.45232667
    # 5     5 0.4848614 0.2108115      14 0.01126723 0.91213041

    library(dplyr)

    # Index of the first column of each group of three
    num <- seq(1, ncol(mydf), 3)

    # Split the data frame into one three-column piece per group
    foo <- lapply(num, function(x) mydf[, x:(x + 2)])

    # Divide the second and third column of each piece by the first
    lapply(foo, function(y) {
      y[, 2] <- y[, 2] / y[, 1]
      y[, 3] <- y[, 3] / y[, 1]
      y
    }) %>% bind_cols(.)

    #   var_x      var_y      var_z other_x      other_p     other_r
    # 1     1 0.83935391 0.26853595      10 0.0827494049 0.077923222
    # 2     2 0.44832669 0.30789516      11 0.0278702429 0.088456017
    # 3     3 0.24755938 0.23274817      12 0.0626630264 0.002589361
    # 4     4 0.23621343 0.09279569      13 0.0526577848 0.034794359
    # 5     5 0.09697229 0.04216230      14 0.0008048022 0.065152172

This is what you are after ...

    library(dplyr)

    mydf <- data.frame(
      var_x = 1:5,
      var_y = runif(5),
      var_z = runif(5),
      other_x = 10:14,
      other_p = runif(5),
      other_r = runif(5)
    )

    my_df_var <- mydf %>% select(contains("var"))
    my_divided_var_df <- my_df_var / my_df_var[, 1]

    my_df_other <- mydf %>% select(contains("other"))
    my_divided_other_df <- my_df_other / my_df_other[, 1]

    my_final_df <- bind_cols(my_divided_var_df, my_divided_other_df)
    my_final_df
      var_x     var_y      var_z other_x    other_p    other_r
    1     1 0.1505216 0.50006694       1 0.01507284 0.04272813
    2     1 0.3694496 0.07608916       1 0.03721775 0.07758692
    3     1 0.1615257 0.05903999       1 0.04790595 0.00702291
    4     1 0.1867266 0.15325190       1 0.06612689 0.03709427
    5     1 0.1823187 0.15005917       1 0.02325902 0.05880811
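With many tags, the same division can probably be driven by the column names instead of one block per tag. Below is a rough base R sketch, assuming the layout from UPDATE2 in the question, where each group's denominator column ends in `_all_1`. The data values are made up for illustration, and divide_group is a hypothetical helper name.

```r
# Toy data using a few column names from the question; the values are invented.
df <- data.frame(
  location_50_all_1            = c(10, 20, 40),
  location_50_left_important_1 = c(1, 4, 10),
  ownership_all_1              = c(2, 4, 8),
  ownership_left_important_1   = c(1, 1, 2),
  total_left_important         = c(5, 6, 7)  # has no "_all_1" partner
)

# Derive the tags from the denominator columns rather than typing them by hand.
denominators <- grep("_all_1$", names(df), value = TRUE)
tags <- sub("_all_1$", "", denominators)

# divide_group() is a hypothetical helper, not a function from any package.
divide_group <- function(data, tag) {
  # all columns whose name starts with "<tag>_", the denominator included
  cols <- names(data)[startsWith(names(data), paste0(tag, "_"))]
  # dividing a data frame by a vector of length nrow() works row by row
  data[cols] / data[[paste0(tag, "_all_1")]]
}

scaled <- do.call(cbind, lapply(tags, divide_group, data = df))
```

As in the code above, each denominator column comes out as 1, and columns without a matching `_all_1` column (here total_left_important) are dropped from the result.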
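
Another option is to store the original x columns first and then chain one mutate_each call per tag:

    # Save the denominators before they are overwritten by the division
    var_x_orig <- mydf$var_x
    other_x_orig <- mydf$other_x

    mydf %>%
      mutate_each(funs(./var_x_orig), matches("^var")) %>%
      mutate_each(funs(./other_x_orig), matches("^other"))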
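If the list of tags is long, the chain could be replaced by a loop. This is only a sketch, assuming dplyr 1.0.0 or newer (it uses across() rather than mutate_each()); the names tags, denoms and scaled are introduced here for illustration.

```r
library(dplyr)

mydf <- data.frame(
  var_x = 1:5, var_y = runif(5), var_z = runif(5),
  other_x = 10:14, other_p = runif(5), other_r = runif(5)
)

tags <- c("var", "other")

# Store the original denominators before any column is overwritten.
denoms <- lapply(tags, function(tag) mydf[[paste0(tag, "_x")]])
names(denoms) <- tags

scaled <- mydf
for (tag in tags) {
  # divide every column of this tag, the x column included, by the stored x
  scaled <- scaled %>%
    mutate(across(starts_with(paste0(tag, "_")), ~ .x / denoms[[tag]]))
}
```

Columns that match none of the tags are left unchanged, and each x column becomes 1, as in the chained version.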

Source: https://habr.com/ru/post/983655/

