Adding a prefix or suffix to most data.frame variable names in a piped workflow in R

I want to add a suffix or prefix to most variable names in a data.frame, typically after they have all been transformed in some way and before performing a join. I don't have a way to do this without breaking up my pipeline.

For example, with this data:

    library(dplyr)
    set.seed(1)
    dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                        force = rexp(10), class = rep(c("a", "b"), 5))

I want to get this result (note the variable names):

      class speed_mean_2014 power_mean_2014 force_mean_2014
    1     a       0.5572500             0.8       0.5519802
    2     b       0.2850798             0.6       1.0888116

My current approach:

    means14 <- dat14 %>%
      group_by(class) %>%
      select(-ID) %>%
      summarise_each(funs(mean(.)))

    names(means14)[2:length(names(means14))] <-
      paste0(names(means14)[2:length(names(means14))], "_mean_2014")

Is there an alternative to that clunky last line, which breaks up my pipes? I've looked at select() and rename(), but I don't want to specify each variable name explicitly, since I usually want to rename every variable except one and may have a much wider data.frame than in this example.

I imagine a final command that looks something like this made-up function:

 appendname(cols = 2:n, str = "_mean_2014", placement = "suffix") 

Which doesn't exist, as far as I know.

+6
5 answers

After some further experimenting since posting this question, I found that the setNames function works with piping, because it returns a data.frame:

    dat14 %>%
      group_by(class) %>%
      select(-ID) %>%
      summarise_each(funs(mean(.))) %>%
      setNames(c(names(.)[1], paste0(names(.)[-1], "_mean_2014")))

      class speed_mean_2014 power_mean_2014 force_mean_2014
    1     a       0.5572500             0.8       0.5519802
    2     b       0.2850798             0.6       1.0888116
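The setNames idea can be wrapped into a small helper shaped like the made-up appendname() from the question. This is only a sketch: appendname() is a hypothetical function, not part of dplyr or base R, and the toy data frame below is invented purely for illustration. It uses base R only and returns the renamed data.frame, so it should be usable as a pipe step.

```r
# Hypothetical helper (not part of dplyr or base R): adds a prefix or suffix
# to the columns selected by `cols` and returns the renamed data frame.
appendname <- function(df, cols = seq_along(df), str = "",
                       placement = c("suffix", "prefix")) {
  placement <- match.arg(placement)
  nms <- names(df)
  nms[cols] <- if (placement == "suffix") {
    paste0(nms[cols], str)
  } else {
    paste0(str, nms[cols])
  }
  setNames(df, nms)
}

# Toy data, made up for this example
means <- data.frame(class = c("a", "b"),
                    speed = c(0.5, 0.3),
                    power = c(0.8, 0.6))

# Rename every column except the first
renamed <- appendname(means, cols = -1, str = "_mean_2014")
names(renamed)
```

Because the data frame is the first argument, a call such as `... %>% appendname(cols = -1, str = "_mean_2014")` would fit at the end of the pipeline above.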
+4

This is a bit quicker to write, but not quite what you want:

    library(magrittr)  # provides the %<>% operator

    dat14 %>%
      group_by(class) %>%
      select(-ID) %>%
      summarise_each(funs(mean(.))) -> means14

    names(means14)[-1] %<>% paste0("_mean_2014")

If you haven't used the %<>% operator (from the magrittr package) before, it is definitely worth checking out; it's a super-useful tool.

You can also use it to recode or round columns, as in df$meancolumn %<>% round(), and so on. This comes up very often, and it saves you a lot of typing.
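As a minimal illustration of what the operator does: `x %<>% f()` pipes x into f() and assigns the result back to x, so it is shorthand for `x <- x %>% f()`. The data frame and its values below are made up for the example.

```r
library(magrittr)  # provides %<>%

# Toy data, made up for this example
df <- data.frame(meancolumn = c(1.26, 2.71, 3.14))

# Equivalent to: df$meancolumn <- df$meancolumn %>% round(1)
df$meancolumn %<>% round(1)

df$meancolumn
```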

+3

As of February 2017, you can do this with the dplyr rename_(...) command.

For this example, you can do:

    dat14 %>%
      group_by(class) %>%
      select(-ID) %>%
      summarise_each(funs(mean(.))) %>%
      # .dots takes a named vector: names are the new names, values the old ones
      # (rename_() was deprecated in later dplyr releases)
      rename_(.dots = setNames(names(.)[-1], paste0(names(.)[-1], "_mean_2014")))

This is pretty similar to the answer using setNames, but it also works with tibbles!

+3

This takes a step back from the question, but you might consider reshaping your data so that the function can be applied to several years at once. This keeps things tidy. If you want to compare across years, it is probably better to have the year as a separate variable in the data frame rather than storing it in the variable names. You should be able to use summarise_ to get the mean. See http://cran.r-project.org/web/packages/dplyr/vignettes/nse.html

    library(dplyr)
    library(tidyr)
    set.seed(1)
    dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                        force = rexp(10), class = rep(c("a", "b"), 5))

    dat14 %>%
      gather(variable, value, -ID, -class) %>%
      mutate(year = 2014) %>%
      group_by(class, year, variable) %>%
      summarise(mean = mean(value))
0

While Sam Firke's solution using setNames() is certainly the only one that keeps the pipe unbroken, it will not work with tbl objects from dplyr, because their column names are not accessible through the usual base R naming functions. Here is a function that you can use within a pipe with tbl objects, based on a solution by hrbrmstr. It adds a given prefix and suffix to the columns at the specified indices (by default, all columns).

    tbl.renamer <- function(tbl, prefix = "x", suffix = NULL,
                            index = seq_along(tbl_vars(tbl))) {
      newnames <- tbl_vars(tbl)    # get old variable names
      names(newnames) <- newnames
      # create a named vector for .dots
      names(newnames)[index] <- paste0(prefix, ".", newnames, suffix)[index]
      rename_(tbl, .dots = newnames)    # rename the variables
    }

Usage example (assuming auth_user is a tbl_sql object):

    auth_user %>% tbl_vars
    tbl.renamer(auth_user) %>% tbl_vars
    auth_user %>% tbl.renamer %>% tbl_vars
    auth_user %>% tbl.renamer(index = c(1, 5)) %>% tbl_vars
0

Source: https://habr.com/ru/post/986240/

