Change the class of variables in a data frame using a reference data frame

I am looking for a way to change the class of the variables in one data frame by looking them up in another data frame that holds the class information for each variable.

My data contains about 150 variables, all stored as character. I want to change the class of each variable depending on its type. To do this, we created a separate data frame containing the class for each variable. Let me illustrate with an example.

Consider my original data frame, df, with 5 variables:

df <- data.frame(A="a",B="1",C="111111",D="d",E="e")

Now we have another data frame, "variable_info", which contains only 2 variables: "variable_name" and "variable_class".

variable_info <- data.frame(variable_name=c("A","B","C","D","E"),variable_class=c("character","integer","numeric","character","character"))

Now, using the variable_info data frame, I want to change the class of each variable in df to the class specified in variable_info$variable_class, matching each variable by name against variable_info$variable_name.

How can I do this for a data frame? Would data.table be a good fit for this, and if so, how would I do it in data.table?

Thanks!

Prasadam

2 answers

You can try the following:

Make sure that the rows of variable_info are in the same order as the columns of df:

variable_info <- variable_info[match(names(df), variable_info$variable_name), ]

Create a list of conversion functions:

funs <- sapply(paste0("as.", variable_info$variable_class), match.fun)

Then map them to each column:

df[] <- Map(function(dd, f) f(as.character(dd)), df, funs)
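
As a rough end-to-end sketch of the same idea, the snippet below uses a named lookup instead of reordering variable_info, so it does not depend on row order. The name class_lookup is a hypothetical helper introduced here, and the shuffled variable_info is only there to illustrate that point. It assumes every class in variable_class has a matching as.<class> function.

```r
# Example data from the question; stringsAsFactors = FALSE keeps the columns
# as character on R versions before 4.0 as well.
df <- data.frame(A = "a", B = "1", C = "111111", D = "d", E = "e",
                 stringsAsFactors = FALSE)
variable_info <- data.frame(
  variable_name  = c("E", "C", "A", "D", "B"),  # deliberately shuffled
  variable_class = c("character", "numeric", "character", "character", "integer"),
  stringsAsFactors = FALSE
)

# Named lookup: column name -> target class (hypothetical helper name)
class_lookup <- setNames(variable_info$variable_class,
                         variable_info$variable_name)

# Convert only the columns that have an entry in the lookup
for (col in intersect(names(df), names(class_lookup))) {
  convert <- match.fun(paste0("as.", class_lookup[[col]]))
  df[[col]] <- convert(as.character(df[[col]]))
}

str(df)  # inspect the resulting column classes
```

Columns of df that are missing from variable_info are left unchanged, which may or may not be what you want with 150 variables.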

With data.table, you can do this in much the same way, except that you replace the last line:

library(data.table)
dt <- as.data.table(df) # or use setDT(df)
dt[, names(dt) := Map(function(dd, f) f(as.character(dd)), dt, funs)]
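
One possible variation, sketched here under the assumption that every name in variable_info exists in dt, is to loop with data.table's set(), which replaces one column at a time by reference. The variable names col and conv are illustrative only.

```r
library(data.table)

dt <- data.table(A = "a", B = "1", C = "111111", D = "d", E = "e")
variable_info <- data.table(
  variable_name  = c("A", "B", "C", "D", "E"),
  variable_class = c("character", "integer", "numeric", "character", "character")
)

# set() replaces a whole column by reference, without copying the table
for (i in seq_len(nrow(variable_info))) {
  col  <- variable_info$variable_name[i]
  conv <- match.fun(paste0("as.", variable_info$variable_class[i]))
  set(dt, j = col, value = conv(dt[[col]]))
}

sapply(dt, class)  # inspect the resulting column classes
```

Because the loop looks each column up by name, the order of rows in variable_info does not matter here.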

    matchColClasses <- function(df1, df2) {
      # Purpose:  protect joins from column type mismatches - a problem with multi-column empty df
      # Input:    df1 - master for class assignments, df2 - for col reclass and return.
      # Output:   df2 with shared columns classed to match df1
      # Usage:    df2 <- matchColClasses(df1, df2)

      sharedColNames <- names(df1)[names(df1) %in% names(df2)]
      # List-style indexing keeps a single shared column from being dropped to a
      # vector; only the first class of each column is used
      sharedColTypes <- vapply(df1[sharedColNames], function(x) class(x)[1], character(1))

      # Assigning a basic class such as "integer" or "numeric" coerces the column
      for (n in sharedColNames) {
        class(df2[[n]]) <- sharedColTypes[[n]]
      }

      return(df2)
    }
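
To apply this to the question, one option is a zero-row template data frame that carries the desired classes. The sketch below repeats a compact copy of the function so that it runs on its own; the name template is hypothetical, and the approach assumes only basic classes such as character, integer and numeric.

```r
# Compact copy of the idea so this snippet runs on its own
matchColClasses <- function(df1, df2) {
  shared <- intersect(names(df1), names(df2))
  for (n in shared) {
    # Assigning a basic class coerces the column to that type
    class(df2[[n]]) <- class(df1[[n]])[1]
  }
  df2
}

# Hypothetical template: zero-row data frame carrying the desired classes
template <- data.frame(A = character(), B = integer(), C = numeric(),
                       stringsAsFactors = FALSE)
df <- data.frame(A = "a", B = "1", C = "111111", D = "d", E = "e",
                 stringsAsFactors = FALSE)

df <- matchColClasses(template, df)
sapply(df, class)  # inspect the resulting column classes
```

Columns D and E are not in the template, so they stay as character.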

Source: https://habr.com/ru/post/1659182/