Creating a data frame using the formats defined in a separate template data frame

I am creating several data frames, and I want the columns in each of them to have the same types as the columns of an empty template data frame that I created.

For example, I have this empty template:

    template <- data.frame(
      char = character(),
      int  = integer(),
      fac1 = factor(levels = c('level1', 'level2', 'level3')),
      fac2 = factor(levels = c('level4', 'level5')),
      stringsAsFactors = FALSE
    )

I then want to create several data frames whose columns keep the formats of the template (i.e. char as a character column, fac2 as a factor with the two levels level4 and level5):

    df1 <- data.frame(
      char = c('a', 'b'),
      int  = c(1, 2),
      fac1 = c('level2', 'level1'),
      fac2 = c('level4', 'level4')
    )

    df2 <- data.frame(
      char = c('c', 'd'),
      int  = c(3, 4),
      fac1 = c('level3', 'level4'),
      fac2 = c('level5', 'level4')
    )

I could obviously specify the column types when I create df1 and df2, but I do not want to type the same thing repeatedly, and if, for example, the levels of a factor change, I want to change them in only one place.

If a factor column contains a value that is not one of its levels (for example, 'level4' in fac1 of df2 above), that value should be replaced by NA when the column is converted to the correct format.
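The NA behaviour asked for here is what base R's factor() does when it is given an explicit set of levels. A minimal sketch, where the variable names fac1_levels and x are only illustrative:

```r
# Levels as they would be declared for fac1 in the template.
fac1_levels <- c("level1", "level2", "level3")

# "level4" is not among the declared levels, so factor() turns it into NA.
x <- factor(c("level3", "level4"), levels = fac1_levels)

print(x)
is.na(x)
```

So any solution that rebuilds each factor column with the template's levels gets the NA replacement without extra work.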

3 answers

Perhaps you can just process your data frame:

    df_template <- function(...) {
      df <- data.frame(...)
      df$char <- as.character(df$char)
      df$int  <- as.integer(df$int)
      df$fac1 <- factor(df$fac1, levels = c('level1', 'level2', 'level3'))
      df$fac2 <- factor(df$fac2, levels = c('level4', 'level5'))
      df
    }

We can create a function that checks the type of each column in the template and uses the matching as.* function to coerce the corresponding column of the given data.frame to that type.

We make an exception for factors (because their type is integer): for those we call factor() and assign the template's levels to the new column.

Map walks over the template columns and the data.frame columns in pairs, and the result (a list) is then converted to a data.frame.

    format_df <- function(df, template) {
      as.data.frame(
        Map(function(x, y) {
          if (is.factor(x)) {
            factor(y, levels(x))
          } else {
            # or `class<-`(y, class(x)), same effect for the given example
            match.fun(paste0("as.", typeof(x)))(y)
          }
        }, template, df),
        stringsAsFactors = FALSE
      )
    }

    df1b <- format_df(df1, template)
    #   char int   fac1   fac2
    # 1    a   1 level2 level4
    # 2    b   2 level1 level4

    str(df1b)
    # 'data.frame': 2 obs. of  4 variables:
    #  $ char: chr  "a" "b"
    #  $ int : int  1 2
    #  $ fac1: Factor w/ 3 levels "level1","level2",..: 2 1
    #  $ fac2: Factor w/ 2 levels "level4","level5": 1 1

Note that level5 is kept as a level of fac2 in the output, even though no row of df1 uses it.
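To see the NA replacement from the question, the same idea can be applied to df2. The block below is a self-contained sketch, not the answer's exact function: the helper name coerce_to_template is hypothetical, and it matches columns by name, whereas the Map call above pairs them by position.

```r
# Zero-row template that only carries column types and factor levels.
template <- data.frame(
  char = character(),
  int  = integer(),
  fac1 = factor(levels = c("level1", "level2", "level3")),
  fac2 = factor(levels = c("level4", "level5")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: coerce df column by column, matching on column names.
coerce_to_template <- function(df, template) {
  out <- lapply(names(template), function(nm) {
    x <- template[[nm]]
    y <- df[[nm]]
    if (is.factor(x)) {
      # as.character() first, so this also works when y is already a factor
      factor(as.character(y), levels = levels(x))
    } else {
      # pick as.character(), as.integer(), ... from the template column type
      match.fun(paste0("as.", typeof(x)))(y)
    }
  })
  names(out) <- names(template)
  as.data.frame(out, stringsAsFactors = FALSE)
}

df2 <- data.frame(
  char = c("c", "d"),
  int  = c(3, 4),
  fac1 = c("level3", "level4"),   # "level4" is not a level of fac1
  fac2 = c("level5", "level4"),
  stringsAsFactors = FALSE
)

df2b <- coerce_to_template(df2, template)
str(df2b)
is.na(df2b$fac1)   # the second value was not a valid level
```

Matching by name is a design choice here: it tolerates input columns in a different order, at the cost of failing when a template column is missing from the input.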


Assuming you have a template, you can create a constructor for your specific data.frame.

For instance,

    .TEMPLATE <- data.frame(
      char = character(),
      int  = integer(),
      fac1 = factor(levels = c('level1', 'level2', 'level3')),
      fac2 = factor(levels = c('level4', 'level5')),
      stringsAsFactors = FALSE
    )

    df_constructor <- function(...) {
      # build the raw data frame first, without automatic factor conversion
      df <- data.frame(..., stringsAsFactors = FALSE)
      # then coerce each column to the type recorded in the template
      within(df, {
        char <- as(char, class(.TEMPLATE$char))
        int  <- as(int, class(.TEMPLATE$int))
        fac1 <- factor(fac1, levels(.TEMPLATE$fac1))
        fac2 <- factor(fac2, levels(.TEMPLATE$fac2))
      })
    }

The vapply function also has a FUN.VALUE argument, where you can pass a template that is used to construct the return value. But I'm not sure whether vapply works well with data.frames.
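As a rough illustration of that remark: FUN.VALUE describes the result of each individual call of FUN, and vapply returns a vector or array of that single type, so it does not by itself produce a data frame with mixed column types. It can still help to check a data frame against a template. This is only a sketch, and the names template_classes, df_classes and mismatched are illustrative:

```r
template <- data.frame(char = character(), int = integer(),
                       stringsAsFactors = FALSE)
df <- data.frame(char = c("a", "b"), int = c(1, 2),
                 stringsAsFactors = FALSE)

# FUN.VALUE = character(1): every call must return exactly one string,
# otherwise vapply stops with an error.
template_classes <- vapply(template, function(col) class(col)[1],
                           FUN.VALUE = character(1))
df_classes <- vapply(df, function(col) class(col)[1],
                     FUN.VALUE = character(1))

# Columns whose class differs from the template (c(1, 2) is numeric, not integer).
mismatched <- names(template_classes)[template_classes != df_classes]
print(mismatched)
```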


Source: https://habr.com/ru/post/1276126/