Using columns with special characters in R formulas

I am trying to build a decision tree with rpart from a data frame that has about 200 columns. Some of these columns have numbers in their names, and some have special characters (for example, "/"). When I try to generate a tree, I get errors such as the following:

    R> gg.rpart <- rpart(nospecialchar ~ Special/char, data=temp, method="class")
    Error in eval(expr, envir, enclos) : object 'Special' not found
    R> gg.rpart <- rpart(nospecialchar ~ "Special/char", data=temp, method="class")
    Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars
    R> gg.rpart <- rpart(nospecialchar ~ `Special/char`, data=temp, method="class")
    Error in `[.data.frame`(frame, predictors) : undefined columns selected

Do I need to change names to accommodate R, or is there a way to pass column names with special characters in R formulas?

3 answers

Joran's comment on my question is the answer: I did not know that make.names() existed.

Joran, if you post that as an answer, I will accept it. Cheers!
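For anyone landing here later, this is a minimal sketch of the make.names() route. The data frame below is made up to mirror the column names in the question (`temp`, `nospecialchar`, `Special/char`); only make.names() itself comes from the comment.

```r
# Hypothetical stand-in for the data frame in the question
temp <- data.frame(nospecialchar = c("a", "b", "a", "b"),
                   `Special/char` = c(1.5, 2.5, 3.5, 4.5),
                   check.names = FALSE)

# Keep the original names in case they are needed later as labels
original_names <- names(temp)

# make.names() turns each name into a syntactically valid one,
# replacing invalid characters with "." and making duplicates unique
names(temp) <- make.names(original_names, unique = TRUE)

# The renamed column can now be written in a formula without quoting
f <- nospecialchar ~ Special.char
all(all.vars(f) %in% names(temp))
```

The trade-off is that the names in the model output no longer match the original column names, so keeping `original_names` around lets you map them back.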


Backticks work with lm:

    > dat <- data.frame(M=rnorm(10), 'A/B'=1:10, check.names=FALSE)
    > lm(M ~ `A/B`, dat)

    Call:
    lm(formula = M ~ `A/B`, data = dat)

    Coefficients:
    (Intercept)        `A/B`
        -1.0494       0.1214

I just ran into the same problem, and I did not want to change the column names before passing them to R formulas. R accepts non-syntactic column names when they are wrapped in backticks, so I add backticks around each name when building the formula string, and this works well. My code is below:

    library(ordinal)  # provides clm()

    fits <- lapply(colnames(variable), function(gene) {
      # Wrap the column name in backticks so it parses as a single variable
      formula0 <- paste0("gleason_grade", "~", "`", gene, "`")
      clm(as.formula(formula0), data = mydata)
    })

Now each column can be passed to the formula without errors.
If, like me, you do not want the column names changed, keep this approach in mind.
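Here is a self-contained sketch of the same idea that runs without the ordinal package. It uses lm() and a made-up data frame (`dat`, with the hypothetical columns `A/B` and `2nd col`) purely for illustration. It also shows why the quoting character matters: a name in ordinary quotes is parsed as a string constant, which is what produced the ExtractVars error in the question.

```r
# Hypothetical data with non-syntactic predictor names
set.seed(1)
dat <- data.frame(M = rnorm(10), `A/B` = 1:10, `2nd col` = runif(10),
                  check.names = FALSE)

predictors <- setdiff(names(dat), "M")

fits <- lapply(predictors, function(p) {
  # Backticks make the parser treat the whole name as one symbol
  f <- as.formula(paste0("M ~ `", p, "`"))
  lm(f, data = dat)
})
names(fits) <- predictors

# With ordinary quotes the right-hand side is a string, not a variable,
# so building the model frame fails; capture the error instead of stopping
bad <- tryCatch(lm(as.formula("M ~ 'A/B'"), data = dat),
                error = function(e) e)
inherits(bad, "error")
```

The original column names stay intact in `dat`, and the fitted models report the predictors with backticks around them, as in the lm output in the other answer.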


Source: https://habr.com/ru/post/908402/

