Is there a general way to refer to the last column of a data frame in an R formula object?

I want to write a general script to find the information gain of a set of features relative to the last column. For example, in a data frame built from a matrix with 26 columns, I would write:

information.gain(V26~.,table) 

The problem is that the formula V26 ~ . does not have an obvious general form. My first thought was to try the following:

 > nms <- colnames(table)
 > nms[length(nms)]
 [1] "V26"
 > information.gain(nms[length(nms)]~., table)
 Error in model.frame.default(formula, data, na.action = NULL) :
   variable lengths differ (found for 'V1')

which seems to go wrong because nms is a character (string) vector. Is there a way to coerce the name into something that can be part of the formula?

2 answers

Here is a simple solution using dummy data:

 DF <- data.frame(matrix(runif(260), ncol = 26))
 names(DF) <- paste0("V", seq_len(ncol(DF)))

Here I use tail() to select the name of the last column in DF and build the formula from it.

 f <- as.formula(paste(tail(names(DF), 1), "~ ."))
 > f
 V26 ~ .
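
The same idea can be wrapped in a small helper so that it works for any data frame. This is only a sketch: `last_col_formula` is a hypothetical name, not part of any package, and `reformulate()` from base R's stats package is used here as an alternative to pasting the string by hand. The second half illustrates why the attempt in the question fails: a formula does not evaluate its left-hand side, so `nms[length(nms)]` stays an unevaluated call instead of becoming the name V26.

```r
# Hypothetical helper: build "<last column> ~ ." for any data frame.
last_col_formula <- function(data) {
  response <- tail(names(data), 1)   # name of the last column
  reformulate(".", response = response)
}

# Dummy data, as in the answer above
DF <- data.frame(matrix(runif(260), ncol = 26))
names(DF) <- paste0("V", seq_len(ncol(DF)))

f <- last_col_formula(DF)
print(f)

# The attempt from the question: the left-hand side is stored as an
# unevaluated call, not replaced by the column name.
nms <- colnames(DF)
bad <- nms[length(nms)] ~ .
print(class(bad[[2]]))
```

The resulting `f` can then be passed wherever a formula is expected, for example as the first argument of the `information.gain()` call from the question.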

Edited to fit the question. You can put the last column of the data frame in a separate vector and then use it as the response in your function call. For example, here is a solution using the number of columns:

last_col <- df[, ncol(df)]

function(last_col ~ ., blah, blah, etc.)
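
One caveat with this approach, sketched here with base R only: because `last_col` is not a column of `df`, the `.` on the right-hand side still expands to every column of `df`, including the last one. Dropping that column from the data keeps the response from being used as its own predictor. The data frame is dummy data, and `terms()` is used only to show how `.` expands; whether a particular modelling function accepts a response that lives outside the data frame is an assumption to check.

```r
# Dummy data standing in for the real data frame
df <- data.frame(matrix(runif(40), ncol = 4))
names(df) <- paste0("V", seq_len(ncol(df)))

last_col <- df[, ncol(df)]

# "." expands to all columns of df, so V4 is still among the predictors
with_last <- attr(terms(last_col ~ ., data = df), "term.labels")

# Dropping the last column from the data leaves only the other columns
predictors <- df[, -ncol(df), drop = FALSE]
without_last <- attr(terms(last_col ~ ., data = predictors), "term.labels")

print(with_last)
print(without_last)
```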

Hope this helps!


Source: https://habr.com/ru/post/1492813/

