R random forest: data (x) has 0 rows

I use the function randomForest from the randomForest package to find the most important variable. My data frame is called urban, and my response variable is revenue, which is numeric.

urban.random.forest <- randomForest(revenue ~ ., y = urban$revenue, data = urban, ntree = 500, keep.forest = FALSE, importance = TRUE, na.action = na.omit)

I get the following error:

Error in randomForest.default(m, y, ...) : data (x) has 0 rows

In the source code, the error is tied to a variable x:

n <- nrow(x)
p <- ncol(x)
if (n == 0) 
stop("data (x) has 0 rows")

but I can't work out what x is.

1 answer

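I solved it. Some of my columns contained only NA values, or the same value in every row. Because na.action = na.omit drops every row that contains an NA, a column that is entirely NA leaves the predictor data (x) with 0 rows. I dropped those columns and everything worked. My column classes were character, numeric, and factor.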

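# Collect the indices of numeric columns that hold a single distinct value
# (an all-NA numeric column also counts, since unique() returns just NA).
candidatesnodata.index <- c()
for (j in seq_len(ncol(dataframe))) {
  if (is.numeric(dataframe[[j]]) && length(unique(dataframe[[j]])) == 1) {
    candidatesnodata.index <- append(candidatesnodata.index, j)
  }
}

# Drop those columns. Skip this when none were found, because negative
# indexing with an empty vector would select zero columns.
if (length(candidatesnodata.index) > 0) {
  dataframe <- dataframe[, -candidatesnodata.index, drop = FALSE]
}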
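The loop above only looks at numeric columns. As a rough sketch of the same idea for columns of any class, the snippet below first confirms that na.omit is what empties the data, then drops every column with at most one distinct non-missing value. The urban_demo data frame and the is_uninformative helper are made up for illustration; they are not part of the original question or of the randomForest package.

```r
# Hypothetical toy data standing in for `urban`.
urban_demo <- data.frame(
  revenue  = c(10, 20, 30, 40),
  area     = c(1.5, 2.0, 2.5, 3.0),
  all_na   = NA_real_,   # every value missing
  constant = "same",     # one distinct value in every row
  stringsAsFactors = TRUE
)

# na.omit drops any row containing an NA, so a single all-NA column
# is enough to remove every row.
rows_left <- nrow(na.omit(urban_demo))

# Illustrative helper: a column is treated as uninformative when it has
# at most one distinct non-missing value (this covers all-NA columns too).
is_uninformative <- function(col) length(unique(col[!is.na(col)])) <= 1

# Flag such columns regardless of class, then keep only the others.
drop_cols <- vapply(urban_demo, is_uninformative, logical(1))
urban_clean <- urban_demo[, !drop_cols, drop = FALSE]
```

Here rows_left is 0, which matches the "data (x) has 0 rows" error, while urban_clean keeps only revenue and area with all four rows intact. Note that the check is applied to the response column as well, so a constant response would also be dropped; exclude it from the check if that is not what you want.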

Source: https://habr.com/ru/post/1533152/

