R using rpart with 4000 entries and 13 attributes

I tried to email the author of this package without success; just wondering if anyone else has run into this.

I use rpart on 4000 rows of data with 13 attributes. I can run the same test on a 300-row subset of the same data without problems. When I run all 4000 rows, Rgui.exe sits at 50% CPU (one core fully busy) and the user interface freezes; it will stay that way for at least 4-5 hours if I let it run, never finishing and never responding.

Here is the code I use on both the 300-row subset and the full 4000 rows:

train <- read.csv("input.csv", header = TRUE)
y <- train[, 18]
x <- train[, 3:17]
library(rpart)
fit <- rpart(y ~ ., x)

Is this a known limitation of rpart, am I doing something wrong, or are there possible workarounds?

2 answers

The problem here was a data preparation error: the header row had been duplicated far down in the middle of the data set.
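A quick way to catch this kind of corruption (a sketch; since input.csv is not available, the duplicated-header scenario is simulated with an in-memory CSV):

```r
# Simulate a CSV whose header line is duplicated in the middle of the data.
csv_text <- "x,y\n1,2\nx,y\n3,4\n"
train <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(train)  # both columns come in as character, not integer: a red flag

# Find rows that are an exact copy of the header and drop them.
bad <- which(apply(train, 1, function(r) all(r == names(train))))
if (length(bad)) train <- train[-bad, ]  # guard: -integer(0) would drop all rows
train[] <- lapply(train, as.numeric)
str(train)  # now numeric, as expected
```

The symptom matches the question: a stray header line forces read.csv to treat numeric columns as character (or factor), which can send rpart down a pathological path.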


Can you reproduce the problem when you pass in random data of similar dimensions instead of your real data (from input.csv)? If not, it is probably an issue with your data (formatting, perhaps). After importing the data with read.csv, check it for format problems by looking at the output of str(train).

# How to do an equivalent rpart fit on some random data of equivalent dimensions
dats <- data.frame(matrix(rnorm(4000 * 14), nrow = 4000))

y <- dats[, 1]
x <- dats[, -1]
library(rpart)
system.time(fit <- rpart(y ~ ., x))
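As a rough sanity check (a sketch; absolute timings will vary by machine), the same random-data fit can be timed at both sizes. rpart itself scales smoothly from 300 to 4000 rows, so hours-long hangs on the real data point at the data, not the package:

```r
library(rpart)
set.seed(1)  # reproducible random data
dats <- data.frame(matrix(rnorm(4000 * 14), nrow = 4000))

# Fit on a 300-row subset and on the full 4000 rows, recording elapsed time.
t_small <- system.time(fit_small <- rpart(X1 ~ ., dats[1:300, ]))["elapsed"]
t_big   <- system.time(fit_big   <- rpart(X1 ~ ., dats))["elapsed"]
c(small = unname(t_small), big = unname(t_big))
```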

Source: https://habr.com/ru/post/1742169/
