The tree sizes given by the CP table in rpart

In the rpart package, what determines the sizes of the trees listed in the CP table of a fitted decision tree? In the example below, the CP table by default lists only trees with 1, 2, and 5 terminal nodes (nsplit = 0, 1, and 4, respectively).

    library(rpart)
    fit <- rpart(Kyphosis ~ Age + Number + Start, method="class", data=kyphosis)

    > printcp(fit)

    Classification tree:
    rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

    Variables actually used in tree construction:
    [1] Age   Start

    Root node error: 17/81 = 0.20988

    n= 81

            CP nsplit rel error  xerror    xstd
    1 0.176471      0   1.00000 1.00000 0.21559
    2 0.019608      1   0.82353 0.94118 0.21078
    3 0.010000      4   0.76471 0.94118 0.21078

Is there a built-in rpart() rule that determines these tree sizes? And is it possible to make rpart() return cross-validation statistics for all possible tree sizes, that is, for the example above, to also include rows for trees with 3 and 4 terminal nodes (nsplit = 2, 3)?

2 answers

The behaviour of rpart() is controlled through rpart.control(). It has parameters such as minsplit, the minimum number of observations a node must contain before a split is attempted, and cp, the complexity parameter: any split that does not decrease the overall lack of fit by a factor of cp is not attempted. If you look at summary(fit) for the example above, it reports details for every split in the fitted tree. To have printcp(fit) show more tree sizes, you have to pass suitable (smaller) cp and minsplit values in the original rpart() call.
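A minimal sketch of that advice, reusing the kyphosis model from the question. The values minsplit = 2 and cp = 0 are illustrative choices, not recommendations, and fit_full is a hypothetical name. Note that the CP table lists the nested sequence of pruned subtrees, so relaxing the controls adds rows for a larger tree but does not by itself guarantee a row for every nsplit value.

```r
library(rpart)

# Default controls (minsplit = 20, cp = 0.01), as in the question
fit <- rpart(Kyphosis ~ Age + Number + Start,
             method = "class", data = kyphosis)

# Relaxed controls (illustrative values): attempt a split in any node with
# at least 2 observations, and cp = 0 disables the complexity threshold
fit_full <- rpart(Kyphosis ~ Age + Number + Start,
                  method = "class", data = kyphosis,
                  control = rpart.control(minsplit = 2, cp = 0))

# The nsplit column shows which tree sizes each CP table represents
print(unname(fit$cptable[, "nsplit"]))
print(unname(fit_full$cptable[, "nsplit"]))

printcp(fit_full)
```

The cross-validated columns (xerror, xstd) depend on the random fold assignment, but the nsplit column does not.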


The rpart documentation on CRAN mentions adding the cp = 0 option to the rpart() call: http://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf. It also describes other parameters that can be set in rpart(), for example to control the number of splits.

    dfit <- rpart(y ~ x, method = 'class', control = rpart.control(xval = 10, minbucket = 2, cp = 0))
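The call above uses a placeholder formula, y ~ x. Below is a sketch of the same control settings applied to the kyphosis data from the question (minbucket is the minimum number of observations in any terminal node). The set.seed() value is arbitrary, and picking the row with the smallest xerror before calling rpart's prune() is one common convention, assumed here for illustration rather than taken from the answer.

```r
library(rpart)

set.seed(42)  # cross-validated xerror depends on the random fold assignment

# Same control settings as the answer, applied to the question's data
dfit <- rpart(Kyphosis ~ Age + Number + Start, method = "class",
              data = kyphosis,
              control = rpart.control(xval = 10, minbucket = 2, cp = 0))

cpt <- dfit$cptable
print(cpt)

# Assumed convention: take the row with the smallest cross-validated error
best <- which.min(cpt[, "xerror"])

# Prune the large tree back using the CP value from that row
pruned <- prune(dfit, cp = cpt[best, "CP"])
printcp(pruned)
```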

Source: https://habr.com/ru/post/980766/

