I think we need to make the distinction clearer: a pruned tree always performs at least as well on the validation set, but not necessarily on the test set (and in fact pruning gives equal or worse performance on the training set). I assume here that pruning is done after the tree is fully grown (i.e., post-pruning).
Remember that the whole reason for using a validation set is to avoid overfitting the training data, and the key point is generalization: we want the model (the decision tree) to generalize beyond the cases provided at training time to new, unseen examples.
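For concreteness, here is a minimal sketch of that workflow using cost-complexity (post-)pruning in scikit-learn, with the pruning strength chosen on a held-out validation set (the dataset and the 60/20/20 split are arbitrary choices for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Split into train / validation / test (60 / 20 / 20).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Grow the full tree first, then post-prune: try the candidate
# cost-complexity alphas and keep the one that scores best on validation.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
best_alpha, best_val = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    val_acc = tree.score(X_val, y_val)
    if val_acc > best_val:
        best_alpha, best_val = alpha, val_acc

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)

# Expected pattern: the pruned tree's training accuracy is equal or lower,
# its validation accuracy is equal or higher (alpha was picked to maximize it),
# and the test score is the honest estimate of generalization.
for name, tree in [("full", full), ("pruned", pruned)]:
    print(name, tree.score(X_train, y_train), tree.score(X_val, y_val), tree.score(X_test, y_test))
```

This makes the asymmetry in the claim visible: the validation score of the pruned tree is better by construction, while the test score may or may not improve.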