I use the randomForest package in R, which can calculate a proximity matrix (P). The package documentation describes this output as: "if proximity=TRUE when randomForest is called, a matrix of proximity measures among the input (based on the frequency that pairs of data points are in the same terminal nodes)."
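To make "frequency in the same terminal nodes" concrete, here is a minimal base-R sketch that computes proximities from a matrix of terminal-node ids. The helper `proximity_from_nodes` and the toy `nodes`/`inbag` matrices are hypothetical and not part of randomForest. The sketch shows two possible normalizations: dividing by all trees, or counting only trees in which both cases are out-of-bag and dividing by the number of such trees. The second scheme is offered as an assumption about what randomForest does when `oob.prox = TRUE` (whose default is the value of `proximity`); it is not a copy of the package's code.

```r
# Hypothetical helper (not part of randomForest).
# nodes: n x ntree matrix; nodes[i, t] is the terminal node of case i in tree t.
# inbag: optional n x ntree matrix of in-bag counts (0 = out-of-bag).
#        If given, a pair only counts in trees where BOTH cases are out-of-bag,
#        and the count is divided by the number of such trees.
proximity_from_nodes <- function(nodes, inbag = NULL) {
  n <- nrow(nodes)
  same <- matrix(0, n, n)   # trees in which the pair shares a terminal node
  total <- matrix(0, n, n)  # trees in which the pair is eligible to be counted
  for (t in seq_len(ncol(nodes))) {
    eligible <- if (is.null(inbag)) rep(TRUE, n) else inbag[, t] == 0
    both <- outer(eligible, eligible, "&")
    shared <- outer(nodes[, t], nodes[, t], "==")
    same <- same + (both & shared)
    total <- total + both
  }
  # Pairs that were never eligible get proximity 0 instead of NaN
  ifelse(total > 0, same / total, 0)
}

# Toy example: 3 cases, 3 trees (made-up node ids, for illustration only)
nodes <- matrix(c(1, 2, 1,
                  1, 2, 3,
                  2, 2, 3), nrow = 3, byrow = TRUE)
# Made-up in-bag counts; 0 means the case was out-of-bag for that tree
inbag <- matrix(c(0, 1, 0,
                  0, 0, 0,
                  1, 0, 0), nrow = 3, byrow = TRUE)

P_all <- proximity_from_nodes(nodes)         # divide by all 3 trees
P_oob <- proximity_from_nodes(nodes, inbag)  # divide by trees where both are OOB
```

Here `P_all[1, 2]` is 2/3, a multiple of 1/ntree, while `P_oob[1, 2]` is 1/2 because cases 1 and 2 are both out-of-bag in only two of the three trees. Under the out-of-bag scheme the denominator differs from pair to pair, so a proximity multiplied by the total number of trees need not be an integer.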
I get the proximity matrix of a random forest as follows:
P <- randomForest(x, y, ntree = 1000, proximity=TRUE)$proximity
When I examine the matrix P, I see values like P(i, j) = 0.971014493, where i and j are two data instances in my training dataset (x). This value does not make sense to me: if it were a frequency over all trees, multiplying it by 1000 (the number of trees in the forest) should give an integer, but it does not. Can someone please help me understand why I get such non-integer-count values in the proximity matrix?
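A quick arithmetic check on the quoted value: the R snippet below searches for denominators up to 1000 for which 0.971014493 times the denominator is an integer (within the rounding of the printed value). It finds 67/69, which would be consistent with a count of 67 out of a per-pair denominator of 69 trees rather than out of all 1000. This only shows the number fits such a ratio; it does not by itself establish how randomForest normalizes proximities.

```r
p <- 0.971014493  # the value of P(i, j) quoted above
ntree <- 1000
d <- seq_len(ntree)
# Keep denominators d for which p * d is an integer up to the rounding error
# of a 9-decimal value (at most 5e-10 * 1000 = 5e-7 < 1e-6).
hits <- d[abs(p * d - round(p * d)) < 1e-6]
first <- hits[1]
c(numerator = round(p * first), denominator = first)
```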