Using k-NN in R with categorical values

I want to perform classification based on categorical features. For this purpose, Euclidean distance (or any other distance defined for numerical values) is not suitable.

I am looking for a kNN implementation for R that lets you select different distance methods, such as Hamming distance. Is there a way to use common kNN implementations, such as the one in {class}, with different distance metric functions?

I am using R 2.15

Thanks! Omri

1 answer

As long as you can calculate a distance/dissimilarity matrix (in whatever way you like), you can easily perform kNN classification without the need for any special package.

# Generate dummy data
y <- rep(1:2, each=50)                          # True class memberships
x <- y %*% t(rep(1, 20)) + rnorm(100*20) < 1.5  # Dataset with 20 binary variables
design.set <- sample(length(y), 50)
test.set <- setdiff(1:100, design.set)

# Calculate distance and nearest neighbors
library(e1071)
d <- hamming.distance(x)
# apply() returns its results column-wise, so transpose to get one row per
# test point, with the design-set neighbors ordered from nearest to farthest
NN <- t(apply(d[test.set, design.set], 1, order))

# Predict class membership of the test set
k <- 5
pred <- apply(NN[, 1:k, drop=FALSE], 1, function(nn){
    tab <- table(y[design.set][nn])
    as.integer(names(tab)[which.max(tab)])      # This is a pretty dirty line
})

# Inspect the results
table(pred, y[test.set])
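The example above uses binary variables, while the question is about categorical features in general. As a minimal sketch, assuming the data sit in a data frame of factors or strings, a Hamming (simple matching) distance matrix can also be built in base R. The names `hamming_matrix` and `toy` are hypothetical and only serve as illustration.

```r
# Hypothetical helper: pairwise Hamming (simple matching) distance between
# the rows of a data frame or matrix of categorical values.
hamming_matrix <- function(df) {
  m <- as.matrix(df)                 # values are compared as character strings
  n <- nrow(m)
  d <- matrix(0L, n, n)
  for (j in seq_len(ncol(m))) {
    # outer() flags every pair of rows that disagree in column j
    d <- d + outer(m[, j], m[, j], "!=")
  }
  d
}

# Toy data, invented for illustration only
toy <- data.frame(
  colour = c("red", "red", "blue"),
  size   = c("S", "L", "L"),
  stringsAsFactors = TRUE
)
d_toy <- hamming_matrix(toy)
```

Under these assumptions, a matrix like `d_toy` could take the place of `d` in the nearest-neighbor step, since that step only needs pairwise distances.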

If anyone knows a better way to find the most common value in a vector than the line marked as dirty in the code above, I would be happy to hear it.
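One possible alternative, offered as a sketch rather than a definitive answer, is to count occurrences with `tabulate()` on the positions returned by `match()`. This avoids the round trip through the names of a table and keeps the original type of the values. The function name `most_common` is hypothetical; on ties it returns the tied value that appears first in the vector.

```r
# Hypothetical helper: most frequent value of a vector, keeping its type.
most_common <- function(v) {
  u <- unique(v)                          # distinct values, in order of first appearance
  u[which.max(tabulate(match(v, u)))]     # count each value, pick the first maximum
}

votes <- c(2L, 1L, 1L, 2L, 2L)
winner <- most_common(votes)              # class label with the most votes
```

Inside the prediction step, the body of the anonymous function could then be reduced to `most_common(y[design.set][nn])`.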

The drop=FALSE argument is needed to keep the subset of NN as a matrix in the case k=1. Without it, the subset is converted to a vector and apply throws an error.
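The effect of `drop` can be checked on a small stand-in matrix. This is only an illustration; `NN_demo` is a made-up name and not part of the code above.

```r
# Made-up stand-in matrix with 3 rows and 2 columns
NN_demo <- matrix(1:6, nrow = 3)

as_vector <- NN_demo[, 1]                  # default drop=TRUE: dimensions are lost
as_matrix <- NN_demo[, 1, drop = FALSE]    # dimensions are kept

dim(as_vector)   # NULL, so apply() cannot iterate over rows
dim(as_matrix)   # 3 1, a one-column matrix that apply() accepts
```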


Source: https://habr.com/ru/post/1433555/

