Accuracy Optimization for OneClassSVM

I have a problem that requires a single-class classification system. I am developing in Python, so I am using scikit-learn for the machine learning tasks.

From the documentation, OneClassSVM should work as expected (training on positive examples only), but the resulting model gives me very inaccurate results, even on the original training data.

from sklearn.svm import OneClassSVM

X = generate_data()  # Generate matrix of tf-idf document vectors

cls = OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1)
cls.fit(X)

# Fraction of training samples predicted as inliers (label +1)
y = cls.predict(X)
print(y[y == 1].size / float(y.size))

The above (simplified) code fragment gives an accuracy of 40-55% on the training data. The results on new data are even worse, with almost all predictions being incorrect.

An accuracy of 40-55% is essentially no better than a random classifier, so what am I doing wrong? I have tried playing around with the gamma and nu parameters, but it does not seem to make much difference.
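For what it's worth, this is roughly the kind of sweep I tried. It is only a sketch: the grid values are arbitrary, and generate_data() stands in for my actual data loading.

import numpy as np
from sklearn.svm import OneClassSVM

X = generate_data()  # same tf-idf matrix as above

# Crude grid search: for each (gamma, nu) pair, report the fraction of
# training points the fitted model predicts as inliers (+1).
for gamma in (0.001, 0.01, 0.1, 1.0):
    for nu in (0.05, 0.1, 0.2, 0.5):
        cls = OneClassSVM(kernel='rbf', gamma=gamma, nu=nu)
        cls.fit(X)
        y = cls.predict(X)
        print('gamma=%g nu=%g inlier fraction=%.3f' % (gamma, nu, np.mean(y == 1)))

None of the combinations I tried changed the picture much.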

I know that the OneClassSVM implementation uses the method proposed by Schölkopf et al., and that the alternative is Support Vector Data Description (SVDD, Tax and Duin), but SVDD is not implemented in scikit-learn and would require me to write my own interface to libsvm. On top of that, from what I understand, SVDD is generally about as accurate as the Schölkopf one-class SVM, so it might not solve my problem at all.

The generated training data is a matrix of documents represented as standard tf-idf vectors.
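In case it matters, generate_data() does roughly the following (a simplified sketch; load_documents() is a placeholder for however the raw text is read in):

from sklearn.feature_extraction.text import TfidfVectorizer

def generate_data():
    corpus = load_documents()  # placeholder: returns a list of raw text strings
    vectorizer = TfidfVectorizer()
    # Sparse matrix of shape (n_documents, n_terms)
    return vectorizer.fit_transform(corpus)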


Source: https://habr.com/ru/post/1546055/

