I'm working with scikit-learn to build some predictive models with SVMs. I have a dataset with about 5,000 examples and about 700 features. I'm doing 5-fold cross-validation with an 18x17 grid search over parameters on my training set, then using the optimal parameters on my test set. The runs are taking much longer than I expected, and I have noticed the following:
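For reference, a minimal sketch of my setup (the data is a stand-in with the same shape as mine, and the parameter ranges below are placeholders, not my actual values):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data with the same shape as my dataset (~5,000 examples, ~700 features)
X, y = make_classification(n_samples=5000, n_features=700, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 18 C values x 17 gamma values -- the ranges here are placeholders
param_grid = {
    "C": np.logspace(-2, 6, 18),
    "gamma": np.logspace(-8, 0, 17),
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1, verbose=2)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```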
1) Some individual SVM training runs seem to take only a minute, while others can take up to 15 minutes. Is this expected with different data and parameters (I'm searching over C and gamma, using the RBF kernel)? See the timing sketch below.
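To illustrate the variation I mean, here is a sketch of timing single fits at a few (C, gamma) points (the values are made up for illustration):

```python
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=700, random_state=0)

# Fit the same data at a few (C, gamma) points and compare wall-clock times
for C, gamma in [(1.0, 1e-3), (1e3, 1e-3), (1e3, 1.0)]:
    t0 = time.time()
    SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    print(f"C={C}, gamma={gamma}: {time.time() - t0:.1f}s")
```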
2) I'm using 64-bit Python on Windows to take advantage of the extra memory, but all my Python processes seem to cap out at about 1 GB in Task Manager. I don't know whether this has anything to do with the runtime.
3) I was using 32-bit Python before on roughly the same dataset, and I remember it being quite fast (although I didn't save the timings). For 64-bit Windows I used a third-party scikit-learn build (from http://www.lfd.uci.edu/~gohlke/pythonlibs/), so I don't know whether I should try this on 32-bit Python instead?
Any suggestions on how I can reduce the runtime are welcome. I suppose that reducing the search space of my grid search would help, but since I'm not even sure about the range of the optimal parameters, I'd like to keep it as large as I can. If there are faster SVM implementations, let me know and I can try them.
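One thing I could try is a coarse-to-fine search instead of one big grid, something like the sketch below (the ranges are again placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=700, random_state=0)

# Coarse pass: 4x4 grid instead of 18x17
coarse = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": np.logspace(-2, 4, 4), "gamma": np.logspace(-6, 0, 4)},
    cv=5, n_jobs=-1,
)
coarse.fit(X, y)
best_C, best_gamma = coarse.best_params_["C"], coarse.best_params_["gamma"]

# Fine pass: a small grid around the best coarse point
fine = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": best_C * np.logspace(-1, 1, 5), "gamma": best_gamma * np.logspace(-1, 1, 5)},
    cv=5, n_jobs=-1,
)
fine.fit(X, y)
print(fine.best_params_)
```

That would cut the number of fits considerably, though I'm not sure how much I'd lose by not searching the full grid at once.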
Update: I went back and tried running the 32-bit version again. For some reason it is much faster; it took about 3 hours to get to where the 64-bit version took 16 hours. Why would there be such a difference?