This is a classic problem with large-scale SVMs. An SVM model will need to be retrained if new features are added, and also if new data is added, unless you are using an online SVM. Some options:
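As a quick aside on the online case, here is a minimal sketch of incremental training, assuming scikit-learn is available: SGDRegressor with epsilon-insensitive loss optimizes a linear SVR-style objective and supports partial_fit, so new data can be folded in without a full retrain (new features would still change the input dimension and force one). The data below is a random placeholder.

```python
# Minimal sketch of the online case, assuming scikit-learn.
# SGDRegressor with epsilon-insensitive loss optimizes a linear SVR-style
# objective and supports partial_fit for streaming batches.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
w_true = rng.normal(size=20)                 # hypothetical ground-truth weights
model = SGDRegressor(loss="epsilon_insensitive", penalty="l2", alpha=1e-4)

# Fold in new data as it arrives instead of retraining from scratch.
for _ in range(10):
    X_batch = rng.normal(size=(1000, 20))    # placeholder feature batch
    y_batch = X_batch @ w_true + rng.normal(scale=0.1, size=1000)
    model.partial_fit(X_batch, y_batch)
```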
Practical options (off the shelf):
LIBLINEAR. If you can use a linear SVM, there are algorithms that exploit the linear kernel to give better-than-quadratic training time. Check out LIBLINEAR, which is from the same research group as libsvm. They just added regression in version 1.91, released yesterday (see the sketch after this list): http://www.csie.ntu.edu.tw/~cjlin/liblinear/
Oracle ODM. Oracle has SVM available in its ODM package. They take a practical approach: basically provide a "reasonably good" SVM without paying the computational cost of finding a truly optimal solution. They use some sampling and model selection techniques - you can find information about that here: http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/overview/support-vector-machines-paper-1205-129825.pdf
SHOGUN. The SHOGUN Machine Learning Toolbox is designed for large-scale learning; it interfaces with several SVM implementations, as well as other methods. I have never used it, but it might be worth a look: http://www.shogun-toolbox.org
Kernel-machines.org has a list of software packages: http://www.kernel-machines.org/software
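For the LIBLINEAR option, if you happen to be in Python, scikit-learn's LinearSVR is a convenient wrapper around the liblinear solver, so training scales far better with sample count than kernel SVR. A minimal sketch (the data here is a random placeholder):

```python
# Minimal sketch, assuming scikit-learn: LinearSVR wraps the liblinear
# solver, avoiding the quadratic-or-worse cost of kernel SVR training.
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 50))           # placeholder large dataset
y = X @ rng.normal(size=50) + rng.normal(scale=0.1, size=100_000)

model = LinearSVR(C=1.0, epsilon=0.1, max_iter=2000)
model.fit(X, y)
print("train R^2:", model.score(X, y))
```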
Other SVM research
If you want to roll your own, a variety of methods for scaling SVM up to big datasets have been published in research papers, but the code is not necessarily available, downloadable, or maintained like the examples above. They claim good results, but each has its own set of drawbacks. Many of them involve some form of data selection. For example, several research papers use linear-time clustering algorithms to cluster the data, then train successive SVM models on the clusters to build a model without using all of the data (a rough sketch of this idea follows below). Core Vector Machines claim linear training time, but there is some criticism as to whether their accuracy is as high as claimed. Numerous papers use various heuristics to try to select the most likely support vector candidates. Many of these are aimed at classification, but could probably be adapted to regression. If you want more information on any of this research, I can add some links.
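To make the cluster-then-train idea concrete, here is a rough sketch using scikit-learn pieces. MiniBatchKMeans stands in for the linear-time clustering step, and each cluster gets its own small kernel SVR; the exact scheme varies from paper to paper, so treat this as an illustration, not any particular published method.

```python
# Sketch of the cluster-then-train idea: cluster cheaply, then fit one
# small kernel SVR per cluster so each solves a much smaller QP.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 10))            # placeholder data
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=50_000)

kmeans = MiniBatchKMeans(n_clusters=20, random_state=0).fit(X)
labels = kmeans.labels_

# Train one SVR per cluster on that cluster's points only.
models = {}
for k in range(20):
    mask = labels == k
    models[k] = SVR(kernel="rbf", C=1.0).fit(X[mask], y[mask])

# Predict by routing each new point to its cluster's model.
X_new = rng.normal(size=(5, 10))
assign = kmeans.predict(X_new)
y_pred = np.array([models[k].predict(x[None, :])[0]
                   for k, x in zip(assign, X_new)])
```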
Algorithm exploration tools
You are probably already aware of these, but I figured I would mention them here just in case:
There are other algorithms that have good runtime on large datasets, but whether they will work well is hard to say; it depends on the makeup of your data. Since runtime matters, I would start with the simpler models and work up to the more complex ones. ANNs, decision tree regression, Bayesian methods, locally weighted linear regression, or a hybrid approach such as model trees (a decision tree whose leaf nodes are linear models) can all be trained more quickly than an SVM on large datasets and can produce good results. A rough comparison loop is sketched below.
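One way to run that start-simple comparison, sketched with generic scikit-learn estimators standing in for the model families above (the estimators, parameters, and data here are placeholders, not a recommendation of specific settings):

```python
# Sketch: time each candidate family on a cheap subsample before
# committing to anything expensive on the full dataset.
import time
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200_000, 30))           # placeholder full dataset
y = X @ rng.normal(size=30) + rng.normal(scale=0.1, size=200_000)

idx = rng.choice(len(X), size=20_000, replace=False)  # cheap subsample
candidates = {
    "decision tree": DecisionTreeRegressor(max_depth=10),
    "bayesian":      BayesianRidge(),
    "ann":           MLPRegressor(hidden_layer_sizes=(50,), max_iter=200),
}
for name, model in candidates.items():
    start = time.perf_counter()
    model.fit(X[idx], y[idx])
    print(f"{name}: {time.perf_counter() - start:.2f}s, "
          f"R^2={model.score(X[idx], y[idx]):.3f}")
```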
WEKA. Weka is a good tool for exploring your options. I would use WEKA to try subsets of your data with different algorithms. The source code is also open (Java), so you can customize whatever you pick to your needs. http://www.cs.waikato.ac.nz/ml/weka/
R. The R programming language also implements many algorithms, and working in it is similar to programming in Matlab. http://www.r-project.org/
I would not recommend running WEKA or R on the full large-scale dataset, but they are useful tools for narrowing down which algorithms might work well for you.