Java, Weka: How to predict a numeric attribute?

I tried to use the NaiveBayesUpdateable classifier from Weka. My data contains both nominal and numerical attributes:

    @relation cars
    @attribute country {FR, UK, ...}
    @attribute city {London, Paris, ...}
    @attribute car_make {Toyota, BMW, ...}
    @attribute price numeric %% car price
    @attribute sales numeric %% number of cars sold

I need to predict the number of sales (numeric!) based on the other attributes.

I understand that I cannot use a numeric attribute as the class for Bayes classification in Weka. One workaround is to split the value range of the numeric attribute into N intervals of length k and use a nominal attribute instead, where the interval index n serves as the class name, for example: @attribute class {1,2,3, ... N}.
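For illustration, the interval approach could be sketched in plain Java as follows. The class and method names are hypothetical and not part of Weka, and the range and interval count are assumptions picked for the example. N is chosen freely, so it does not have to match the number of distinct values.

```java
// Hypothetical helper, not part of Weka: equal-width binning of a numeric target.
class EqualWidthBinning {

    // Returns a 1-based interval index in 1..n for a value expected to lie in [min, max].
    static int binIndex(double value, double min, double max, int n) {
        if (n <= 0 || max <= min) {
            throw new IllegalArgumentException("need n > 0 and max > min");
        }
        double k = (max - min) / n;                 // interval length
        int index = (int) ((value - min) / k) + 1;  // 1-based interval number
        // Clamp so that value == max (and any out-of-range value) maps to a valid interval.
        return Math.max(1, Math.min(n, index));
    }

    public static void main(String[] args) {
        // Assumed setup: sales between 0 and 1,000,000, split into N = 10 intervals.
        System.out.println(binIndex(0, 0, 1_000_000, 10));         // prints 1
        System.out.println(binIndex(250_000, 0, 1_000_000, 10));   // prints 3
        System.out.println(binIndex(1_000_000, 0, 1_000_000, 10)); // prints 10
    }
}
```

The resulting index would then be written out as the nominal class value. The price of this approach is that the prediction is only as precise as the interval length k.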

But the numeric attribute that I need to predict ranges from 0 to 1,000,000, and creating 1,000,000 classes makes no sense at all. How can I predict a numeric attribute using Weka, or which algorithms should I look at if Weka has no tools for this task?

+6
3 answers

What you want to do is regression, not classification. The difference is exactly the one you describe:

  • Classification has discrete classes / labels; any nominal attribute can be used as the class here.
  • Regression has continuous labels; calling them classes would be wrong.

Most regression-based methods can be converted to binary classification by defining a threshold value; the class then indicates whether the predicted value is above or below this threshold.
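A minimal sketch of that threshold idea in plain Java, assuming the regression model is available as a function from an input to a predicted number. The class name, the labels and the toy model are made up for illustration and are not Weka API.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical wrapper, not part of Weka: turns a numeric prediction into a binary label.
class ThresholdClassifier {

    // Returns "above" if the regressor's prediction exceeds the threshold, otherwise "below".
    // A prediction exactly equal to the threshold is treated as "below" in this sketch.
    static String classify(DoubleUnaryOperator regressor, double input, double threshold) {
        double predicted = regressor.applyAsDouble(input);
        return predicted > threshold ? "above" : "below";
    }

    public static void main(String[] args) {
        // Toy stand-in for a trained regression model: prediction = 2 * x + 1.
        DoubleUnaryOperator model = x -> 2 * x + 1;
        System.out.println(classify(model, 3.0, 5.0)); // prints above (7 > 5)
        System.out.println(classify(model, 1.0, 5.0)); // prints below (3 <= 5)
    }
}
```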

I don't know all the WEKA classifiers that offer regression, but you can start by looking at these two:

You may need to use the NominalToBinary filter to convert your nominal attributes to numeric (binary).
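To show the general idea behind such a conversion, here is a small plain-Java sketch that turns one nominal value into a vector of 0/1 indicators, one per category. It does not call Weka, the names are hypothetical, and the actual filter may differ in details such as how two-valued attributes are handled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Weka code: one binary indicator per nominal value.
class NominalToIndicators {

    // Returns an array with 1.0 at the position of the given value and 0.0 elsewhere.
    static double[] encode(String value, List<String> categories) {
        int index = categories.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("unknown value: " + value);
        }
        double[] indicators = new double[categories.size()]; // initialised to 0.0
        indicators[index] = 1.0;
        return indicators;
    }

    public static void main(String[] args) {
        // Assumed category list; the question elides the full set of car makes.
        List<String> makes = Arrays.asList("Toyota", "BMW");
        System.out.println(Arrays.toString(encode("BMW", makes))); // prints [0.0, 1.0]
    }
}
```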

+10

You can find regression in Weka under classifiers > functions > LinearRegression. Here is an example of creating a regression model in Weka: https://www.ibm.com/developerworks/opensource/library/os-weka1/
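To make concrete what a linear regression model computes, here is a minimal ordinary least squares fit with a single numeric predictor, written in plain Java. It is a textbook sketch only and not how Weka implements its classifier, which works with several attributes at once. The class name and the data points are invented for the example.

```java
// Hypothetical illustration, not Weka code: simple least squares with one predictor.
class SimpleLeastSquares {

    // Fits y = intercept + slope * x and returns {intercept, slope}.
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    // Predicts a numeric value for a new x using the fitted {intercept, slope}.
    static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        // Made-up data lying exactly on y = 1 + 2x (e.g. a price-like x and a sales-like y).
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        double[] model = fit(x, y);
        System.out.println("intercept=" + model[0] + ", slope=" + model[1]);
        System.out.println("prediction for x=5: " + predict(model, 5));
    }
}
```

Unlike a classifier, the output here is a number on a continuous scale, which is what predicting sales requires.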

+2

These days RandomForest will work the way you want it to; I believe this was first introduced in Weka 3.7. The features can be a mix of nominal and numeric attributes, and the prediction can be numeric as well.

The disadvantage (I would guess, in your case) is that it is not an updateable classifier, whereas NaiveBayesUpdateable works well with large amounts of data that may not fit into memory all at once.

0

Source: https://habr.com/ru/post/943650/

