How to ignore a feature while keeping it as part of the data set in the Weka GUI

I use the Weka GUI to run the NaiveBayes classifier on online posts. I am trying to track the instances (online posts) that are incorrectly predicted so that I can learn how to improve the features.

I currently have a workaround for this: I generate the data with a unique identifier, and when I import it into Weka, I delete the unique identifier. I then have Weka save the prediction results to an .arff file, and I read that file to find the misclassified instances. For each incorrectly classified instance, I take some feature values that are unique to that instance and look for the instance with the same values in my source data, which still contains the unique identifier. As you can see, this is a really laborious process.

I would really like to hear whether there is a way to make the classifier ignore a feature, in my case the unique instance identifier, while still keeping it as part of the data when the classifier runs.

Thanks.

+4
3 answers

I am not sure whether the Weka GUI has a direct option for this. However, you can achieve the same thing from the command line:

java weka.classifiers.meta.FilteredClassifier -F weka.filters.unsupervised.attribute.RemoveType -W weka.classifiers.trees.RandomForest -t G:\pub-resampled-0.5.arff -T G:\test.csv.arff -p 1 -distribution > G:\out.txt

In the above example, the first attribute is an identifier (a string). The RemoveType filter removes all string attributes when building the model. However, you can still ask Weka to include the identifier in the prediction output by passing -p with the attribute index. In my case, the first attribute (partner_id) is the identifier, so it is listed alongside the predictions. (The -distribution option outputs the predicted probabilities for all class labels.) You can get more information from http://weka.wikispaces.com/Instance+ID

 === Predictions on test data ===

 inst#     actual  predicted error distribution (partner_id)
     1        1:?        2:0       0,*1 (8i7t3)
     2        1:?        2:0       0,*1 (8i7u1)
     3        1:?        2:0       0,*1 (8i7um)
     4        1:?        2:0       0.1,*0.9 (8i7ux)
     5        1:?        2:0       0,*1 (8i7va)
     6        1:?        2:0       0,*1 (8i7vb)
     7        1:?        2:0       0,*1 (8i7vf)
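Since the goal in the question is to trace predictions back to the original posts, output like the above can be read back with a short script. This is only a rough sketch in Python, not part of Weka: the function name parse_predictions is made up for illustration, and it assumes that identifiers contain no parentheses and that a misclassified row carries a "+" in the error column, as in Weka's plain-text prediction output.

```python
import re

# One prediction row: inst#, actual, predicted, optional "+" error flag,
# class distribution, and the identifier printed in parentheses by -p.
ROW = re.compile(
    r"^\s*(?P<inst>\d+)\s+(?P<actual>\S+)\s+(?P<predicted>\S+)"
    r"(?:\s+(?P<error>\+))?\s+(?P<dist>\S+)\s+\((?P<id>[^()]*)\)\s*$"
)


def parse_predictions(text):
    """Return one dict per prediction row; header and banner lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # not a prediction row (banner, column header, blank line)
        rows.append({
            "id": match.group("id"),
            "actual": match.group("actual"),
            "predicted": match.group("predicted"),
            "misclassified": match.group("error") is not None,
            "distribution": match.group("dist"),
        })
    return rows


if __name__ == "__main__":
    # Assumes the output was redirected to a file, as in the command above;
    # the file name here is a placeholder.
    with open("out.txt", encoding="utf-8") as handle:
        for row in parse_predictions(handle.read()):
            if row["misclassified"]:
                print(row["id"], row["actual"], row["predicted"])
```

With a labelled test set, the printed identifiers are the misclassified instances, which can then be looked up directly in the source data.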

Hope you find this helpful.

+5

For those who come to this question late: this can be done in the GUI. Here is the answer I received from Mark Hall (from the Weka project):

FilteredClassifier is available in the GUI or on the command line just like any other classifier. Just configure it with the base classifier and a Remove filter (to remove the identifier etc. before the training/testing data is passed to the base classifier).
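For comparison with the command-line answer above, the same configuration (FilteredClassifier wrapping a base classifier, with a Remove filter for the identifier) can be written out as an argument list. This is a sketch under assumptions: the helper name build_weka_command and the file names are hypothetical, the identifier is assumed to be the first attribute, and weka.jar is assumed to be on the CLASSPATH as in the earlier command. The script only builds and prints the command; it does not run Weka.

```python
import shlex


def build_weka_command(train_arff, test_arff, id_index=1,
                       base_classifier="weka.classifiers.bayes.NaiveBayes"):
    """Build the argument list for FilteredClassifier with a Remove filter."""
    # Remove -R takes the 1-based index (or range) of attributes to drop
    # before the data reaches the base classifier.
    remove_filter = f"weka.filters.unsupervised.attribute.Remove -R {id_index}"
    return [
        "java", "weka.classifiers.meta.FilteredClassifier",
        "-F", remove_filter,      # filter applied before training/testing
        "-W", base_classifier,    # base classifier that sees the filtered data
        "-t", train_arff,         # training file
        "-T", test_arff,          # test file
        "-p", str(id_index),      # print this attribute next to each prediction
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(shlex.join(build_weka_command("train.arff", "test.arff")))
```

Keeping the filter specification as a single argument matters, because -F expects the filter class and its own options together in one string.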

+5

Expanding on Nicholas's answer: if you want to do this from the GUI, in addition to selecting FilteredClassifier you must open the "More options..." dialog in the "Test options" panel and enter the index of the identifier attribute in the "Output additional attributes" field. To enable this field, you must first check the "Output predictions" box.

In Weka 3.7, the additional attributes are specified as a parameter of the method selected for "Output predictions" (for example, PlainText); left-click on that field to edit it.

+3

Source: https://habr.com/ru/post/1435719/
