How to interpret Weka classification output?

How can we interpret the classification result in Weka using naive Bayes?

How are the mean, standard deviation, weight sum, and precision calculated?

How are the kappa statistic, mean absolute error, root mean squared error, etc. calculated?

What is the interpretation of the confusion matrix?

classification weka
3 answers

The following is example output for a naive Bayes classifier evaluated with 10-fold cross-validation. There is a lot of information there, and what you should focus on depends on your application. I will explain some of the results below to get you started.

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          71               71      %
Incorrectly Classified Instances        29               29      %
Kappa statistic                          0.3108
Mean absolute error                      0.3333
Root mean squared error                  0.4662
Relative absolute error                 69.9453 %
Root relative squared error             95.5466 %
Total Number of Instances              100

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.967     0.692      0.686      0.967      0.803      0.709     0
                 0.308     0.033      0.857      0.308      0.453      0.708     1
Weighted Avg.    0.71      0.435      0.753      0.71       0.666      0.709

=== Confusion Matrix ===

  a  b   <-- classified as
 59  2 |  a = 0
 27 12 |  b = 1
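For reference, output in this form can be generated with roughly the following code. This is a minimal sketch, not code from the question: the file name mydata.arff, the random seed, and the choice of the last attribute as the class are placeholders, and the Evaluation calls shown are the standard ones in Weka 3.x as far as I know.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveBayesCV {
    public static void main(String[] args) throws Exception {
        // Load a dataset (placeholder file name) and use the last attribute as the class.
        Instances data = DataSource.read("mydata.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation of a naive Bayes classifier.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));

        // The three blocks shown above: summary, per-class details, confusion matrix.
        System.out.println(eval.toSummaryString("=== Summary ===", false));
        System.out.println(eval.toClassDetailsString());
        System.out.println(eval.toMatrixString());
    }
}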

Correctly and incorrectly classified instances show the percentage of test instances that were correctly and incorrectly classified. The raw counts are shown in the confusion matrix, with a and b representing the class labels. There were 100 instances, so the percentages and raw counts coincide here: aa + bb = 59 + 12 = 71 correctly classified, ab + ba = 2 + 27 = 29 incorrectly classified.

The percentage of correctly classified instances is often called accuracy or sample accuracy. It has some disadvantages as a performance estimate (it is not chance corrected and it is sensitive to class distribution), so you will probably want to look at some of the other numbers as well. The ROC area, or area under the ROC curve, is my preferred measure.

Kappa is a chance-corrected measure of agreement between the classifications and the true classes. It is calculated by taking the agreement expected by chance away from the observed agreement and dividing by the maximum possible agreement. A value greater than 0 means your classifier is doing better than chance (it really should be!).
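To make that concrete, here is a minimal sketch (plain Java, no Weka calls) that reproduces the kappa of 0.3108 directly from the confusion matrix above:

public class KappaFromMatrix {
    public static void main(String[] args) {
        // Confusion matrix from the output above: rows = true class, columns = predicted class.
        double aa = 59, ab = 2, ba = 27, bb = 12;
        double n = aa + ab + ba + bb;                 // 100 instances

        double observed = (aa + bb) / n;              // observed agreement: 0.71
        // Chance agreement: sum over classes of (row marginal * column marginal), divided by n^2.
        double expected = ((aa + ab) * (aa + ba) + (ba + bb) * (ab + bb)) / (n * n);  // 0.5792

        double kappa = (observed - expected) / (1.0 - expected);
        System.out.printf("kappa = %.4f%n", kappa);   // prints kappa = 0.3108
    }
}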

Error rates are used for numeric prediction rather than classification. In numeric prediction, predictions are not simply right or wrong; the error has a magnitude, and these measures reflect that.
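The measures themselves are simple; below is a minimal sketch with made-up numeric predictions (the values are illustrative only). As far as I know, for classifiers Weka derives these errors from the predicted class probabilities rather than from the hard class labels.

public class ErrorMeasures {
    public static void main(String[] args) {
        // Illustrative actual vs. predicted values for a numeric target.
        double[] actual    = {3.0, -0.5, 2.0, 7.0};
        double[] predicted = {2.5,  0.0, 2.0, 8.0};

        double absSum = 0, sqSum = 0;
        for (int i = 0; i < actual.length; i++) {
            double err = predicted[i] - actual[i];
            absSum += Math.abs(err);
            sqSum  += err * err;
        }
        double mae  = absSum / actual.length;            // mean absolute error:      0.5
        double rmse = Math.sqrt(sqSum / actual.length);  // root mean squared error: ~0.612
        System.out.printf("MAE = %.3f, RMSE = %.3f%n", mae, rmse);
    }
}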

Hope that helps you get started.


To clarify michaeltwofish's answer, some notes on the remaining values (a worked example using the confusion matrix above follows this list):

  • TP Rate : rate of true positives (instances correctly classified as a given class)

  • FP Rate : rate of false positives (instances falsely classified as a given class)

  • Precision : proportion of instances that are truly of a class divided by the total instances classified as that class

  • Recall : proportion of instances classified as a given class divided by the actual total in that class (equivalent to TP rate)

  • F-Measure : a combined measure of precision and recall, calculated as 2 * Precision * Recall / (Precision + Recall)
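Here is the promised worked example (plain Java, no Weka calls): it recomputes the class-0 row of the "Detailed Accuracy By Class" table from the confusion matrix in the first answer, treating class a (= 0) as the positive class.

public class PerClassMetrics {
    public static void main(String[] args) {
        // Confusion matrix from the first answer, with class a (= 0) as "positive":
        // 59 true positives, 2 false negatives, 27 false positives, 12 true negatives.
        double tp = 59, fn = 2, fp = 27, tn = 12;

        double tpRate    = tp / (tp + fn);                                  // 0.967
        double fpRate    = fp / (fp + tn);                                  // 0.692
        double precision = tp / (tp + fp);                                  // 0.686
        double recall    = tpRate;                                          // same as TP rate
        double fMeasure  = 2 * precision * recall / (precision + recall);   // 0.803

        System.out.printf("TP rate=%.3f  FP rate=%.3f  precision=%.3f  recall=%.3f  F=%.3f%n",
                tpRate, fpRate, precision, recall, fMeasure);
    }
}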

Regarding the ROC area measure, I agree with michaeltwofish that this is one of the most important values produced by Weka. An "optimal" classifier will have an ROC area approaching 1, while 0.5 is comparable to random guessing (similar to a kappa statistic of 0).

It should also be noted that the "balance" of the data set must be taken into account when interpreting the results. Unbalanced data sets, in which a disproportionately large number of instances belong to one class, can lead to high accuracy rates even when the classifier is not particularly good.
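A small illustration of that point with made-up numbers: on a two-class set with 95 instances of one class and 5 of the other, a degenerate classifier that always predicts the majority class still scores 95 % accuracy, yet its kappa is 0.

public class MajorityBaseline {
    public static void main(String[] args) {
        // Hypothetical confusion matrix for "always predict the majority class" on a 95/5 split:
        // every instance lands in the majority column.
        double aa = 95, ab = 0, ba = 5, bb = 0;
        double n = aa + ab + ba + bb;

        double accuracy = (aa + bb) / n;                                              // 0.95
        double expected = ((aa + ab) * (aa + ba) + (ba + bb) * (ab + bb)) / (n * n);  // 0.95
        double kappa    = (accuracy - expected) / (1.0 - expected);                   // 0.0

        System.out.printf("accuracy = %.2f, kappa = %.2f%n", accuracy, kappa);
    }
}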



For some algorithms it gives each value as "50050000", while for other classifiers the values are around 49.7, 87.4, 98.2, etc.



