I ran into the same problem earlier, and this is how I handle it now:
- Add a String attribute that holds a unique identifier for each instance. I used the document names as identifiers.
- Create the .arff file for WEKA as usual.
- When you run a classifier on the .arff data, you will notice that the instance ID attribute has to be excluded. If it is left as a String attribute, Weka throws an error saying that the classifier cannot handle String attributes. To avoid this exception, apply the StringToNominal filter to the InstanceID attribute.
- Now, as @Rushdi said, click "More options..." on the "Classify" tab.
- In the "Classifier evaluation options" dialog that pops up, check "Output predictions".
- Enter the index of the instance ID attribute in the "Output additional attributes" box.
- Run the classifier on all the data except the instance ID attribute. (How you exclude it depends on your setup; for example, I use the "startSet" parameter of the "Ranker" search method, which lists attributes to ignore, together with the SMO classifier.)
- If everything is set up correctly, the output lists every instance with its actual and predicted class values, together with its instance ID, so you can see exactly which documents were misclassified.
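If you generate the .arff file yourself, here is a minimal plain-JDK sketch (no Weka dependency) of the first two steps: writing an .arff file whose first attribute is a string instance ID. The class name ArffWithIds, the Doc holder, and the feature and class names are hypothetical; it assumes numeric features and class labels that are simple tokens without spaces, commas, or quotes.

```java
import java.util.List;

public class ArffWithIds {

    // Hypothetical holder for one document: its name (used as the instance ID),
    // its numeric feature values, and its class label.
    static final class Doc {
        final String name;
        final double[] features;
        final String label;

        Doc(String name, double[] features, String label) {
            this.name = name;
            this.features = features;
            this.label = label;
        }
    }

    // Wraps a value in single quotes, escaping backslashes and single quotes for ARFF.
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Builds ARFF text: a STRING instance-ID attribute first, then numeric
    // features, then a nominal class attribute last.
    // Assumes class labels are simple tokens (no spaces, commas or quotes).
    static String toArff(String relation, List<String> featureNames,
                         List<String> classValues, List<Doc> docs) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(quote(relation)).append("\n\n");
        sb.append("@attribute InstanceID string\n");
        for (String f : featureNames) {
            sb.append("@attribute ").append(quote(f)).append(" numeric\n");
        }
        sb.append("@attribute class {").append(String.join(",", classValues)).append("}\n\n");
        sb.append("@data\n");
        for (Doc d : docs) {
            // Every row must have one value per declared feature.
            if (d.features.length != featureNames.size()) {
                throw new IllegalArgumentException("feature count mismatch for " + d.name);
            }
            sb.append(quote(d.name));
            for (double v : d.features) {
                sb.append(',').append(Double.toString(v));
            }
            sb.append(',').append(d.label).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("report_01.txt", new double[] {120, 0.5}, "spam"),
                new Doc("o'brien_memo.txt", new double[] {80, 0.1}, "ham"));
        System.out.print(toArff("docs", List.of("word_count", "link_ratio"),
                List.of("spam", "ham"), docs));
    }
}
```

Putting the ID first keeps its index fixed at 1, which is the number to enter in the additional-attributes box mentioned above; the class attribute stays last, where the Explorer selects it by default.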
Hope this helps someone. Good luck!