I have the following two sets. The idea is to categorize news articles based on the meta tags that come with them. For example, when I receive an article tagged “Judge” and “5 years”, it should be classified as a crime story.
train = [
    ('Honda', 'cars'),
    ('Ford', 'cars'),
    ('Volkswagen', 'cars'),
    ('Courthouse', 'crime'),
    ('Police', 'crime'),
    ('Taurus', 'cars'),
    ('Chevrolet', 'cars'),
    ('Sonic', 'cars'),
    ('Judge', 'crime'),
    ('Jail', 'crime')
]
test = [
    ('Porsche', 'cars'),
    ('Toyota', 'cars'),
    ('Arrest', 'crime'),
    ('Prison', 'crime')
]
from textblob.classifiers import NaiveBayesClassifier

cl = NaiveBayesClassifier(train)
The problem is that when I run this:
for a, b in test:
    print(a, cl.classify(a))
it classifies everything as "cars".
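As far as I can tell, TextBlob's default feature extractor only builds contains(word) features from the training words, so an unseen word like "Porsche" produces nothing informative and the prediction mostly falls back to the class counts (six "cars" examples against four "crime"). A quick way to check what the classifier learned, using TextBlob's built-in helpers:

cl.show_informative_features(10)        # the contains(...) features it trained on
print(cl.extract_features("Porsche"))   # all False: "Porsche" never appears in train
print(cl.accuracy(test))                # accuracy on the test pairs above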
I suspect what I am missing is some kind of semantic similarity comparison. I tried using WordNet through TextBlob.
I ran
from textblob import Word

word = Word("Volkswagen")
for definition in word.definitions:
    print(definition)
but it does not give me any results.
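For comparison, WordNet does return definitions for common nouns; it simply has no entry for most brand names, which seems to be why "Volkswagen" comes back empty:

from textblob import Word

print(Word("car").definitions)          # several senses are returned
print(Word("Volkswagen").definitions)   # empty list: brand names are generally not in WordNet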
Now the question is:
How can I get WordNet to tell me that Volkswagen is a car, and integrate that into the classifier so that it understands that Hyundai is also a car and classifies it correctly?
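One direction I am considering (a rough, untested sketch; the hypernym_extractor name is mine) is a custom feature_extractor, which TextBlob's NaiveBayesClassifier accepts, so that every WordNet hypernym of a tag becomes a feature. That way "Jail" and "Prison" would share features such as correctional_institution.n.01. It still would not cover brand names like Volkswagen or Hyundai, since they have no synsets:

from nltk.corpus import wordnet
from textblob.classifiers import NaiveBayesClassifier

def hypernym_extractor(document):
    """Turn every WordNet hypernym of the word into a boolean feature."""
    features = {}
    for synset in wordnet.synsets(document.lower()):
        for path in synset.hypernym_paths():
            for hypernym in path:
                features["hypernym({0})".format(hypernym.name())] = True
    # keep the raw word as a feature so terms unknown to WordNet still count
    features["word({0})".format(document.lower())] = True
    return features

cl = NaiveBayesClassifier(train, feature_extractor=hypernym_extractor)
for a, b in test:
    print(a, cl.classify(a))

The fallback word feature is there so the classifier still has something to work with for terms WordNet has never seen, but that does not solve the brand-name problem itself.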