NLTK and WordNet classification with TextBlob

I have the following two sets. The idea is to categorize news articles based on the meta tags provided with them. For example, when I receive an article tagged “Judge” and “5 years”, it should be classified as a crime story.

from textblob.classifiers import NaiveBayesClassifier

train = [
    ('Honda', 'cars'),
    ('Ford', 'cars'),
    ('Volkswagen', 'cars'),
    ('Courthouse', 'crime'),
    ('Police', 'crime'),
    ('Taurus', 'cars'),
    ('Chevrolet', 'cars'),
    ('Sonic', 'cars'),
    ('Judge', 'crime'),
    ('Jail', 'crime')
]
test = [
    ('Porsche', 'cars'),
    ('Toyota', 'cars'),
    ('Arrest', 'crime'),
    ('Prison', 'crime')
]

cl = NaiveBayesClassifier(train)

The problem is that when I run this:

for word, expected in test:
    print(word, cl.classify(word))

It classifies everything as "cars".
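
A likely explanation for this result, offered as a sketch and not as a description of TextBlob's internals: every item is a single word, and none of the test words appears in the training set. With no shared features, a naive Bayes model can only go on how often each label occurs and on the absence of every known word, and "cars" has 6 of the 10 training items. The plain-Python model below assumes binary "word present" features with add-0.5 smoothing; the names `fit`, `score` and `classify` are made up for illustration.

```python
from collections import Counter

train = [
    ('Honda', 'cars'), ('Ford', 'cars'), ('Volkswagen', 'cars'),
    ('Courthouse', 'crime'), ('Police', 'crime'), ('Taurus', 'cars'),
    ('Chevrolet', 'cars'), ('Sonic', 'cars'), ('Judge', 'crime'),
    ('Jail', 'crime'),
]
test = [('Porsche', 'cars'), ('Toyota', 'cars'),
        ('Arrest', 'crime'), ('Prison', 'crime')]


def fit(train, alpha=0.5):
    """Count labels and, per label, how many items contain each known word."""
    vocab = sorted({word.lower() for word, _ in train})
    label_counts = Counter(label for _, label in train)
    word_counts = {label: Counter() for label in label_counts}
    for word, label in train:
        word_counts[label][word.lower()] += 1
    return vocab, label_counts, word_counts, alpha


def score(model, word, label):
    """Unnormalized P(label) * product over known words of P(present/absent | label)."""
    vocab, label_counts, word_counts, alpha = model
    n = label_counts[label]
    total = sum(label_counts.values())
    # Smoothed prior: how common the label is in the training data.
    p = (n + alpha) / (total + alpha * len(label_counts))
    for known in vocab:
        p_present = (word_counts[label][known] + alpha) / (n + 2 * alpha)
        # An unseen test word matches no known word, so every factor is "absent".
        p *= p_present if known == word.lower() else 1 - p_present
    return p


def classify(model, word):
    labels = model[1]
    return max(labels, key=lambda label: score(model, word, label))


model = fit(train)
for word, expected in test:
    print(word, expected, classify(model, word))
```

Under these assumptions all four test words come out as "cars", because the score never depends on the test word itself when that word is unknown. More training data alone does not help unless the test words, or features derived from them, also occur in training.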

I am fairly sure that what I am missing is some comparison of semantic similarity. I tried using WordNet through TextBlob.

I ran

from textblob import Word

word = Word("Volkswagen")
for definition in word.definitions:
    print(definition)

but it does not give me any results.

Now the question is:

How can I get WordNet to tell me that Volkswagen is a car, and integrate that into the classifier so that it understands that Hyndai is also a car and classifies it correctly?
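
One way to picture the kind of integration being asked for is to expand each word into broader categories before classifying, so that two words which never co-occur can still share a feature. The sketch below is only an illustration: `CATEGORIES` is a hypothetical, hand-written table standing in for whatever lexical resource supplies the categories (the question shows that the WordNet lookup returned nothing for "Volkswagen"), and the classifier is a simple feature-overlap count, not naive Bayes. The names `features`, `train_counts` and `classify` are also made up.

```python
from collections import Counter, defaultdict

train = [
    ('Honda', 'cars'), ('Ford', 'cars'), ('Volkswagen', 'cars'),
    ('Courthouse', 'crime'), ('Police', 'crime'), ('Taurus', 'cars'),
    ('Chevrolet', 'cars'), ('Sonic', 'cars'), ('Judge', 'crime'),
    ('Jail', 'crime'),
]

# Hypothetical category table (an assumption for this example, not WordNet data).
CATEGORIES = {
    'honda': {'car_maker'}, 'ford': {'car_maker'}, 'volkswagen': {'car_maker'},
    'chevrolet': {'car_maker'}, 'porsche': {'car_maker'}, 'toyota': {'car_maker'},
    'taurus': {'car_model'}, 'sonic': {'car_model'},
    'courthouse': {'justice_system'}, 'judge': {'justice_system'},
    'police': {'law_enforcement'}, 'arrest': {'law_enforcement'},
    'jail': {'detention'}, 'prison': {'detention'},
}


def features(word):
    """Return the lowercased word plus any broader categories known for it."""
    key = word.lower()
    return {key} | CATEGORIES.get(key, set())


def train_counts(train):
    """For each label, count how often each feature occurs in its training items."""
    counts = defaultdict(Counter)
    for word, label in train:
        counts[label].update(features(word))
    return counts


def classify(counts, word):
    """Pick the label whose training items share the most features with the word."""
    feats = features(word)
    scores = {label: sum(seen[f] for f in feats) for label, seen in counts.items()}
    return max(scores, key=scores.get)


counts = train_counts(train)
for word in ['Porsche', 'Toyota', 'Arrest', 'Prison']:
    print(word, classify(counts, word))
```

With this table, "Porsche" shares the `car_maker` feature with four training words and "Prison" shares `detention` with "Jail", so the unseen words land in the expected classes. A word with no entry in the table and no match in the training data scores zero for every label, and the result is then just the first label seen, so coverage of the lookup resource matters more than the classifier itself.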


WordNet® - , WordNet, , Volkswagen - WordNet. , .


, , , :

" , ".

" " - , . , " " , , NaiveBayes. , , " " . WordNet -, , .

, , NaiveBayes , "" , , . , , , .

, , , ( , ), " ", , .

.

, WordNet "Volkswagen". , , "", "", " " .. , , , "" , .

WordNet , Volkswagen - , , , Hyndai .

: , "Hyunday" - "Volkswagen", - "k- " .

, , " ", "Hynday";)


Source: https://habr.com/ru/post/1525679/

