How to remove plurals in a list of nouns?

I have a list of strings:

['bill', 'simpsons', 'cosbys', 'cosby','bills','mango', 'mangoes'] 

How can I remove all the plurals from this list? I want the output to be:

 ['bill', 'simpsons', 'cosby','mango'] 
+4
4 answers

In general, this process is called 'stemming', and there is a Python package called 'stemming'.

Used like this:

 from stemming.porter2 import stem
 stem("simpsons")

Stemming does more than just strip plurals, but you can modify the stemming package so that it only performs plural reduction. Take a look at the source: http://tartarus.org/martin/PorterStemmer/python.txt
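As a rough illustration of what "plural reduction only" could look like, here is a minimal sketch that applies just the step 1a suffix rules of the original Porter algorithm (SSES -> SS, IES -> I, SS -> SS, S -> nothing). The function name `strip_plural_suffix` is made up for this example, and this is hand-written code, not the package's own implementation. Note that the results are stems, not necessarily dictionary words.

```python
def strip_plural_suffix(word):
    """Apply only the step 1a suffix rules of the original Porter stemmer.

    Hypothetical helper for illustration; not part of the stemming package.
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni (a stem, not a real word)
    if word.endswith("ss"):
        return word        # caress stays caress
    if word.endswith("s"):
        return word[:-1]   # bills -> bill
    return word


for w in ["bills", "cosbys", "mangoes", "caresses", "ponies"]:
    print(w, "->", strip_plural_suffix(w))
```

With these rules "mangoes" becomes "mangoe" rather than "mango", which shows why a pure suffix-stripping approach may not be enough if you need real singular forms.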

+5

With NodeBox Linguistics, it takes only two lines:

 import en
 only_singulars = [w for w in noun_list if w == en.noun.singular(w)]

The library implements Conway's pluralization rules, which deal with all kinds of exceptional cases.

+3

Pluralization rules have many corner cases. Perhaps you can sidestep the rule-based approach and use dictionary lookups to tell the plural form of a word from its singular form.
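A minimal sketch of the lookup idea, assuming that the list itself serves as the dictionary (you could pass a real word list as `vocabulary` instead). The helper names `candidate_singulars` and `drop_known_plurals` are invented for this example, and the candidate rules are deliberately naive: a word is dropped only when one of its guessed singular forms is actually known.

```python
def candidate_singulars(word):
    """Yield naive singular candidates for an English word (hypothetical helper)."""
    if word.endswith("ies") and len(word) > 3:
        yield word[:-3] + "y"   # ponies -> pony
    if word.endswith("es") and len(word) > 2:
        yield word[:-2]         # mangoes -> mango
    if word.endswith("s") and len(word) > 1:
        yield word[:-1]         # bills -> bill


def drop_known_plurals(words, vocabulary=None):
    """Keep a word unless one of its singular candidates is in the vocabulary.

    Assumption: if no vocabulary is given, the input list acts as the dictionary.
    """
    vocab = set(words if vocabulary is None else vocabulary)
    return [w for w in words
            if not any(c in vocab for c in candidate_singulars(w))]


noun_list = ['bill', 'simpsons', 'cosbys', 'cosby', 'bills', 'mango', 'mangoes']
result = drop_known_plurals(noun_list)
print(result)  # ['bill', 'simpsons', 'cosby', 'mango']
```

Because 'simpson' is not in the list, 'simpsons' is kept, which matches the output asked for in the question. Irregular plurals would still need a proper dictionary.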

+1

This is not possible unless additional information is provided. For example, will all the strings in your list be English words? Will they be nouns? If so, there are apparently several Python packages that seem to do a reasonable job in most cases, but the more strictly you can define your requirements, the more success you will have. And if the list is built from user input, the user may not agree with the results of your processing; consider octopi, indexes, etc.

0

Source: https://habr.com/ru/post/1380967/

