Converting words to their canonical forms (for example, verbs to infinitives and nouns to the singular) is called lemmatization. One Java lemmatizer is Stanford CoreNLP.
For "useless words," you probably want to remove stop words. There is no standard list, but many are floating around the Internet that work more or less the same; the main difference is how many words they include (usually 100 to 1000). I have seen people use this list before. When removing stop words, remember to ignore case when matching.
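As a rough illustration of case-insensitive stop word removal, here is a minimal sketch in plain Java. The class name `StopWordFilter`, the method `removeStopWords`, and the eight-word list are made up for this example; in practice you would load one of the longer lists mentioned above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class StopWordFilter {
    // Tiny illustrative list (an assumption for this sketch);
    // real lists usually hold 100 to 1000 words. Stored in lowercase.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of", "and", "to", "in"));

    // Returns the tokens that are not stop words, preserving order.
    // Each token is lowercased before the lookup so that "The" and "IS" also match.
    public static List<String> removeStopWords(List<String> tokens) {
        return tokens.stream()
                .filter(t -> !STOP_WORDS.contains(t.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "cat", "IS", "in", "the", "garden");
        System.out.println(removeStopWords(tokens)); // prints [cat, garden]
    }
}
```

Keeping the list in a `HashSet` of lowercase words and lowercasing each token at lookup time keeps every check constant-time and leaves the original casing of the surviving tokens untouched.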