Delete stop words in Java

I have a list of stop words containing about 30 words and a set of articles.

I want to analyze each article and remove those stop words from it.

I am not sure what the most efficient way to do this is.

For example, I could iterate over the stop list and replace each occurrence of a stop word in the article with a space, but this does not seem like a good approach.

Thanks.

+3
4 answers
  • Put the stop words in a java.util.Set
  • Split the input into words
  • For each word in the input, check whether it is contained in the set of stop words, and write it to the output only if it is not
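The three steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name `StopWordFilter`, the method `removeStopWords`, and the short sample stop list are all made up for the example, and it assumes that words are separated by whitespace and that matching should ignore case. Punctuation attached to a word (e.g. "the,") is not handled here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class StopWordFilter {
    // Hypothetical sample list; the real list of about 30 words would go here.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and", "is"));

    // Splits the article on whitespace and keeps only the words
    // that are not in the stop-word set (compared in lower case).
    static String removeStopWords(String article) {
        return Arrays.stream(article.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .filter(word -> !STOP_WORDS.contains(word.toLowerCase(Locale.ROOT)))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(
                removeStopWords("The quick brown fox is the king of the forest"));
    }
}
```

Because a `HashSet` lookup takes roughly constant time, each article is processed in one pass over its words, instead of one search-and-replace pass over the whole article per stop word.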
+4

. , , StringBuffer; , , , . StringBuffer , String .

-, , , . , , .

+1

Sun Java Tutorials , Perl \b . , , , .

0

Read each word from the input and copy it into your StringBuilder (or wherever you put the result) if and only if it is not in the list of stop words. Lookups are faster if you put the stop words in something like a Hashtable.

Edit: oops, I don't know what I was thinking, but you need a set, not a Hashtable (or any other dictionary).
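A rough sketch of this word-by-word approach, assuming the input is available as a `Readable` (a file reader, for example) and that words are whitespace-separated. The class name `StreamingStopWordFilter` and the method `filter` are hypothetical, and the comparison here is case-sensitive, so the stop words and the input would need to use the same case.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class StreamingStopWordFilter {
    // Reads whitespace-separated words one at a time and appends each word
    // to the result only if it is not in the stop-word set.
    static String filter(Readable input, Set<String> stopWords) {
        StringBuilder result = new StringBuilder();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                String word = scanner.next();
                if (!stopWords.contains(word)) {
                    if (result.length() > 0) {
                        result.append(' ');
                    }
                    result.append(word);
                }
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Hypothetical two-word stop list, just for the demonstration.
        Set<String> stopWords = new HashSet<>();
        stopWords.add("the");
        stopWords.add("of");
        System.out.println(
                filter(new StringReader("the history of the city"), stopWords));
    }
}
```

Reading through a `Scanner` means the whole article never has to be split into an array up front, which may matter for very large inputs.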

0

Source: https://habr.com/ru/post/1753202/

