How to iterate over the words of a sentence string in Python?

Suppose I have the string text = "A compiler translates code from a source language" . I want to do two things:

  • I need to iterate over each word and stem it using the NLTK library. The stemming function is PorterStemmer().stem_word(word) ; the word has to be passed as the argument. How can I stem every word and get the stemmed sentence back?

  • I need to remove certain stop words from the string text . The list of stop words is saved in a text file (space separated):

     stopwordsfile = open('c:/stopwordlist.txt','r+')
     stopwordslist = stopwordsfile.read()

    How do I remove those stop words from text and get a cleaned new string?

2 answers

I posted this as a comment, but thought I might as well flesh it out into a full answer with some explanation:

You want to use str.split() to split the string into words, and then stem each word:

 for word in text.split(" "):
     PorterStemmer().stem_word(word)

Since you want a string of all the stemmed words, it is then trivial to join the stems back together. To do this easily and efficiently, we use str.join() and a generator expression:

 " ".join(PorterStemmer().stem_word(word) for word in text.split(" ")) 
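The same stem-and-join step as a small self-contained sketch. So that it runs without NLTK installed, the stemming function is passed in as a parameter; stem_sentence and toy_stem are hypothetical names used only for illustration, and toy_stem is a crude stand-in, not the Porter algorithm. With NLTK you would pass the stemmer's bound method instead (stem_word as in the question; newer NLTK releases may name it differently, so check your version).

```python
def stem_sentence(text, stem):
    """Stem every whitespace-separated word and rejoin the stems with spaces."""
    return " ".join(stem(word) for word in text.split())


def toy_stem(word):
    """Crude stand-in stemmer: strips one trailing 's' (NOT the Porter algorithm)."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


text = "A compiler translates code from a source language"
stemmed = stem_sentence(text, toy_stem)
print(stemmed)
```

Creating the stemmer once and passing its method in also avoids constructing a new PorterStemmer() for every word.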

Edit:

For your other problem:

 with open("/path/to/file.txt") as f:
     words = set(line.strip() for line in f)

Here we open the file using the with statement (this is the best way to open files, since it closes them correctly, even on exceptions, and is more readable) and read the contents into a set, stripping the trailing newline from each line. We use a set because we do not care about word order or duplicates, and it will be more efficient later. I assume one word per line - if that is not the case, and the words are separated by commas or spaces, then using str.split() as we did before (with the appropriate arguments) is probably a good plan.
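For the space-separated file described in the question, a minimal sketch could look like the following. load_stop_words is a hypothetical helper name, and the temporary file only stands in for the real stop-word file so that the example runs on its own.

```python
import os
import tempfile


def load_stop_words(path):
    """Read a stop-word file into a set, splitting on any whitespace."""
    with open(path) as f:
        # str.split() with no arguments handles spaces, tabs and newlines alike.
        return set(f.read().split())


# Demo with a throwaway file; the real path comes from your own setup.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a from the\nof")
    demo_path = tmp.name

stop_words = load_stop_words(demo_path)
os.remove(demo_path)
print(sorted(stop_words))
```

Splitting on arbitrary whitespace means the same helper works whether the words are on one line or one per line.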

 stems = (PorterStemmer().stem_word(word) for word in text.split(" "))
 " ".join(stem for stem in stems if stem not in words)

Here we use the if clause of a generator expression to skip stems that are in the set of words loaded from the file. Membership testing on a set is O(1) on average, so this should be relatively efficient.

Edit 2:

To remove the words before they are stemmed, it's even easier:

 " ".join(PorterStemmer().stem_word(word) for word in text.split(" ") if word not in words) 

Removing the given words is simple:

 filtered_words = [word for word in unfiltered_words if word not in set_of_words_to_filter]
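As a runnable sketch of the filtering step on the question's sentence: remove_stop_words is a hypothetical helper name and the stop-word set here is made up for the demo. Note one added assumption: the comparison is done in lower case, so that "A" at the start of the sentence matches the stop word "a"; the one-liners above compare case-sensitively.

```python
def remove_stop_words(text, stop_words):
    """Drop words found in stop_words, comparing in lower case."""
    return " ".join(w for w in text.split() if w.lower() not in stop_words)


text = "A compiler translates code from a source language"
stop_words = {"a", "from"}  # hypothetical list; load yours from the file
cleaned = remove_stop_words(text, stop_words)
print(cleaned)
```

The cleaned string can then be passed on to the stemming step, which matches the "remove before stemming" order from Edit 2.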

To loop over each word in the string:

 for word in text.split():
     PorterStemmer().stem_word(word)

Use the string's join method (recommended by Lattyware) to combine the pieces into one string:

 " ".join(PorterStemmer().stem_word(word) for word in text.split(" ")) 

Source: https://habr.com/ru/post/915228/