Print all possible phrases (consecutive word combinations) in a given line

I am trying to print the phrases in a given text. I want to print every phrase in the text, from two words long up to the maximum number of words the text allows. The program below prints all phrases up to 5 words long, but I cannot come up with a more elegant way to print all possible phrases.

My phrase definition = consecutive words in a string, regardless of meaning.

    def phrase_builder(i):
        phrase_length = 4
        phrase_list = []
        for x in range(0, len(i)-phrase_length):
            phrase_list.append(str(i[x]) + " " + str(i[x+1]))
            phrase_list.append(str(i[x]) + " " + str(i[x+1]) + " " + str(i[x+2]))
            phrase_list.append(str(i[x]) + " " + str(i[x+1]) + " " + str(i[x+2]) + " " + str(i[x+3]))
            phrase_list.append(str(i[x]) + " " + str(i[x+1]) + " " + str(i[x+2]) + " " + str(i[x+3]) + " " + str(i[x+4]))
        return phrase_list

    text = "the big fat cat sits on the mat eating a rat"
    print(phrase_builder(text.split()))

The output for this is:

 ['the big', 'the big fat', 'the big fat cat', 'the big fat cat sits', 'big fat', 'big fat cat', 'big fat cat sits', 'big fat cat sits on', 'fat cat', 'fat cat sits', 'fat cat sits on', 'fat cat sits on the', 'cat sits', 'cat sits on', 'cat sits on the', 'cat sits on the mat', 'sits on', 'sits on the', 'sits on the mat', 'sits on the mat eating', 'on the', 'on the mat', 'on the mat eating', 'on the mat eating a', 'the mat', 'the mat eating', 'the mat eating a', 'the mat eating a rat'] 

I want to be able to print phrases such as "the big fat cat sits on the mat eating", "fat cat sits on the mat eating a rat", etc.

Can anyone offer some advice?

+6
4 answers

Just use itertools.combinations

    from itertools import combinations

    text = "the big fat cat sits on the mat eating a rat"
    lst = text.split()
    for start, end in combinations(range(len(lst)), 2):
        print(lst[start:end+1])

output:

    ['the', 'big']
    ['the', 'big', 'fat']
    ['the', 'big', 'fat', 'cat']
    ['the', 'big', 'fat', 'cat', 'sits']
    ['the', 'big', 'fat', 'cat', 'sits', 'on']
    ['the', 'big', 'fat', 'cat', 'sits', 'on', 'the']
    ['the', 'big', 'fat', 'cat', 'sits', 'on', 'the', 'mat']
    ['the', 'big', 'fat', 'cat', 'sits', 'on', 'the', 'mat', 'eating']
    ['the', 'big', 'fat', 'cat', 'sits', 'on', 'the', 'mat', 'eating', 'a']
    ['the', 'big', 'fat', 'cat', 'sits', 'on', 'the', 'mat', 'eating', 'a', 'rat']
    ['big', 'fat']
    ['big', 'fat', 'cat']
    ['big', 'fat', 'cat', 'sits']
    ['big', 'fat', 'cat', 'sits', 'on']
    ['big', 'fat', 'cat', 'sits', 'on', 'the']
    ['big', 'fat', 'cat', 'sits', 'on', 'the', 'mat']
    ['big', 'fat', 'cat', 'sits', 'on', 'the', 'mat', 'eating']
    ['big', 'fat', 'cat', 'sits', 'on', 'the', 'mat', 'eating', 'a']
    ['big', 'fat', 'cat', 'sits', 'on', 'the', 'mat', 'eating', 'a', 'rat']
    ['fat', 'cat']
    ['fat', 'cat', 'sits']
    ['fat', 'cat', 'sits', 'on']
    ['fat', 'cat', 'sits', 'on', 'the']
    ['fat', 'cat', 'sits', 'on', 'the', 'mat']
    ['fat', 'cat', 'sits', 'on', 'the', 'mat', 'eating']
    ['fat', 'cat', 'sits', 'on', 'the', 'mat', 'eating', 'a']
    ['fat', 'cat', 'sits', 'on', 'the', 'mat', 'eating', 'a', 'rat']
    ['cat', 'sits']
    ['cat', 'sits', 'on']
    ['cat', 'sits', 'on', 'the']
    ['cat', 'sits', 'on', 'the', 'mat']
    ['cat', 'sits', 'on', 'the', 'mat', 'eating']
    ['cat', 'sits', 'on', 'the', 'mat', 'eating', 'a']
    ['cat', 'sits', 'on', 'the', 'mat', 'eating', 'a', 'rat']
    ['sits', 'on']
    ['sits', 'on', 'the']
    ['sits', 'on', 'the', 'mat']
    ['sits', 'on', 'the', 'mat', 'eating']
    ['sits', 'on', 'the', 'mat', 'eating', 'a']
    ['sits', 'on', 'the', 'mat', 'eating', 'a', 'rat']
    ['on', 'the']
    ['on', 'the', 'mat']
    ['on', 'the', 'mat', 'eating']
    ['on', 'the', 'mat', 'eating', 'a']
    ['on', 'the', 'mat', 'eating', 'a', 'rat']
    ['the', 'mat']
    ['the', 'mat', 'eating']
    ['the', 'mat', 'eating', 'a']
    ['the', 'mat', 'eating', 'a', 'rat']
    ['mat', 'eating']
    ['mat', 'eating', 'a']
    ['mat', 'eating', 'a', 'rat']
    ['eating', 'a']
    ['eating', 'a', 'rat']
    ['a', 'rat']
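
Why this works: combinations(range(n), 2) yields every index pair (start, end) with start < end exactly once, and each such pair can be read as the first and last word of one phrase. Below is a minimal Python 3 sketch of the same idea that joins each slice into a string instead of printing a list. The function name all_phrases is made up here for illustration and is not part of the answer above.

```python
from itertools import combinations


def all_phrases(text):
    """Return every run of two or more consecutive words in text."""
    words = text.split()
    # Each (start, end) pair with start < end marks the first and last
    # index of one phrase, so the slice must include position end.
    return [" ".join(words[start:end + 1])
            for start, end in combinations(range(len(words)), 2)]


phrases = all_phrases("the big fat cat sits on the mat eating a rat")
print(len(phrases))   # 55 phrases for 11 words
print(phrases[0])     # the big
print(phrases[-1])    # a rat
```

For n words this produces n*(n-1)/2 phrases, so the result grows quadratically with the length of the text.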
+11

First, you need to figure out how to write all four lines in the same way. Instead of concatenating words and spaces manually, use the join method:

    phrase_list.append(" ".join(str(i[x+y]) for y in range(2)))
    phrase_list.append(" ".join(str(i[x+y]) for y in range(3)))
    phrase_list.append(" ".join(str(i[x+y]) for y in range(4)))
    phrase_list.append(" ".join(str(i[x+y]) for y in range(5)))

If the generator expression inside the join call is unclear, here's how to write it out explicitly:

    phrase = []
    for y in range(2):
        phrase.append(str(i[x+y]))
    phrase_list.append(" ".join(phrase))

Once you have done this, it is trivial to replace these four lines with a loop:

    # Lengths 2 through phrase_length + 1, matching the four lines above.
    for length in range(2, phrase_length + 2):
        phrase_list.append(" ".join(str(i[x+y]) for y in range(length)))

You can simplify this in several other ways, independently.

First, i[x+y] for y in range(length) can be made much simpler by using a slice: i[x:x+length].
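
A quick check of that equivalence, using a throwaway word list and arbitrary values of x and length chosen only for illustration:

```python
i = "the big fat cat sits on the mat eating a rat".split()
x, length = 2, 3

by_index = [i[x + y] for y in range(length)]
by_slice = i[x:x + length]

print(by_index)               # ['fat', 'cat', 'sits']
print(by_index == by_slice)   # True

# Unlike indexing, a slice that runs past the end does not raise an
# error; it is simply shorter.
print(i[9:9 + length])        # ['a', 'rat']
```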

And I assume i is already a list of strings, so you can get rid of the str calls.

In addition, range starts at 0 by default, so you can leave the 0 out.

While we're at it, your code would be much easier to reason about if you used meaningful variable names, like words instead of i.

So:

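    def phrase_builder(words):
        phrase_length = 4
        phrase_list = []
        for i in range(len(words) - phrase_length):
            # Lengths 2 through phrase_length + 1, as in the original four lines.
            for length in range(2, phrase_length + 2):
                phrase_list.append(" ".join(words[i:i+length]))
        return phrase_list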

And now your loop is simple enough that you can turn it into a list comprehension, so the whole body becomes a single expression:

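    def phrase_builder(words):
        phrase_length = 4
        # Lengths 2 through phrase_length + 1 for every start position i.
        return [" ".join(words[i:i+length])
                for i in range(len(words) - phrase_length)
                for length in range(2, phrase_length + 2)]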

Last: as @SoundDefense asked, are you sure you don't want "eating a rat"? It starts fewer than 5 words from the end, but it is a 3-word phrase in the text.

If you want this, just delete the - phrase_length part.
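
One caveat, sketched here under the assumption that phrases of 2 up to 5 words are wanted from every start position (as in the question's original function): once the start index runs to the end of the list, slices near the end are truncated, so the result can include a single word, and a loop over several lengths can emit the same short phrase more than once. Capping the length by the number of words remaining avoids both. The max_words parameter is introduced for illustration and is not part of the original code.

```python
def phrase_builder(words, max_words=5):
    """Phrases of 2..max_words consecutive words, including those near the end."""
    phrase_list = []
    for i in range(len(words) - 1):
        # Never ask for more words than remain from position i onward.
        longest = min(max_words, len(words) - i)
        for length in range(2, longest + 1):
            phrase_list.append(" ".join(words[i:i + length]))
    return phrase_list


text = "the big fat cat sits on the mat eating a rat"
phrases = phrase_builder(text.split())
print(phrases[-3:])   # ['eating a', 'eating a rat', 'a rat']
```

Passing max_words=len(words) lifts the cap and returns every phrase in the text.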

+2

You need a systematic way of enumerating all possible phrases.

One approach is to start with each word, and then generate all possible phrases starting with that word.

    def phrase_builder(my_words):
        phrases = []
        for i, word in enumerate(my_words):
            phrases.append(word)
            for nextword in my_words[i+1:]:
                phrases.append(phrases[-1] + " " + nextword)
            # Remove the one-word phrase.
            phrases.remove(word)
        return phrases

    text = "the big fat cat sits on the mat eating a rat"
    print(phrase_builder(text.split()))
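
A variant of the same idea, offered as a sketch rather than a correction: keeping the growing phrase in a local variable (current, a name chosen here) means the one-word phrase is never added to the list, so the phrases.remove(word) call and its linear scan are not needed.

```python
def phrase_builder(my_words):
    """All phrases of two or more consecutive words, grouped by starting word."""
    phrases = []
    for i, word in enumerate(my_words):
        current = word  # the one-word phrase, kept out of the result list
        for next_word in my_words[i + 1:]:
            current = current + " " + next_word
            phrases.append(current)
    return phrases


text = "the big fat cat sits on the mat eating a rat"
phrases = phrase_builder(text.split())
print(phrases[:3])    # ['the big', 'the big fat', 'the big fat cat']
print(len(phrases))   # 55
```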
+1

I think the easiest way is to iterate over all possible start and end positions in the words list and generate a phrase from the corresponding slice of words:

    def phrase_builder(words):
        for start in range(0, len(words)-1):
            for end in range(start+2, len(words)+1):
                yield ' '.join(words[start:end])

    text = "the big fat cat sits on the mat eating a rat"
    for phrase in phrase_builder(text.split()):
        print(phrase)

Output:

    the big
    the big fat
    ...
    the big fat cat sits on the mat eating a rat
    ...
    sits on the mat eating a
    ...
    eating a rat
    a rat
+1

Source: https://habr.com/ru/post/972866/

