Read all possible consecutive substrings in Python

Question

I have a list of letters, for example:
word = ['W','I','N','E']
and I need to get every possible sequence of substrings of length 3 or less, for example:
W I N E, WI N E, WI NE, W IN E, WIN E, etc.
What is the most efficient way of doing this?

Now I have:

    word = ['W','I','N','E']
    for idx,phon in enumerate(word):
        phon_seq = ""
        for p_len in range(3):
            if idx-p_len >= 0:
                phon_seq = " ".join(word[idx-(p_len):idx+1])
                print(phon_seq)

This only gives me the individual substrings below, not the sequences:

    W
    I
    WI
    N
    IN
    WIN
    E
    NE
    INE

I just can't figure out how to create all possible sequences.

+4

python

Adam_G Nov 06 '14 at 23:18

5 answers

Each of the three positions (after W, after I, and after N) either has a space or does not, so you can treat each position as a bit that is 1 or 0 in the binary representation of a number from 1 to 2^3 - 1.

    input_word = "WINE"
    for variation_number in xrange(1, 2 ** (len(input_word) - 1)):
        output = ''
        for position, letter in enumerate(input_word):
            output += letter
            if variation_number >> position & 1:
                output += ' '
        print output

Edit: To include only variations whose sequences have 3 characters or fewer (in the general case, where input_word can be longer than 4 characters), we can exclude cases where the binary representation contains 3 zeros in a row. (We also start the range from a larger number to exclude cases that would have 000 at the beginning.)

    for variation_number in xrange(2 ** (len(input_word) - 4), 2 ** (len(input_word) - 1)):
        if not '000' in bin(variation_number):
            output = ''
            for position, letter in enumerate(input_word):
                output += letter
                if variation_number >> position & 1:
                    output += ' '
            print output
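For readers on Python 3, the same bitmask idea can be sketched as below. This is only an illustration, not the answerer's code: the function name `bitmask_splits` and the `max_len` parameter are assumptions, and the length limit is checked on the chunks directly instead of searching for `'000'` in the binary string.

```python
def bitmask_splits(word, max_len=3):
    """Return every split of word into chunks of at most max_len items.

    Bit `pos` of `mask` set means "put a space after word[pos]".
    `word` may be a string or a list of single letters.
    """
    n = len(word)
    results = []
    if n == 0:
        return results
    for mask in range(2 ** (n - 1)):
        chunks, start = [], 0
        for pos in range(n - 1):
            if mask >> pos & 1:
                chunks.append("".join(word[start:pos + 1]))
                start = pos + 1
        chunks.append("".join(word[start:]))
        # keep the split only if no chunk is too long
        if all(len(chunk) <= max_len for chunk in chunks):
            results.append(" ".join(chunks))
    return results


for split in bitmask_splits(['W', 'I', 'N', 'E']):
    print(split)
```

Checking chunk lengths explicitly avoids the special handling of leading zeros, at the cost of building some splits that are then discarded.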

+2

Stuart Nov 07 '14 at 0:13

My implementation for this problem.

    #!/usr/bin/env python

    # this is a problem of fitting partitions in the word
    # we'll use itertools to generate these partitions
    import itertools

    word = 'WINE'

    # this loop generates all possible partitions COUNTS (up to word length)
    for partitions_count in range(1, len(word)+1):
        # this loop generates all possible combinations based on count
        for partitions in itertools.combinations(range(1, len(word)), r=partitions_count):

            # because of the way python splits words, we only care about the
            # difference *between* partitions, and not their distance from the
            # word beginning
            diffs = list(partitions)
            for i in xrange(len(partitions)-1):
                diffs[i+1] -= partitions[i]

            # first, the whole word is up for taking by partitions
            splits = [word]

            # partition the word remainder (what was not already "taken")
            # with each partition
            for p in diffs:
                remainder = splits.pop()
                splits.append(remainder[:p])
                splits.append(remainder[p:])

            # print the result
            print splits
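A shorter Python 3 variant of the cut-position idea might look like the sketch below. The name `cut_splits` and the `max_len` parameter are hypothetical; unlike the code above, this version slices between consecutive cut positions and also filters out splits with a chunk longer than `max_len`, as the question asks.

```python
from itertools import combinations


def cut_splits(word, max_len=3):
    """Return every split of word into chunks of at most max_len items."""
    n = len(word)
    results = []
    for cut_count in range(n):  # 0 cuts up to n - 1 cuts
        for cuts in combinations(range(1, n), cut_count):
            # chunk boundaries: start of word, each cut, end of word
            bounds = (0,) + cuts + (n,)
            chunks = ["".join(word[a:b]) for a, b in zip(bounds, bounds[1:])]
            if all(len(chunk) <= max_len for chunk in chunks):
                results.append(chunks)
    return results


print(cut_splits("WINE"))
```

For `"WINE"` this gives 7 splits; the unsplit word is dropped because it has 4 letters.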

+1

Reut sharabani Nov 06 '14 at 23:33

As an alternative answer, you can do this with itertools: use groupby to group the list, use combinations to build the list of index pairs (i, j) for the grouping key ( i<=word.index(x)<=j ), and finally use set to get the unique results.

Also note that different index pairs can produce the same split (in the demonstration below, (0,1) and (2,3) both give WI NE), so instead of deduplicating with set at the end you could remove such redundant pairs up front.

All in one list comprehension:

    from itertools import groupby, combinations

    subs=[[''.join(i) for i in j] for j in [[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in list(combinations(range(len(word)),2))]]
    set([' '.join(j) for j in subs])
    # set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])

Demonstration in detail:

    >>> cl=list(combinations(range(len(word)),2))
    >>> cl
    [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    >>> new_l=[[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in cl]
    >>> new_l
    [[['W', 'I'], ['N', 'E']], [['W', 'I', 'N'], ['E']], [['W', 'I', 'N', 'E']], [['W'], ['I', 'N'], ['E']], [['W'], ['I', 'N', 'E']], [['W', 'I'], ['N', 'E']]]
    >>> last=[[''.join(i) for i in j] for j in new_l]
    >>> last
    [['WI', 'NE'], ['WIN', 'E'], ['WINE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'NE']]
    >>> set([' '.join(j) for j in last])
    set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])
    >>> for i in set([' '.join(j) for j in last]):
    ...     print i
    ...
    WIN E
    W IN E
    W INE
    WI NE
    WINE
    >>>

+1

Kasramvd Nov 07 '14 at 0:11

I think it could be like this:

    def substrings(word):
        myList = []
        for i in range(1, len(word)+1, 1):
            myList.append(word[:i])
            # for a non-empty word these bounds simplify to range(1, i)
            for j in range(len(word[len(word[1:]):]), len(word)-len(word[i:]), 1):
                myList.append(word[j:i])
        print(myList)
        print(sorted(set(myList), key=myList.index))
        return myList

    substrings("ABCDE")

0

Wang guanqingwang Sep 09 '17 at 17:03

irrelephant · Accepted Answer · 2014-11-06T23:48:28+0000

Try this recursive algorithm:

    def segment(word):
        def sub(w):
            if len(w) == 0:
                yield []
            for i in xrange(1, min(4, len(w) + 1)):
                for s in sub(w[i:]):
                    yield [''.join(w[:i])] + s
        return list(sub(word))

    # And if you want a list of strings:
    def str_segment(word):
        return [' '.join(w) for w in segment(word)]

Output:

    >>> segment(word)
    [['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']]
    >>> str_segment(word)
    ['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E']
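The answer above is written for Python 2 (`xrange`). A Python 3 sketch of the same recursion, with the hard-coded limit turned into a parameter, could look like this; the name `segments` and the `max_len` argument are assumptions added for illustration.

```python
def segments(word, max_len=3):
    """Return every split of word into chunks of at most max_len items."""
    def sub(w):
        if not w:
            # nothing left to split: one way to finish, the empty tail
            yield []
            return
        # take 1..max_len items as the first chunk, then split the rest
        for i in range(1, min(max_len, len(w)) + 1):
            head = "".join(w[:i])
            for rest in sub(w[i:]):
                yield [head] + rest
    return list(sub(word))


word = ['W', 'I', 'N', 'E']
for parts in segments(word):
    print(" ".join(parts))
# W I N E
# W I NE
# W IN E
# W INE
# WI N E
# WI NE
# WIN E
```

Because each call only tries first chunks of valid length, no invalid split is ever generated, which is the main difference from the filter-afterwards approaches.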
