Running count of the unique words seen up to each position

I have a list of words below (example):

['the', 'counter', 'starts', 'the', 'starts', 'for'] 

I want to process this list in order and generate pairs (x, y), where x increases with each word and y increases only when a word is seen for the first time. So, for this example, my output should look like this:
[(1, 1), (2, 2), (3, 3), (4, 3), (5, 3), (6, 4)]

I am not sure how to do this in Python. It would be great if I could get an idea of how to do this. Thanks.

+4
5 answers
    >>> words = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    >>> uniq = set()
    >>> result = []
    >>> for i, word in enumerate(words, 1):
    ...     uniq.add(word)
    ...     result.append((i, len(uniq)))
    ...
    >>> result
    [(1, 1), (2, 2), (3, 3), (4, 3), (5, 3), (6, 4)]
+6

Try the following:

    >>> from collections import Counter
    >>> data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    >>> tally = Counter()
    >>> for elem in data:
    ...     tally[elem] += 1
    ...
    >>> tally
    Counter({'starts': 2, 'the': 2, 'counter': 1, 'for': 1})

Adapted from: http://docs.python.org/2/library/collections.html

Of course, this gives you a dictionary rather than a list. I don't know whether there is a way to convert this dict to a list (perhaps with something like zip). I hope this is of some help.
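On the question of turning the Counter into a list, here is a minimal sketch. It assumes Python 3, where items() returns a view that list() can materialize, and it uses most_common(), which returns the (element, count) pairs as a list sorted by count. The variable names are only illustrative.

```python
from collections import Counter

data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
tally = Counter(data)

# list() turns the items view into a plain list of (element, count) pairs.
pairs = list(tally.items())

# most_common() returns the same pairs as a list, highest count first.
by_count = tally.most_common()

print(sorted(pairs))
print(by_count[0][1])  # 2, the highest count
```

Note that neither list gives the running (x, y) pairs from the question; they only hold the final count per word.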

+8

Use collections.Counter to count occurrences:

I realize that this does not directly answer your question, but it is the canonical, Pythonic way of counting things, offered in response to the misuse presented in another answer.

    from collections import Counter
    data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    counter = Counter(data)

The result is a dict-like object whose counts can be looked up by key:

    >>> counter['the']
    2

You can also call Counter.items() to get an unordered list of (element, count) pairs:

    >>> counter.items()
    [('starts', 2), ('the', 2), ('counter', 1), ('for', 1)]

The output you are asking for is a bit unusual; you may want to reconsider why you need the data in this format.

+6

Like this:

    >>> seen = set()
    >>> words = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    >>> for x, w in enumerate(words, 1):
    ...     seen.add(w)
    ...     print((x, len(seen)))  # print the pair as a tuple
    ...
    (1, 1)
    (2, 2)
    (3, 3)
    (4, 3)
    (5, 3)
    (6, 4)

In practice, I would write a generator function that produces the tuples successively, rather than printing them:

    def uniq_count(lst):
        seen = set()
        for w in lst:
            seen.add(w)
            yield len(seen)

    counts = list(enumerate(uniq_count(words), 1))

Note that I have also separated the logic of the two counters. Since enumerate does exactly what you need for the first number in each pair, it is easiest to compute the second number in the generator and let enumerate handle the first.
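One consequence of splitting the counters this way is that the whole pipeline stays lazy, so pairs can be pulled a few at a time. A rough sketch, assuming Python 3; the use of itertools.islice and the names first_four and rest are my own illustration, not part of the answer above:

```python
from itertools import islice

def uniq_count(items):
    """Yield the number of distinct items seen so far, once per item."""
    seen = set()
    for item in items:
        seen.add(item)
        yield len(seen)

words = ['the', 'counter', 'starts', 'the', 'starts', 'for']

# enumerate supplies the position, the generator supplies the unique count.
pairs = enumerate(uniq_count(words), 1)

# Nothing is computed until pairs are requested.
first_four = list(islice(pairs, 4))
rest = list(pairs)

print(first_four)  # [(1, 1), (2, 2), (3, 3), (4, 3)]
print(rest)        # [(5, 3), (6, 4)]
```

The same function should work on any iterable of hashable items, for example words read line by line from a file.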

+5
    data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    # For each position i, count the distinct words in the first i items.
    print([(i, len(set(data[:i]))) for i, v in enumerate(data, 1)])

The dictionary mentioned in your comment can be built as follows:

    data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    # Map each distinct word to the number of times it occurs.
    print({j: data.count(j) for j in set(data)})
+2

Source: https://habr.com/ru/post/1398178/

