The ordered sum of the total number of unique words seen by this position

Question


I have a list of words, for example:

['the', 'counter', 'starts', 'the', 'starts', 'for']

I want to process this list in order and generate pairs (x, y), where x increases with each word and y increases only when a word is seen for the first time. For this example, the output should look like this:
[(1, 1), (2, 2), (3, 3), (4, 3), (5, 3), (6, 4)]

I am not sure how to do this in Python. Any pointers on how to approach it would be appreciated. Thanks.

+4

python list

gsb Feb 24 '12 at 7:10


5 answers

Try the following:

    >>> from collections import Counter
    >>> data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    >>> tally = Counter()
    >>> for elem in data:
    ...     tally[elem] += 1
    ...
    >>> tally
    Counter({'starts': 2, 'the': 2, 'counter': 1, 'for': 1})

Adapted from the documentation: http://docs.python.org/2/library/collections.html

Of course, this produces a dictionary rather than a list. I don't know whether there is a way to convert this dict to a list (perhaps with something like zip). I hope this is of some help.
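On the open question of turning the Counter into a list: a minimal sketch, assuming a list of (word, count) pairs is what is wanted. It uses only `Counter.items()` and `Counter.most_common()`; the variable names `pairs` and `ranked` are illustrative.

```python
from collections import Counter

data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
tally = Counter(data)

# items() yields (word, count) pairs; list() materializes them.
pairs = list(tally.items())

# most_common() returns the pairs as a list, highest count first.
ranked = tally.most_common()

print(sorted(pairs))
print(ranked)
```

Note that this gives per-word totals, not the running (position, distinct count) pairs the question asks for.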

+8

oneindelijk Feb 25 '13 at 10:10


Use `collections.Counter` to count occurrences:

I realize this does not directly answer your question, but it is the canonical, Pythonic way to count things, offered in response to the misuse of Counter shown in another answer.

    from collections import Counter
    data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    counter = Counter(data)

The result is a dict-like object whose counts can be looked up by key:

    >>> counter['the']
    2

You can also call `Counter.items()` to get the (element, count) pairs in no guaranteed order. In Python 2 this returns a list; in Python 3 it returns a view that can be passed to `list()`:

    >>> counter.items()
    [('starts', 2), ('the', 2), ('counter', 1), ('for', 1)]

The result you want is a bit unusual; it may be worth reconsidering why you need the data in this format.

+6

Graeme Stuart Nov 18 '14 at 17:20


Like this:

    >>> seen = set()
    >>> words = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    >>> for x, w in enumerate(words, 1):
    ...     seen.add(w)
    ...     print((x, len(seen)))  # print a tuple; works in Python 2 and 3
    ...
    (1, 1)
    (2, 2)
    (3, 3)
    (4, 3)
    (5, 3)
    (6, 4)

In practice, I would write a generator function that yields the values one after another, rather than printing them:

    def uniq_count(lst):
        seen = set()
        for w in lst:
            seen.add(w)
            yield len(seen)

    counts = list(enumerate(uniq_count(words), 1))

Note that I have also separated the logic of the two counters. Since enumerate already does exactly what you need for the first number in each pair, it is easiest to let the generator handle the second number and let enumerate handle the first.
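One consequence of keeping the two counters separate is that the pairs are produced lazily. A minimal sketch, assuming the same `uniq_count` generator (repeated here so the block is self-contained); the use of `itertools.islice` and the name `first_four` are illustrative additions, not part of the original answer.

```python
from itertools import islice

def uniq_count(iterable):
    """Yield the running count of distinct items seen so far."""
    seen = set()
    for item in iterable:
        seen.add(item)
        yield len(seen)

words = ['the', 'counter', 'starts', 'the', 'starts', 'for']

# enumerate supplies the position; the generator supplies the distinct count.
pairs = enumerate(uniq_count(words), 1)

# Take only the first four pairs; the remaining words are never processed.
first_four = list(islice(pairs, 4))
print(first_four)  # [(1, 1), (2, 2), (3, 3), (4, 3)]
```

This may matter when the words come from a large file or a stream rather than a short list.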

+5

Michael J. Barber Feb 24 '12 at 7:16


    data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    # For each position i, count the distinct words in the first i items.
    print([(i, len(set(data[:i]))) for i, v in enumerate(data, 1)])

The dictionary mentioned in your comment can be built as follows:

    data = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    # Map each distinct word to its number of occurrences.
    print({j: data.count(j) for j in set(data)})

+2

Tsukemen Feb 24 '12 at 8:17


Raymond Hettinger · Accepted Answer · 2012-02-24T07:19:36+0000

    >>> words = ['the', 'counter', 'starts', 'the', 'starts', 'for']
    >>> uniq = set()
    >>> result = []
    >>> for i, word in enumerate(words, 1):
    ...     uniq.add(word)
    ...     result.append((i, len(uniq)))
    ...
    >>> result
    [(1, 1), (2, 2), (3, 3), (4, 3), (5, 3), (6, 4)]
