How to turn an itertools grouper object into a list

I am trying to learn how to use itertools.groupby in Python, and I wanted to find the size of each group of characters. First I tried to see whether I could get the length of a single group:

    from itertools import groupby

    len(list(list(groupby("cccccaaaaatttttsssssss"))[0][1]))

and I get 0 every time.

I did a little research and found out that other people did it like this:

    from itertools import groupby

    for key, grouper in groupby("cccccaaaaatttttsssssss"):
        print(key, len(list(grouper)))

This works great. I am confused: why does the second version work while the first one doesn't? And if I wanted to get only the nth group, as I tried to do in my first snippet, how would I do it?

1 answer

The reason your first approach doesn't work is that the groups are "consumed" when you create this list with

    list(groupby("cccccaaaaatttttsssssss"))

From the groupby docs:

"The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible."
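Note that the docs say the previous group disappears as soon as the groupby() object is *advanced*; it does not have to be fully expanded with list(). A minimal sketch to check this (the names outer and first_group are just for illustration): move the parent forward by a single step with next() and the first group is already empty.

```python
from itertools import groupby

outer = groupby("cccccaaaaatttttsssssss")
key, first_group = next(outer)  # key is 'c', first_group is a grouper
next(outer)                     # advance the parent once, to the 'a' group
print(key, list(first_group))   # prints: c []
```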

Let me break it into stages.

    from itertools import groupby

    a = list(groupby("cccccaaaaatttttsssssss"))
    print(a)
    b = a[0][1]
    print(b)
    print('So far, so good')
    print(list(b))
    print('What?!')

Output

    [('c', <itertools._grouper object at 0xb715104c>), ('a', <itertools._grouper object at 0xb715108c>), ('t', <itertools._grouper object at 0xb71510cc>), ('s', <itertools._grouper object at 0xb715110c>)]
    <itertools._grouper object at 0xb715104c>
    So far, so good
    []
    What?!

Our itertools._grouper object at 0xb715104c is empty because it shares its contents with the "parent" iterator returned by groupby, and those elements are already gone: the first call to list iterated over the parent.
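The flip side is that a group is perfectly usable as long as you consume it before the parent moves on, which is exactly what the for loop in the question does. A minimal sketch that keeps the "list of groups" idea from the first attempt, but turns each group into a list inside the comprehension, i.e. before groupby is advanced to the next group (groups and s are illustrative names):

```python
from itertools import groupby

s = "cccccaaaaatttttsssssss"
# list(grp) runs before the comprehension asks groupby for the next pair,
# so each group's contents are captured while they are still visible.
groups = [(key, list(grp)) for key, grp in groupby(s)]
print(groups[0])          # ('c', ['c', 'c', 'c', 'c', 'c'])
print(len(groups[0][1]))  # 5
```

This stores every element, so for long inputs the counting approach further down uses less memory.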

This really is no different from what happens if you try to iterate twice over any iterator, like a simple generator expression.

    g = (c for c in 'python')
    print(list(g))
    print(list(g))

Output

    ['p', 'y', 't', 'h', 'o', 'n']
    []

By the way, here is another way to get the length of a groupby group if you really don't need its contents; it's a little cheaper (and uses less RAM) than creating a list to find its length.

    from itertools import groupby

    for k, g in groupby("cccccaaaaatttttsssssss"):
        print(k, sum(1 for _ in g))

Output

    c 5
    a 5
    t 5
    s 7
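The question also asked how to get only the nth group, which the answer does not show directly. One way, sketched here as an assumption rather than the answerer's method, is to skip the first n groups with itertools.islice (the same idea as the nth recipe in the itertools docs) and consume the group you land on immediately. nth_group is a hypothetical helper name.

```python
from itertools import groupby, islice

def nth_group(iterable, n):
    """Return (key, items) for the n-th group (0-based) produced by groupby.

    Hypothetical helper: islice discards the first n (key, group) pairs,
    and the group we stop on is turned into a list right away, before the
    parent groupby object can be advanced again.
    """
    key, grp = next(islice(groupby(iterable), n, None))
    return key, list(grp)

print(nth_group("cccccaaaaatttttsssssss", 0))  # ('c', ['c', 'c', 'c', 'c', 'c'])
key, items = nth_group("cccccaaaaatttttsssssss", 3)
print(key, len(items))                          # s 7
```

If there are fewer than n + 1 groups, next() raises StopIteration; pass a default to next() and check for it if you need to handle that case.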

Source: https://habr.com/ru/post/1268756/

