Python: Is it possible to find out in advance how many items an iterator will yield?

So far, if I wanted to know how many iterations there are in the iterator (in my case, how many protein sequences in the file) I did:

    count = 0
    for stuff in iterator:
        count += 1
    print(count)

However, I want to split the iterator in half, so I need to know the total number of iterations. Is there a way to find out the number of iterations without going through the iterator?

+4
5 answers

It is impossible to know how many values an iterator will produce without running it to the end. Note that an iterator can also be infinite, in which case the total number is not even defined.

If you can guarantee that the iterator is finite, one way to do what you ask is to convert it to a list with list(iterator) and then use the usual list operations (len, slicing) to split it in half. Of course, this way all the elements are held in memory at the same time, which may or may not be acceptable in your case.
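A minimal sketch of the list-based approach, assuming the iterator is finite and fits in memory. The helper name split_in_half and the sample data are made up for illustration:

```python
def split_in_half(iterator):
    """Materialize a finite iterator and return its two halves as lists."""
    items = list(iterator)      # consumes the iterator; every item is now in memory
    middle = len(items) // 2    # with an odd count, the second half gets the extra item
    return items[:middle], items[middle:]


# Hypothetical stand-in for an iterator over protein sequences.
first, second = split_in_half(iter(["seq1", "seq2", "seq3", "seq4", "seq5"]))
print(first)   # ['seq1', 'seq2']
print(second)  # ['seq3', 'seq4', 'seq5']
```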

Alternatively, you can try using a custom iterator class that keeps track of the total number of items it will yield. Whether this is feasible depends on how exactly these iterators are obtained.
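One possible shape for such a class, as a sketch only. The name SizedIterator is hypothetical, and it assumes that whoever creates the iterator already knows the total (for example from a file header or an index), which is the hard part in practice:

```python
class SizedIterator:
    """Iterator wrapper that reports how many items remain (hypothetical helper)."""

    def __init__(self, iterable, total):
        self._it = iter(iterable)
        self._remaining = total   # assumption: the caller supplies the correct total

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)     # raises StopIteration once the source is exhausted
        self._remaining -= 1
        return item

    next = __next__               # Python 2 spelling of the same method

    def __len__(self):
        return self._remaining


records = ["seq1", "seq2", "seq3", "seq4"]
it = SizedIterator(records, total=len(records))
print(len(it))   # 4
next(it)
print(len(it))   # 3
```

The count is only as reliable as the total passed in; the wrapper does not discover it.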

+10

Since the iterator protocol defines only two methods:

    iterator.__iter__()
    iterator.next()

the answer is no: in the general case you cannot know the number of elements in a finite iterator without iterating through them.
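To illustrate, here is a small sketch with a made-up generator named sequences. It shows that a plain iterator offers the protocol methods but no length (in Python 3 the second method is spelled __next__() rather than next()):

```python
def sequences():
    """Hypothetical stand-in for records read lazily from a file."""
    yield "seq1"
    yield "seq2"
    yield "seq3"


it = sequences()
has_iter = hasattr(it, "__iter__")
has_next = hasattr(it, "__next__") or hasattr(it, "next")  # Python 3 or Python 2 name

try:
    len(it)
    supports_len = True
except TypeError:                  # generators do not implement __len__
    supports_len = False

print(has_iter, has_next, supports_len)  # True True False
```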

+5

You can use list() to convert your iterator to a list and len() to get the size of the list. Note that this consumes the iterator and holds every item in memory at once. For example:

 len(list(iterator)) 
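If only the count is needed, a variant that avoids building the list is sketched below. The helper name count_items is made up, and the iterator is still consumed, so it cannot be reused afterwards:

```python
def count_items(iterator):
    """Count items by consuming the iterator, holding one item at a time."""
    return sum(1 for _ in iterator)


it = iter(["seq1", "seq2", "seq3"])
print(count_items(it))  # 3
print(list(it))         # [] because the iterator is now exhausted
```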
+1

I think the problem raised by Nick de Klein is related to the halting problem (http://en.wikipedia.org/wiki/Halting_problem). Thus, there can be no general way to determine how long an iterator is, for strong theoretical reasons!

I mean, I could write a Python iterator in such a way that, if such a member function existed, I could use it to solve the halting problem.

Of course, a specific container or your own custom class (as Paolo suggests) may have such a method. But it cannot be done in general in finite time.
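As a concrete illustration of this point (not a proof), consider a generator whose length is the number of steps in a Collatz sequence. Whether that sequence reaches 1 for every positive starting value is an open question, so a general "length in advance" function would have to settle it. The name collatz is chosen here for illustration:

```python
def collatz(n):
    """Yield the Collatz sequence starting at positive integer n, stopping at 1."""
    while n != 1:
        yield n
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    yield 1


# The only general way to learn the length is to run the iterator.
length = sum(1 for _ in collatz(6))
print(length)  # 9, from the sequence 6, 3, 10, 5, 16, 8, 4, 2, 1
```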

+1

Four answers have already been provided and one accepted, but is your question the right one? If you have protein sequences in a file, is an iterator the best interface to that file for your application? If you only need a first approximation of the number of sequences, it would be very cheap to divide the file length by the average sequence length, if that is known a priori. Or, if the iterator is backed by a database, the number of records can be queried directly.
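A rough sketch of the file-size estimate, assuming the average record size in bytes is known beforehand. The function name estimate_record_count and the throwaway demo file are made up for illustration; real sequence files have variable-length records, so the result is only an approximation:

```python
import os
import tempfile


def estimate_record_count(path, average_record_bytes):
    """Rough estimate: file size divided by an assumed average record size."""
    return os.path.getsize(path) // average_record_bytes


# Demo on a temporary file holding 100 records of exactly 20 bytes each
# (19 letters plus a newline), so the estimate happens to be exact here.
with tempfile.NamedTemporaryFile("wb", delete=False) as handle:
    handle.write(b"ACDEFGHIKLMNPQRSTVW\n" * 100)
    demo_path = handle.name

estimate = estimate_record_count(demo_path, average_record_bytes=20)
print(estimate)  # 100
os.remove(demo_path)
```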

0

Source: https://habr.com/ru/post/1388103/

