How to get the contents of an iterator?

I am very puzzled. I have a block of HTML that I pulled out of a larger table. It looks something like this:

<td align="left" class="page">Number:\xc2\xa0<a class="topmenu" href="http://www.example.com/whatever.asp?search=724461">724461</a> Date:\xc2\xa01/1/1999 Amount:\xc2\xa0$2.50 <br/>Person:<br/><a class="topmenu" href="http://www.example.com/whatever.asp?search=LAST&amp;searchfn=FIRST">LAST,\xc2\xa0FIRST </a> </td> 

(Actually, it looked worse, but I reworked the lines many times)

I need to print the lines and split up the Date / Amount line. A good place to start seemed to be finding the children of this HTML block. The block is a string, because that is what the regular expression returned to me. So I did:

 text_soup = BeautifulSoup(text)
 text_children = text_soup.find('td').childGenerator()

I realized that I can only iterate through text_children once, although I do not understand why. It is of type listiterator, which is what I am trying to understand.

I'm used to assuming that if I can iterate over something with a for loop, I can also access any element with something like text_children[0]. That does not seem to be the case with an iterator. If I create a list with:

 my_array = ["one","two","three"] 

I can use my_array[1] to see the second element in the array. If I try text_children[1], I get an error message:

 TypeError: 'listiterator' object is not subscriptable 

How to get the contents of an iterator?

+4
3 answers

Let me try to give a more general answer:

  • An iterable is an object that can be iterated over. These include lists, tuples, etc. On request, they hand out an iterator.

  • An iterator is an object that is used to iterate. It gives one value per request, and once it is exhausted, it is finished. Examples are generators, list iterators, etc., but also e.g. file objects. Every iterator is itself iterable and returns itself as its iterator.

Example:

 a = []
 b = iter(a)
 print a, b # -> [] <listiterator object at ...>
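The difference between the two bullets above can be checked directly. This is only a minimal sketch, assuming Python 3 (where the type is named list_iterator rather than listiterator); the variable names are made up for illustration:

```python
numbers = ["one", "two", "three"]   # a list is an iterable

first_pass = iter(numbers)          # ask the iterable for an iterator
second_pass = iter(numbers)         # each request gives a new, independent one

# An iterator is itself iterable and simply returns itself.
same_object = iter(first_pass) is first_pass

consumed = list(first_pass)         # walks the iterator to the end
leftover = list(first_pass)         # nothing remains on a second pass
untouched = list(second_pass)       # the other iterator is unaffected

print(same_object, consumed, leftover, untouched)
```

The list can be iterated as often as you like because every for loop gets a fresh iterator from it; a single iterator, such as text_children in the question, can only be walked once.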

If you do

 for i in a:
     ...

an iterator is requested from a through its __iter__() method, and that iterator is then asked for successive elements until it is exhausted. This happens via the .next() method (__next__() in 3.x).
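Roughly, a for loop can be written out by hand as follows. This is a sketch rather than what the interpreter literally executes; manual_for is a hypothetical helper name, and the built-in next() is used so the code does not depend on the .next() / __next__() naming difference:

```python
def manual_for(iterable):
    """Collect items the way a for loop would fetch them."""
    collected = []
    iterator = iter(iterable)        # uses iterable.__iter__() when it exists
    while True:
        try:
            item = next(iterator)    # asks the iterator for the next element
        except StopIteration:        # raised once the iterator is exhausted
            break
        collected.append(item)
    return collected

print(manual_for(["one", "two", "three"]))
```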

Indexing is a completely different thing. Because iteration can fall back to indexing when an object has no __iter__() method, every indexable object is iterable, but not vice versa.
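To illustrate both directions, here is a small sketch assuming Python 3. Countdown is a made-up class that only defines __getitem__, so iterating over it relies on the indexing fallback, while a plain list iterator refuses to be indexed:

```python
class Countdown:
    """Hypothetical class that supports indexing but defines no __iter__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # IndexError tells the fallback iteration protocol where to stop.
        if not 0 <= index < self.start:
            raise IndexError(index)
        return self.start - index


indexable = Countdown(3)
by_index = indexable[1]              # indexing works
by_iteration = list(indexable)       # iteration falls back to __getitem__

iterator = iter([10, 20, 30])
try:
    iterator[1]                      # iterators do not support indexing
    subscript_failed = False
except TypeError:
    subscript_failed = True

print(by_index, by_iteration, subscript_failed)
```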

+1

You can easily make a list from an iterator:

 my_list = list(your_generator) 

Now you can index the elements:

 print(my_list[1]) 

Another way to get values is with next. This pulls the next value from the iterator, but, as you have already discovered, once you have pulled a value out of the iterator, you cannot always put it back. Whether you can depends entirely on the object being iterated over and on what its next method actually looks like.
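A minimal sketch of pulling values with next, assuming Python 3. The list iterator below is only a stand-in for text_children, and itertools.islice is one possible way to skip ahead without building a list first:

```python
from itertools import islice

children = iter(["one", "two", "three"])   # stand-in for text_children

first = next(children)                     # this value is now gone from the iterator
# islice skips ahead without building a list; here it skips one item.
third = next(islice(children, 1, None), None)
# The iterator is exhausted, so the default is returned instead of
# StopIteration being raised.
after_end = next(children, None)

print(first, third, after_end)
```

Passing a default as the second argument to next avoids having to catch StopIteration yourself.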

The reason for this is that often you just want an object that you can iterate over. Iterators are great for this, because they compute elements one at a time rather than having to store all the values. In other words, only one element from the iterator consumes your system memory at any time, compared to a list or tuple, where all the elements are usually stored in memory before iteration begins.
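The memory difference can be seen with a rough comparison like the one below. It is only a sketch, assuming CPython 3; sys.getsizeof measures just the container object, not the integers it refers to, so the real gap is larger still:

```python
import sys

count = 100_000

squares_list = [n * n for n in range(count)]   # every value stored up front
squares_gen = (n * n for n in range(count))    # values computed on demand

# Size of the container objects only, in bytes.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

total = sum(squares_gen)                       # consumes the generator item by item
print(list_size, gen_size, total == sum(squares_list))
```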

+8

The short answer, as stated above, is simply to create a list from your generator.

Like this: list(generator)

The long answer, and an explanation of why:

When you create a generator, or in your case a "listiterator" (the kind of iterator Beautiful Soup hands back here), you are not really creating a list of elements. You are creating an object that knows how to step through a certain number of elements, one at a time, via next().

What does that mean?

Instead of what you wanted, which is, say, a book with pages,

you get a typewriter.

A typewriter can produce a book with pages, but only one page at a time. Now, if you just start at the very beginning and look at the pages in order, as a loop does, then yes, it is almost like reading a regular book.

But unlike a regular book, once the typewriter is finished with a page, you cannot go back: that page is already gone.
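In code, the typewriter and the book might look something like this. It is just a sketch assuming Python 3; typewriter is a made-up generator function, not anything Beautiful Soup provides:

```python
def typewriter(pages):
    """Hypothetical generator: produces one page at a time, keeps none."""
    for number in range(1, pages + 1):
        yield "page %d" % number


machine = typewriter(3)
first_read = [page for page in machine]    # reads every page once
second_read = [page for page in machine]   # nothing left to read

book = list(typewriter(3))                 # a list keeps every page
reread = book[0]                           # so you can flip back to any page

print(first_read, second_read, reread)
```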

Hope this makes sense.

+1

Source: https://habr.com/ru/post/1447506/

