Creating a lazy sequence from I / O data

I'm trying to capture Clojure. As an exercise, I decided to build a function that returns a lazy sequence of given subreddit entries.

To make my goal clear, I have compiled the following Ruby code that does just that using Lazy Enumerators.

require 'open-uri' require 'nokogiri' class Reddit def initialize(subbredit) @url = "http://www.reddit.com/r/" + subbredit.downcase @entries = [] end def entries Enumerator::Lazy.new(1..Float::INFINITY) do |yielder| if @entries.empty? parse else yielder << @entries.shift end end end def reset @url.gsub!(/\?.*/, '') @entries = [] end private def parse page = Nokogiri::HTML(open(@url)) @url = page.css('p.nextprev a[rel="nofollow next"]').first['href'] page.css('div.thing').each do |thing| title = thing.css('a.title').text points = thing.css('div.score.unvoted').text.to_i @entries << { :title => title, :points => points } end end end 

(I also welcome comments on the Ruby code. But keep in mind that I'm interested in lazy sequences, not an object-oriented pattern.)

Arriving at Clojure, after much effort and indestructible curses, I ended up with the following code.

 (ns playground.experiments.lazy-html (:require [net.cgrand.enlive-html :as html])) (defn subreddit-url [name] (str "http://www.reddit.com/r/" name)) (defn fetch-page [url] (html/html-resource (java.net.URL. url))) (defn make-integer [n] (try (Integer. n) (catch Exception e 0))) (defn page-entries [url] (let [page (fetch-page url) things (html/select page [:div.thing])] (map #(hash-map :title (-> % (html/select [:a.title]) first html/text) :score (-> % (html/select [:div.score.unvoted]) first html/text make-integer)) things))) (defn next-url [url] (let [page (fetch-page url)] (-> page (html/select [:p.nextprev (html/attr-has :rel "next")]) first :attrs :href))) (defn entries [url] (lazy-cat (page-entries url) (entries (next-url url)))) (defn subreddit [name] (-> name subreddit-url entries)) 

(Comments, suggestions for criticism and improvement on all aspects of the code are looking forward. I sent a gist to anyone who would like to mess with the code.)

The thing works ... to some extent. He obviously has a huge problem : recursion in entries does not occur at the tail position. This means that if I were ready to interrogate tens of thousands of pages - well, of course, not from reddit - the stack would hit right away, wouldn't it?

I could not find a way to create optimization tail recursive lazy sequences. I have read most of Clojure's SO threads on lazy sequences, but to no avail. I guess I'm missing a point somewhere. Below are two of my silly attempts, one of which seems to not even make sense to the Clojure compiler, and the other is endless.

 (defn subreddit [name] (loop [url (subreddit-url name)] (lazy-seq (concat (page-entries url) (recur (next-url url)))))) (defn subreddit [name] (loop [url (subreddit-url name) old-entries []] (recur (next-url url) (lazy-cat (page-entries url) old-entries)))) 

Question : How do I do this? How to build lazy sequences from pieces of I / O data in Clojure? Is it possible that lazy sequences are not the right tool here? (In Ruby, laziness should be - saving memory). Or does LazySeq ressort for some kind of optimization magic (caching + stack smoothing?) So that the first piece of code above is safe for stackoverflow?

A side question . The Ruby code above has state, which means you can use part of the infinite sequence in the first call, and then get the next snippet with the second call. How can you achieve something like this in Clojure? I tried closing, alas unsuccessfully.

nota bene I am a complete newbie to Clojure. I started with Joy Clojure , a very pleasant, dense, clearly written and insightful reading. But the piece on lazy things, for example, shrank a little. What is Kladouria advising to get a good grip on Clojure?

+4
source share
2 answers

I would like to add an answer to Alex, emphasizing what I consider to be the key point:

Functions that create lazy sequences should not be tail recursive.

The reason is because a function like

 (defn foo [& args] (lazy-seq ...)) ; ^- could be lazy-cat etc.; ; ... typically has (cons ... (foo)) in tail position 

immediately returns the lazy seq object to the caller, and therefore its stack stack is immediately removed from the stack.

If the caller โ€” or someone else to whom the lazy seq was passed โ€” then requests the actual items from the lazy seq, they will be called by calling the function stored inside the lazy seq object whose body matches to ... in the fragment above. If the body (something functionally similar) (cons ... (foo)) , then the recursive call will behave like an external call, i.e. Immediately return the lazy seq object and pop its frame from the stack. Then, when this part of seq is really needed, the process will repeat, etc. Etc.

Note that this means that someone who implements the lazy seq created by foo gets the return value of cons , which can be obtained immediately because the internal call to foo itself is returned immediately because it returns the lazy segment.

In contrast, if foo was tail recursive, it could not be lazy, or, in other words, its participation in constructing foo seq would have to end by the time it returns the value to the caller - so either it would have to produce all seq to return, or to delegate the "lazy work" of another function (for which the argument can be repeated).

One way to think about it is that lazy seq producers confirm the management structure of the seq production process on the heap, while the seq-tail recursive producer accumulates the results in variables stored on the stack (regardless of whether it really grows like this it would be on the JVM, or itโ€™s not like on the Scheme).

See also How lazy sequences work in Clojure? for a few answers, going into the details behind the lazy sector. (My own answer contains a shorter earlier attempt to summarize what I did above).

+2
source

Actually there is nothing that I could see wrong with your initial implementation of the entries above. This will not delete the stack, as you fear - the key to understanding why you should remember that lazy-cat (and the lazy-seq it is written on) are macros . As a newbie to Clojure, you don't need to write a macro, but you need to understand a little how they work.

A macro in Clojure is a piece of code that basically intercepts the compiler to rewrite code before evaluating it. Among other things, macros are useful for deferred code evaluation. In this case, the recursive call to entries not immediately evaluated when the function is called; rather, the lazy-cat macro completes the recursive call into an anonymous, non-arg function (often called "thunk"), and this function is saved and evaluated only when the received seq is read. Another way to think about it: you tell lazy-seq how to create a sequence, and she remembers what she is waiting for you to create a sequence until she needs it.

A useful way to verify that a sequence is created lazily is to add a call (println) before let in the page-entries function. Then from REPL:

 > (def x (subreddit "foo")) ; You should see no printlns here > (take 10 x) ; Now you should see printlns as the sequence is realized 
+2
source

Source: https://habr.com/ru/post/1479356/


All Articles