I'm trying to get a grip on Clojure. As an exercise, I decided to build a function that returns a lazy sequence of the entries of a given subreddit.
To make my goal clear, here is some Ruby code that does just that using lazy enumerators.
    require 'open-uri'
    require 'nokogiri'

    class Reddit
      def initialize(subreddit)
        @url = "http://www.reddit.com/r/" + subreddit.downcase
        @entries = []
      end

      def entries
        Enumerator::Lazy.new(1..Float::INFINITY) do |yielder|
          if @entries.empty?
            parse
          else
            yielder << @entries.shift
          end
        end
      end

      def reset
        @url.gsub!(/\?.*/, '')
        @entries = []
      end

      private

      def parse
        # URI.open: since Ruby 3.0, Kernel#open no longer fetches URLs.
        page = Nokogiri::HTML(URI.open(@url))
        @url = page.css('p.nextprev a[rel="nofollow next"]').first['href']
        page.css('div.thing').each do |thing|
          title = thing.css('a.title').text
          points = thing.css('div.score.unvoted').text.to_i
          @entries << { :title => title, :points => points }
        end
      end
    end
(I also welcome comments on the Ruby code. But keep in mind that I'm interested in lazy sequences, not an object-oriented pattern.)
Moving on to Clojure, after much effort and a fair amount of cursing, I ended up with the following code.
    (ns playground.experiments.lazy-html
      (:require [net.cgrand.enlive-html :as html]))

    (defn subreddit-url [name]
      (str "http://www.reddit.com/r/" name))

    (defn fetch-page [url]
      (html/html-resource (java.net.URL. url)))

    (defn make-integer [n]
      (try (Integer. n) (catch Exception e 0)))

    (defn page-entries [url]
      (let [page   (fetch-page url)
            things (html/select page [:div.thing])]
        (map #(hash-map :title (-> % (html/select [:a.title]) first html/text)
                        :score (-> % (html/select [:div.score.unvoted]) first html/text make-integer))
             things)))

    (defn next-url [url]
      (let [page (fetch-page url)]
        (-> page
            (html/select [:p.nextprev (html/attr-has :rel "next")])
            first :attrs :href)))

    (defn entries [url]
      (lazy-cat (page-entries url)
                (entries (next-url url))))

    (defn subreddit [name]
      (-> name subreddit-url entries))
(Comments, criticism and suggestions for improvement on any aspect of the code are welcome. I have also put the code in a gist for anyone who would like to play with it.)
The thing works, to some extent. But it obviously has one big problem: the recursive call in entries is not in tail position. Doesn't that mean that if I wanted to query tens of thousands of pages (not from reddit, of course), the stack would blow up right away?
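One way to test this concern without touching the network is a small experiment. The sketch below keeps the exact shape of entries (recursion inside lazy-cat) but swaps in hypothetical stubs, fake-page-entries and fake-next-url, where a "page" is just an integer. lazy-cat wraps each of its arguments in lazy-seq, so the recursive call becomes a deferred thunk: it runs only when the consumer walks past the current page's entries, and it returns immediately instead of nesting inside the previous call. Walking 100,000 stub pages this way completes without a StackOverflowError.

```clojure
;; Hypothetical stubs standing in for the network calls in the question:
;; a "page" is an integer, each page holds three entries, and the
;; "next URL" of page n is simply n + 1.
(defn fake-page-entries [page]
  (map (fn [i] {:page page :item i}) (range 3)))

(defn fake-next-url [page]
  (inc page))

;; Same shape as `entries` in the question: the recursive call is not in
;; tail position, but lazy-cat wraps it in lazy-seq, so it is deferred
;; until the consumer runs past the current page's entries.
(defn entries [page]
  (lazy-cat (fake-page-entries page)
            (entries (fake-next-url page))))

;; Index 300000 lies on page 100000, i.e. 100,000 "page fetches" deep.
(nth (entries 0) 300000)
;; => {:page 100000, :item 0}
```

The caveat is memory rather than stack: anything that holds on to the head of the sequence (a def, or a local that is used again later) keeps every realized entry cached.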
I could not find a way to build a lazy sequence with tail-call optimization. I have read most of the SO threads on Clojure lazy sequences, but to no avail. I guess I'm missing something somewhere. Below are two of my silly attempts: the Clojure compiler rejects the first one outright, and the second one never terminates.
    (defn subreddit [name]
      (loop [url (subreddit-url name)]
        (lazy-seq
          (concat (page-entries url)
                  (recur (next-url url))))))

    (defn subreddit [name]
      (loop [url (subreddit-url name)
             old-entries []]
        (recur (next-url url)
               (lazy-cat (page-entries url) old-entries))))
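Neither attempt can work as written: recur is only allowed in tail position, and as an argument to concat it is not; the second version has no exit condition, so the loop keeps requesting next pages and never returns a value. With lazy sequences the usual idiom is to drop loop/recur and recurse inside lazy-seq, as entries already does. The sketch below is one variant of that idiom that also stops when there is no next page and fetches each page only once (in the question, page-entries and next-url each call fetch-page on the same URL). fetch-page* is a hypothetical stub that returns a page's entries plus the next URL, or nil on the last page; with enlive it would presumably call html/html-resource once and run both selectors on the result.

```clojure
;; Hypothetical stand-in for one HTTP request plus parsing. It returns the
;; entries found on the page and the URL of the next page, with :next nil
;; on the last page (here the "pages" are "0" through "4").
(defn fetch-page* [url]
  (let [n (Long/parseLong url)]
    {:entries (map (fn [i] {:title (str "post " n "-" i) :score i}) (range 2))
     :next    (when (< n 4) (str (inc n)))}))

;; Recursion inside lazy-seq instead of loop/recur: a page is fetched only
;; when the sequence is realized up to it, and the sequence ends when
;; there is no next URL.
(defn entries [url]
  (lazy-seq
    (when url
      (let [{items :entries, next-link :next} (fetch-page* url)]
        (concat items (entries next-link))))))

(map :title (take 3 (entries "0")))
;; => ("post 0-0" "post 0-1" "post 1-0")
```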
Question: How do I do this? How do you build a lazy sequence from chunks of I/O data in Clojure? Could it be that lazy sequences are the wrong tool here? (In Ruby, the point of the laziness is to save memory.) Or does LazySeq rely on some kind of optimization magic (caching plus stack flattening?) that makes my first Clojure version above safe from stack overflow?
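Another common way to build a lazy sequence from chunks of I/O is to separate "walk the pages" from "flatten the entries": iterate follows the chain of next links, take-while cuts it off after the last page, and for concatenates the entries lazily. The sketch assumes a hypothetical stub fetch-page* that returns a page's entries and the next URL (nil on the last page). The first page is fetched as soon as entries is called, because it is the seed passed to iterate. for is used rather than mapcat because mapcat is defined with apply concat, which realizes the first few elements of its input eagerly, i.e. it would fetch a couple of pages ahead.

```clojure
;; Hypothetical stub: entries of a page plus the next URL
;; (nil on the last page, here page "4").
(defn fetch-page* [url]
  (let [n (Long/parseLong url)]
    {:entries (map (fn [i] {:title (str "post " n "-" i) :score i}) (range 2))
     :next    (when (< n 4) (str (inc n)))}))

(defn entries [start-url]
  (let [pages (->> (fetch-page* start-url)                  ; first page is the seed
                   (iterate #(some-> % :next fetch-page*))  ; follow :next links
                   (take-while some?))]                     ; stop after the last page
    ;; Lazily flatten the pages into one sequence of entries.
    (for [page  pages
          entry (:entries page)]
      entry)))

(count (entries "0"))
;; => 10
```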
A side question: the Ruby code above is stateful, so you can consume part of the infinite sequence with a first call and then get the next portion with a second call. How can I achieve something like this in Clojure? I tried using a closure, alas without success.
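In Clojure the usual answer is to treat the lazy sequence as a value rather than an object with a position: take a batch, keep the rest, and hand the rest to whatever needs the next batch. Realized elements are cached inside the sequence, so walking the remainder again does not repeat earlier work (with a real scraper, earlier HTTP requests). If you do want Ruby-style state, an atom holding the unconsumed remainder acts as a cursor. This is a sketch: entries is a hypothetical stub (page n holds two entries), the names two-batches, cursor and next-batch! are made up for illustration, and swap-vals! requires Clojure 1.9 or later.

```clojure
;; Hypothetical stand-in for the scraped sequence: page n holds two entries.
(defn entries [page]
  (lazy-cat (map (fn [i] {:page page :item i}) (range 2))
            (entries (inc page))))

;; Value style: split the sequence into a batch and the rest; the rest is
;; what you pass along to get the next batch.
(defn two-batches [s n]
  (let [[batch more] (split-at n s)]
    [(vec batch) (vec (take n more))]))

;; Stateful style, closest to the Ruby Enumerator: an atom holds the
;; not-yet-consumed part of the sequence.
(def cursor (atom (entries 0)))

(defn next-batch! [n]
  ;; swap-vals! atomically replaces the remainder and returns [old new].
  (let [[old _] (swap-vals! cursor #(drop n %))]
    (vec (take n old))))

(next-batch! 3)
;; => [{:page 0, :item 0} {:page 0, :item 1} {:page 1, :item 0}]
(next-batch! 3)
;; => [{:page 1, :item 1} {:page 2, :item 0} {:page 2, :item 1}]
```

The atom could just as well live inside a closure returned by a factory function; the key point is that the closure must capture something mutable (the atom), since the lazy sequence itself never changes.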
N.B. I am a complete newbie to Clojure. I started with The Joy of Clojure, a very pleasant, dense, clearly written and insightful read. But the part on laziness, for example, felt a little thin. What would Clojurians recommend for getting a good grip on Clojure?