Any form evaluated at the top level of the REPL is fully realized as a result of the print phase of the Read-Eval-Print Loop. The result is also kept on the heap, so that you can get at it later via *1.
If you store the return value like so:
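To make the difference between printing a value and merely storing it concrete, here is a small sketch that uses only clojure.core. The names realized and noisy-seq are hypothetical helpers I made up for the illustration; they are not part of any library.

```clojure
;; Counts how many elements of the lazy sequence have been computed so far.
(def realized (atom 0))

(defn noisy-seq
  "Hypothetical helper: an unchunked lazy seq of 0..n-1 that bumps
  `realized` each time an element is actually computed."
  [n]
  (letfn [(step [i]
            (lazy-seq
              (when (< i n)
                (swap! realized inc)
                (cons i (step (inc i))))))]
    (step 0)))

;; def stores the lazy seq without walking it, so nothing is computed yet.
(def xs (noisy-seq 1000))
(def after-def @realized)

;; Printing has to visit every element, which is what the REPL's print
;; phase does with a top-level result. *print-length* is bound to nil in
;; case the REPL in use limits how much it prints.
(def printed (binding [*print-length* nil] (pr-str xs)))
(def after-print @realized)

(println "realized after def:  " after-def)
(println "realized after print:" after-print)
```

The first count should be 0 and the second 1000: wrapping the expression in def means the REPL prints only the var, not the sequence behind it.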
(def parsed (xml/parse (io/reader "data/small-sample.xml")))
this returns immediately, even for a file hundreds of megabytes in size (I verified this locally). You can then iterate over the returned clojure.data.xml.Element tree, which is realized lazily as it is parsed from the input stream.
If you do not hold on to the elements (by keeping references that leave them reachable), you can iterate through the whole structure without using more memory than is required to hold a single XML tree node.
user> (time (def n (xml/parse (clojure.java.io/reader "/home/justin/clojure/ok/data.xml"))))
"Elapsed time: 0.739795 msecs"
#'user/n
user> (time (keys n))
"Elapsed time: 0.025683 msecs"
(:tag :attrs :content)
user> (time (-> n :tag))
"Elapsed time: 0.031224 msecs"
:catalog
user> (time (-> n :attrs))
"Elapsed time: 0.136522 msecs"
{}
user> (time (-> n :content first))
"Elapsed time: 0.095145 msecs"
#clojure.data.xml.Element{:tag :book, :attrs {:id "bk101"}, :content (#clojure.data.xml.Element{:tag :author, :attrs {}, :content ("Gambardella, Matthew")} #clojure.data.xml.Element{:tag :title, :attrs {}, :content ("XML Developer Guide")} #clojure.data.xml.Element{:tag :genre, :attrs {}, :content ("Computer")} #clojure.data.xml.Element{:tag :price, :attrs {}, :content ("44.95")} #clojure.data.xml.Element{:tag :publish_date, :attrs {}, :content ("2000-10-01")} #clojure.data.xml.Element{:tag :description, :attrs {}, :content ("An in-depth look at creating applications \n with XML.")})}
user> (time (-> n :content count))
"Elapsed time: 48178.512106 msecs"
459000
user> (time (-> n :content count))
"Elapsed time: 86.931114 msecs"
459000
;; redefining n so that we can test the performance without the pre-parsing done when we counted
user> (time (def n (xml/parse (clojure.java.io/reader "/home/justin/clojure/ok/data.xml"))))
"Elapsed time: 0.702885 msecs"
#'user/n
user> (time (doseq [el (take 100 (drop 100 (-> n :content)))] (println (:tag el))))
:book
:book
.... ;; output truncated
"Elapsed time: 26.019374 msecs"
nil
user>
Note that it is only when I first ask for the count of the contents of n (thus forcing the whole file to be parsed) that there is a huge time delay. If I doseq over a subsection of the structure, it happens very quickly.
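The point about not holding on to the elements can be sketched roughly as follows. This assumes org.clojure/data.xml is on the classpath; count-books and the inline sample-xml string are hypothetical stand-ins I am using for illustration, not something from the timings above.

```clojure
(require '[clojure.data.xml :as xml])

;; Tiny inline stand-in for a large file (illustrative data only). With a
;; real file you would pass a clojure.java.io/reader instead of a
;; StringReader.
(def sample-xml
  (str "<catalog>"
       "<book id=\"bk101\"><title>XML Developer Guide</title></book>"
       "<book id=\"bk102\"><title>Another Title</title></book>"
       "</catalog>"))

(defn count-books
  "Counts the :book children of the root element. Neither the root nor
  its :content seq is bound to a long-lived name here, so children that
  reduce has already visited are free to be garbage collected."
  [rdr]
  (reduce (fn [n el] (if (= :book (:tag el)) (inc n) n))
          0
          (:content (xml/parse rdr))))

;; reduce is eager, so all the parsing happens before with-open closes
;; the reader.
(def book-count
  (with-open [rdr (java.io.StringReader. sample-xml)]
    (count-books rdr)))

(println "books:" book-count)
```

By contrast, something like (def n (xml/parse ...)) followed by (count (:content n)) keeps the root reachable through n, so every child stays in memory once it has been realized.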