I do not know exactly what is causing the OOM, but I would like to offer some general suggestions; we can discuss the details in the comments.
So the sequence is retained in memory when I use some kind of loop, but not if I call run-processing directly? But the doseq docs clearly state that it "does not retain the head of the sequence." Then what should I do when I need to call run-processing several times (for example, with different arguments)?
So our function:
(defn process-file! [conn config name]
  (with-open [source (io/input-stream (io/file name))]
    (-> (xml/parse source)
        ((fn [x]
           (doseq [i [0]]
             (run-processing conn config x)))))))
Where x is a lazy seq (if you are using data.xml), for example:
x <- xml iterator <- file stream
If run-processing does everything right (consumes x completely and returns nil), there is nothing wrong with it: the problem is the x binding itself. While run-processing runs, it fully realizes the sequence of which x is the head.
(defn process-xml! [conn config x]
  (run-processing conn config x)
  ;; X IS FULLY REALIZED IN MEMORY
  (run-reporting conn config x))

(defn process-file! [conn config name]
  (with-open [source (io/input-stream (io/file name))]
    (->> (xml/parse source)
         (process-xml! conn config))))
As you can see, we do not consume the file element by element and drop each element immediately, and that is entirely because of x. doseq has nothing to do with this: it "does not retain the head of the sequence" that it consumes, which in our case is [0].
This approach is not very idiomatic for two reasons:
1. run-processing does too much
It knows where the data comes from, what form it is in, and what to do with it. A more typical process-file! would look like this:
(defn process-file! [conn config name]
  (with-open [source (io/input-stream (io/file name))]
    (->> (xml/parse source)
         (find-item-nodes)
         (map node->item)
         (run! (partial process-item! conn config)))))
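To make that shape concrete, here is a minimal sketch of the three helpers. All three are hypothetical stand-ins: the sketch assumes the items are direct children of the root tagged :item, that each node exposes :tag, :attrs and :content the way data.xml elements do, and that conn is just an atom used for the demo.

```clojure
;; Hypothetical helpers; the :item tag and the item shape are assumptions.

(defn find-item-nodes
  "Lazily returns the direct children of root that are tagged :item."
  [root]
  (filter #(= :item (:tag %)) (:content root)))

(defn node->item
  "Turns a parsed node into a plain map, detached from the XML source."
  [node]
  (assoc (:attrs node) :text (apply str (:content node))))

(defn process-item!
  "Stand-in side effect: conn is assumed to be an atom collecting items."
  [conn config item]
  (swap! conn conj (merge config item)))

;; Demo on a literal map shaped like a parsed document.
(def demo-conn (atom []))

(->> {:tag :items
      :content [{:tag :item :attrs {:id "1"} :content ["a"]}
                "\n"
                {:tag :item :attrs {:id "2"} :content ["b"]}]}
     (find-item-nodes)
     (map node->item)
     (run! (partial process-item! demo-conn {:source "demo"})))
```

Each step only knows about one thing: where items live, what an item looks like, and what to do with one item.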
This is not always viable and does not suit every use case, but there is another reason to do it this way.
2. process-file! should (ideally) never give x away to anyone
This is immediately apparent from looking at the source code: with-open closes the stream as soon as its body exits. query from clojure.java.jdbc is a good example. It gets a ResultSet, maps it to pure Clojure data structures, and forces it to be fully read (using a result-set-fn of doall) so that the connection can be freed.
Note that it never leaks the ResultSet: the only way to get at the result seq is through result-set-fn, which is a "callback". query wants to manage the life cycle of the ResultSet and make sure it is closed once query returns. Otherwise, it is too easy to make a similar mistake.
(But we still can, if we pass it a function similar to process-xml! as result-set-fn.)
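The same callback idea can be sketched without a database. query-lines below is a hypothetical analogue of query, not part of any library: it owns the reader, hands the lazy seq of lines to a caller-supplied function, and closes the reader before returning.

```clojure
(require '[clojure.java.io :as io])

(defn query-lines
  "Hypothetical analogue of query: opens the source, passes the lazy seq
  of lines to result-fn, and closes the source before returning."
  [source result-fn]
  (with-open [rdr (io/reader source)]
    (result-fn (line-seq rdr))))

;; Fine: the seq is consumed inside the callback, while the reader is open.
(def line-count
  (query-lines (java.io.StringReader. "a\nb\nc") count))

;; Also fine, but doall keeps every line in memory at once.
(def all-lines
  (query-lines (java.io.StringReader. "a\nb\nc") doall))
```

Passing identity as result-fn would be the "similar mistake": the caller would receive a partly unrealized seq backed by a reader that has already been closed.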
Replies to comments
As I said, I can't say exactly what causes the OOM. It could be:
run-processing itself. The JVM is short on memory anyway, and adding a simple doseq is enough to cause the OOM. That is why I suggested slightly increasing the heap size as a test.
Clojure optimizing the x binding away.
(fn [x] (run-processing conn config x)) simply being inlined by the JVM, which in turn fixes the x binding problem.
So why does wrapping the processing in a doseq make x hold on to its head? In my examples, I do not use x more than once (unlike your "run-processing x THEN run-report on SAME x").
The root of the problem is not the reuse of x, but the mere fact that x exists. Let's make a simple lazy seq:
(let [x (range 1 1e6)])
(Ignore the fact that range is implemented as a Java class.)
What is x? x is a lazy seq thunk: a recipe for building the next value.
x = (recipe)
Let's advance it:
(let [x (range 1 1e6) y (drop 5 x) z (first y)])
Now x, y and z are:
x = (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (recipe)
y = (6) -> (recipe)
z = 6
Hopefully now you can see what I mean by "x is the head of the seq, and run-processing realizes it."
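This can be checked at the REPL with a small sketch. It uses iterate instead of range (an assumption made here to avoid chunking, which would realize 32 elements at a time) and a hypothetical atom, realized-count, that counts how many elements have actually been computed.

```clojure
;; Counts how many elements of x have been computed so far.
(def realized-count (atom 0))

(def demo
  (let [x (map (fn [i] (swap! realized-count inc) i)
               (iterate inc 1))          ; unchunked, infinite
        y (drop 5 x)
        z (first y)                      ; forces elements 1..6 of x
        after-z @realized-count
        ;; Walking x again reuses the cached cells; nothing is recomputed.
        first-six (doall (take 6 x))
        after-walk @realized-count]
    {:z z
     :after-z after-z
     :first-six first-six
     :after-walk after-walk}))
```

As long as x is reachable, the six realized cells stay reachable through it, which is exactly the chain drawn above.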
About "process-file! should (ideally) never give x to anyone": correct me if I'm wrong, but doesn't mapping to pure Clojure data structures with doall make them all live in memory, which would be bad if the file is too large (as in my case)?
process-file! does not use doall. run! is a reduce and returns nil.
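A minimal sketch of the difference, with a hypothetical atom named total standing in for the real side effect:

```clojure
;; Stand-in for a real side effect such as a database write.
(def total (atom 0))

;; run! reduces over the seq, calling the function on each element,
;; and returns nil instead of a realized collection.
(def run-result
  (run! #(swap! total + %) (map inc (range 1000))))
```

Here run-result is nil and only the running total is kept, whereas doall on the same seq would return all 1000 elements.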