Clojure Leiningen REPL OutOfMemoryError Java heap space

I am trying to parse a rather small (<100 MB) xml file with

 (require '[clojure.data.xml :as xml] '[clojure.java.io :as io])
 (xml/parse (io/reader "data/small-sample.xml"))

and I get an error:

 OutOfMemoryError Java heap space  clojure.lang.Numbers.byte_array (Numbers.java:1216)
 clojure.tools.nrepl.bencode/read-bytes (bencode.clj:101)
 clojure.tools.nrepl.bencode/read-netstring* (bencode.clj:153)
 clojure.tools.nrepl.bencode/read-token (bencode.clj:244)
 clojure.tools.nrepl.bencode/read-bencode (bencode.clj:254)
 clojure.tools.nrepl.bencode/token-seq/fn--3178 (bencode.clj:295)
 clojure.core/repeatedly/fn--4705 (core.clj:4642)
 clojure.lang.LazySeq.sval (LazySeq.java:42)
 clojure.lang.LazySeq.seq (LazySeq.java:60)
 clojure.lang.RT.seq (RT.java:484)
 clojure.core/seq (core.clj:133)
 clojure.core/take-while/fn--4236 (core.clj:2564)

Here is my .clj project:

 (defproject dats "0.1.0-SNAPSHOT"
   ...
   :dependencies [[org.clojure/clojure "1.5.1"]
                  [org.clojure/data.xml "0.0.7"]
                  [criterium "0.4.1"]]
   :jvm-opts ["-Xmx1g"])

I tried setting LEIN_JVM_OPTS and JVM_OPTS in my .bash_profile without success.
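For reference, that attempt looked roughly like the following in .bash_profile (the values are illustrative, not copied from my actual config; note that LEIN_JVM_OPTS is meant for Leiningen's own process, while the project/REPL JVM is normally configured through :jvm-opts in project.clj):

 # illustrative values
 export LEIN_JVM_OPTS="-Xmx1g"
 export JVM_OPTS="-Xmx1g"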

When I tried the following project.clj file:

 (defproject barber "0.1.0-SNAPSHOT"
   ...
   :dependencies [[org.clojure/clojure "1.5.1"]
                  [org.clojure/data.xml "0.0.7"]
                  [criterium "0.4.1"]]
   :jvm-opts ["-Xms128m"])

I get the following error:

 Error occurred during initialization of VM
 Incompatible minimum and maximum heap sizes specified
 Exception in thread "Thread-5" clojure.lang.ExceptionInfo: Subprocess failed {:exit-code 1}

Any idea how I can increase the heap size for my leiningen repl?

Thanks.

+4
2 answers

Any form evaluated at the top level of the REPL is fully realized by the print phase of the Read-Eval-Print-Loop. It is also retained on the heap, so that you can get at it later through *1.
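For illustration (this snippet is not part of the original answer), the value printed at the prompt stays reachable afterwards:

 user> (xml/parse (io/reader "data/small-sample.xml")) ; the REPL realizes and prints the whole tree
 #clojure.data.xml.Element{:tag ..., :attrs {}, :content (...)}   ; output elided
 user> (:tag *1)  ; the fully realized tree is still held on the heap through *1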

If you save the return value instead, like this:

(def parsed (xml/parse (io/reader "data/small-sample.xml")))

this returns immediately, even for a file hundreds of megabytes in size (I verified this locally). You can then iterate over the result, which is realized lazily as it is parsed from the input stream, by walking the returned tree of clojure.data.xml.Element records.

As long as you are not holding on to the elements (binding them so that they remain reachable), you can iterate over the entire structure without using much more heap than is required to hold a single node of the xml tree.

 user> (time (def n (xml/parse (clojure.java.io/reader "/home/justin/clojure/ok/data.xml"))))
 "Elapsed time: 0.739795 msecs"
 #'user/n
 user> (time (keys n))
 "Elapsed time: 0.025683 msecs"
 (:tag :attrs :content)
 user> (time (-> n :tag))
 "Elapsed time: 0.031224 msecs"
 :catalog
 user> (time (-> n :attrs))
 "Elapsed time: 0.136522 msecs"
 {}
 user> (time (-> n :content first))
 "Elapsed time: 0.095145 msecs"
 #clojure.data.xml.Element{:tag :book, :attrs {:id "bk101"}, :content (#clojure.data.xml.Element{:tag :author, :attrs {}, :content ("Gambardella, Matthew")} #clojure.data.xml.Element{:tag :title, :attrs {}, :content ("XML Developer Guide")} #clojure.data.xml.Element{:tag :genre, :attrs {}, :content ("Computer")} #clojure.data.xml.Element{:tag :price, :attrs {}, :content ("44.95")} #clojure.data.xml.Element{:tag :publish_date, :attrs {}, :content ("2000-10-01")} #clojure.data.xml.Element{:tag :description, :attrs {}, :content ("An in-depth look at creating applications \n with XML.")})}
 user> (time (-> n :content count))
 "Elapsed time: 48178.512106 msecs"
 459000
 user> (time (-> n :content count))
 "Elapsed time: 86.931114 msecs"
 459000
 ;; redefining n so that we can test the performance without the pre-parsing done when we counted
 user> (time (def n (xml/parse (clojure.java.io/reader "/home/justin/clojure/ok/data.xml"))))
 "Elapsed time: 0.702885 msecs"
 #'user/n
 user> (time (doseq [el (take 100 (drop 100 (-> n :content)))] (println (:tag el))))
 :book
 :book
 .... ;; output truncated
 "Elapsed time: 26.019374 msecs"
 nil
 user>

Notice that it is only the first time I ask for the count of the contents of n (thereby forcing the whole file to be parsed) that there is a huge time delay. If I doseq over a subsection of the structure, it happens very quickly.
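A minimal sketch of the head-releasing traversal described above (the file name and the :book tag come from the examples in this question and answer; the counting criterion itself is illustrative):

 (require '[clojure.data.xml :as xml]
          '[clojure.java.io :as io])

 ;; Count matching elements without binding the root element or the head of
 ;; :content to a local, so visited nodes can be garbage collected as parsing proceeds.
 (with-open [rdr (io/reader "data/small-sample.xml")]
   (count (filter #(= :book (:tag %))
                  (:content (xml/parse rdr)))))

Because nothing retains the front of the sequence, heap usage stays roughly proportional to a single element rather than to the whole document.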

+3

I don't know lein as well, but in mvn you can do the following:

 mvn -Dclojure.vmargs="-d64 -Xmx2G" clojure:nrepl 

(I don't think it matters, but I have always seen it with a capital G; it is case sensitive.)

Pulling 100 MB of data into memory should not be a problem. I regularly push gigabytes of data through my projects.

I always use a 64-bit JVM for large heaps, and that looks like what they are doing here:

JVM parameters using Leiningen

I think the bigger problem is that, as you have written it, the form can be evaluated at compile time. You need to wrap the call in a function to defer execution; I suspect the compiler is trying to read the file, which is most likely not what you want. I know that with mvn you get separate memory settings for compiling and running, and you may be hitting something similar here.
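Putting the two suggestions together, a hedged sketch (the 2g figure, the dats.core namespace, and the parse-sample name are illustrative, not taken from the original post): raise the project JVM's heap through :jvm-opts, and wrap the parse in a function so nothing is evaluated at compile time.

 ;; project.clj -- illustrative heap setting
 (defproject dats "0.1.0-SNAPSHOT"
   :dependencies [[org.clojure/clojure "1.5.1"]
                  [org.clojure/data.xml "0.0.7"]]
   :jvm-opts ["-Xmx2g"])

 ;; src/dats/core.clj -- parsing is deferred until the function is called
 (ns dats.core
   (:require [clojure.data.xml :as xml]
             [clojure.java.io :as io]))

 (defn parse-sample []
   (xml/parse (io/reader "data/small-sample.xml")))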

+2
