Clojure - speed up large file processing

I need to read a large file (~1 GB), process it, and save the result to a database. My solution looks like this:

data.txt

format: [id],[title]\n

 1,Foo
 2,Bar
 ...

code

 (ns test.core
   (:require [clojure.java.io :as io]
             [clojure.string :refer [split]]))

 (defn parse-line [line]
   (let [values (split line #",")]
     (zipmap [:id :title] values)))

 (defn run []
   (with-open [reader (io/reader "~/data.txt")]
     (insert-batch (map parse-line (line-seq reader)))))
 ; insert-batch just saves a vector of records into the database
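For instance, parse-line turns one line of the file into a map:

 (parse-line "1,Foo")
 ;=> {:id "1", :title "Foo"}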

But this code does not work as intended: it parses all of the rows first and only then sends them to the database in a single call.

I think the ideal solution would be: read a line -> parse it -> collect 1000 parsed lines -> batch-insert them into the database -> repeat until there are no lines left. Unfortunately, I do not know how to implement this.

1 answer

One suggestion:

  • Use line-seq to get a lazy sequence of lines,

  • use map to parse each line (just as you already do),

  • use partition-all to split your lazy sequence of parsed lines into batches of 1000 (see the short demo after this list), and then

  • use insert-batch with doseq to write each batch to the database.
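
partition-all comes from clojure.core and chunks a (possibly lazy) sequence into groups of at most n items, keeping a shorter final group:

 (partition-all 3 [1 2 3 4 5 6 7])
 ;=> ((1 2 3) (4 5 6) (7))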

Putting it together:

 (->> (line-seq reader)
      (map parse-line)
      (partition-all 1000)
      (#(doseq [batch %]
          (insert-batch batch))))
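
Wrapped in the original run function, that might look like the following sketch. The path is a placeholder (Java does not expand ~, so an absolute path is used), insert-batch is assumed to accept one batch per call, and run! stands in for the doseq wrapper:

 (defn run []
   (with-open [reader (io/reader "/path/to/data.txt")] ; placeholder path; "~" is not expanded by Java
     (->> (line-seq reader)
          (map parse-line)        ; lazily parse each line into a map
          (partition-all 1000)    ; group parsed lines into batches of 1000
          (run! insert-batch))))  ; eagerly insert each batch, one call per batch

Because run! consumes the sequence eagerly, every batch is read, parsed, and inserted while the reader is still open; returning the lazy sequence out of with-open instead would fail once the file is closed.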

Source: https://habr.com/ru/post/985726/

