Clojure: reading a large text file and counting occurrences

I am trying to read a large text file and count the occurrences of certain errors. For example, for the following sample text

    something
    bla
    error123
    foo
    test
    error123
    line
    junk
    error55
    more
    stuff

I want to end up with something like this (the data structure doesn't really matter, although I'm thinking of a map):

    error123 - 2
    error55 - 1

Here is what I have tried so far:

    (require '[clojure.java.io :as io])

    (defn find-error [line]
      (if (re-find #"error" line)
        line))

    (defn read-big-file [func, filename]
      (with-open [rdr (io/reader filename)]
        (doall (map func (line-seq rdr)))))

calling it as follows

  (read-big-file find-error "sample.txt") 

returns:

 (nil nil "error123" nil nil "error123" nil nil "error55" nil nil) 

Next, I tried removing the nil values and grouping the remaining elements:

 (group-by identity (remove #(= nil %) (read-big-file find-error "sample.txt"))) 

which returns

 {"error123" ["error123" "error123"], "error55" ["error55"]} 

This is getting close to the desired result, although it may be inefficient. How can I get the counts now? Also, as someone new to Clojure and functional programming, I would appreciate any suggestions on how I can improve this. Thanks!

1 answer

I think you are looking for the frequencies function:

    user=> (doc frequencies)
    -------------------------
    clojure.core/frequencies
    ([coll])
      Returns a map from distinct items in coll to the number of times
      they appear.
    nil

So this should give you what you want:

    (frequencies (remove nil? (read-big-file find-error "sample.txt")))
    ;;=> {"error123" 2, "error55" 1}
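As a small illustration of how this relates to the group-by attempt in the question, here is a sketch that runs on in-memory data rather than on a file. The names found, counts and counts-from-groups are made up for this example; the vector simply mirrors the non-nil values shown in the question.

```clojure
;; Sample data mirroring the non-nil results shown in the question.
(def found ["error123" "error123" "error55"])

;; frequencies counts each distinct item directly.
(def counts (frequencies found))

;; The same counts derived from the group-by result,
;; by replacing each group with its size.
(def counts-from-groups
  (into {}
        (map (fn [[k group]] [k (count group)]))
        (group-by identity found)))
```

Both definitions should evaluate to the same map, so the group-by step is not needed once frequencies is used.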

If your text file is really large, I would recommend doing this inline on the line-seq so that you don't run out of memory. That way you can also use filter rather than map and remove.

    (defn count-lines [pred, filename]
      (with-open [rdr (io/reader filename)]
        (frequencies (filter pred (line-seq rdr)))))

    (defn is-error-line? [line]
      (re-find #"error" line))

    (count-lines is-error-line? "sample.txt")
    ;; => {"error123" 2, "error55" 1}
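
Note that count-lines counts whole lines, which works for the sample because each error sits on a line of its own. If the error codes were embedded in longer lines, one possible variation is to extract the matching token with keep. This is only a sketch: error-code and count-errors are hypothetical names, and the pattern error\d+ is an assumption about what the error codes look like.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: returns the first token such as "error123"
;; found in a line, or nil if there is none.
;; The pattern error\d+ is an assumption about the log format.
(defn error-code [line]
  (re-find #"error\d+" line))

;; Counts error codes line by line. frequencies is eager, so the lazy
;; line-seq is fully consumed before with-open closes the reader, and
;; only the map of counts is kept in memory.
(defn count-errors [source]
  (with-open [rdr (io/reader source)]
    (frequencies (keep error-code (line-seq rdr)))))

;; Usage with an in-memory reader standing in for a file:
(count-errors
  (java.io.StringReader. "ok\nfailed: error123\nerror55 seen\nerror123 again\n"))
;; => {"error123" 2, "error55" 1}
```

Since io/reader also accepts a file name, (count-errors "sample.txt") would be called the same way as count-lines.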

Source: https://habr.com/ru/post/1487672/

