I wrote a Scala application (2.9.1-1) that needs to process several million rows from a database query. I convert the ResultSet to a Stream using the technique shown in the answer to one of my previous questions:
    class Record(...)

    val resultSet = statement.executeQuery(...)

    new Iterator[Record] {
      def hasNext = resultSet.next()
      def next = new Record(resultSet.getString(1), resultSet.getInt(2), ...)
    }.toStream.foreach { record => ... }
and it worked very well.
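For reference, here is a self-contained sketch of that pattern; the Record fields, the connection URL, and the query are hypothetical placeholders, not the actual schema from my application:

    import java.sql.DriverManager

    // Hypothetical record type; the real fields are elided in the question.
    class Record(val name: String, val value: Int)

    object StreamFromResultSet extends App {
      // Hypothetical connection and query, stand-ins for the real ones.
      val connection = DriverManager.getConnection("jdbc:h2:mem:test")
      val statement  = connection.createStatement()
      val resultSet  = statement.executeQuery("SELECT name, value FROM records")

      // Wrap the ResultSet in an Iterator and lift it to a Stream; rows are
      // pulled from the cursor as foreach consumes them.
      new Iterator[Record] {
        def hasNext = resultSet.next()
        def next()  = new Record(resultSet.getString(1), resultSet.getInt(2))
      }.toStream.foreach { record =>
        // CPU-intensive, side-effect-free work on each record would go here.
        println(record.name + " -> " + record.value)
      }
    }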
Since the body of the foreach closure is very CPU-intensive, and as a testament to the practicality of functional programming, if I add .par before the foreach, the closures run in parallel with no other effort beyond making sure the closure body is thread safe (it is written in a functional style with no mutable data, except for printing to a thread-safe log), as in the sketch below.
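To make the change concrete, this is the shape of the parallel variant I am describing, again with the placeholder fields from the sketch above; in 2.9 the .par call sits between toStream and foreach:

    // Same pipeline as above, with .par added before foreach so that the
    // closure bodies run on the parallel-collections thread pool.
    new Iterator[Record] {
      def hasNext = resultSet.next()
      def next()  = new Record(resultSet.getString(1), resultSet.getInt(2))
    }.toStream.par.foreach { record =>
      // Body must be thread safe: functional style, no mutable data shared
      // between threads, apart from writing to a thread-safe log.
      println(record.name + " -> " + record.value)
    }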
However, I am worried about memory consumption. Does .par cause the entire result set to be loaded into RAM, or does the parallel operation load only as many rows as it has active threads? I have allocated 4 GB to the JVM (64-bit, with -Xmx4g), but in the future I will be running it on even more rows and worry that I will eventually run out of memory.
Is there a better pattern for doing this kind of parallel processing in a functional manner? I have been showing this application to my co-workers as an example of the value of functional programming and multi-core machines.
Ralph