Scala Infinite Iterator OutOfMemory

I'm playing with Scala's lazy iterators and I've run into a problem. What I'm trying to do is read in a large file, perform a transformation on it, and then write out the result:

    import java.io.PrintWriter
    import scala.io.Source

    object FileProcessor {
      def main(args: Array[String]): Unit = {
        val inSource = Source.fromFile("in.txt")
        val outSource = new PrintWriter("out.txt")

        try {
          // this "basic" lazy iterator works fine
          // val iterator = inSource.getLines()

          // ...but this one, which incorporates my process method,
          // throws OutOfMemoryErrors
          val iterator = process(inSource.getLines().toSeq).iterator

          while (iterator.hasNext) outSource.println(iterator.next())
        } finally {
          inSource.close()
          outSource.close()
        }
      }

      // processing in this case just means upper-casing every line
      private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
    }

So I get an OutOfMemoryError on large files. I know you can run into trouble with Scala's lazy Streams if you keep references to the head of the Stream around. So in this case I'm careful to convert the result of process() to an iterator and throw away the Seq it initially returns.

Does anyone know why this still causes O(n) memory consumption? Thanks!

Update

In response to fge and huynhjl: it seems that the Seq might be the culprit, but I don't know why. As an example, the following code works fine (and I'm using Seq all over the place). This code does not throw an OutOfMemoryError:

    import java.io.PrintWriter
    import scala.io.Source

    object FileReader {
      def main(args: Array[String]): Unit = {
        val inSource = Source.fromFile("in.txt")
        val outSource = new PrintWriter("out.txt")

        try {
          writeToFile(outSource, process(inSource.getLines().toSeq))
        } finally {
          inSource.close()
          outSource.close()
        }
      }

      // writes the head, then recurses on the tail
      @scala.annotation.tailrec
      private def writeToFile(outSource: PrintWriter, contents: Seq[String]): Unit = {
        if (!contents.isEmpty) {
          outSource.println(contents.head)
          writeToFile(outSource, contents.tail)
        }
      }

      private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
    }
1 answer

As hinted by fge, modify process to take an iterator and remove the .toSeq. inSource.getLines is already an iterator.
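A minimal sketch of that change, under some assumptions: the object name StreamingFileProcessor, the run helper and its inPath/outPath parameters are hypothetical and were added only to make the example self-contained. The relevant part is that process takes and returns an Iterator[String], so no intermediate Seq is built.

```scala
import java.io.PrintWriter
import scala.io.Source

// Hypothetical name; a sketch of the suggested fix, not the asker's final code.
object StreamingFileProcessor {

  // Takes and returns an Iterator, so each line is transformed on demand
  // and nothing holds on to lines that were already written.
  def process(lines: Iterator[String]): Iterator[String] =
    lines.map(_.toUpperCase)

  // inPath and outPath are illustrative parameters (assumption).
  def run(inPath: String, outPath: String): Unit = {
    val inSource = Source.fromFile(inPath)
    val outSource = new PrintWriter(outPath)
    try {
      // getLines() is already an Iterator[String]; no .toSeq needed
      process(inSource.getLines()).foreach(line => outSource.println(line))
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  def main(args: Array[String]): Unit = run("in.txt", "out.txt")
}
```

With this shape, only the line currently being processed needs to be reachable, so memory use should not grow with the size of the file.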

Converting to a Seq will cause the elements to be remembered (memoized). I think it converts the iterator into a Stream, which causes all the elements to be kept in memory.
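The memoization can be observed with a small counter experiment. This is only a sketch: it calls toStream explicitly (assuming, as guessed above, that this is what toSeq does on an Iterator in the Scala version in question; toStream is deprecated in Scala 2.13 in favour of LazyList), and the MemoizationDemo object and evaluations method are hypothetical names.

```scala
// Hypothetical demo object; shows that a Stream keeps the elements it has computed.
object MemoizationDemo {

  // Returns the number of element computations after the first and
  // after the second traversal of the same Stream.
  def evaluations(): (Int, Int) = {
    var computed = 0
    val stream = Iterator.from(1).map { n => computed += 1; n }.toStream

    stream.take(5).foreach(_ => ()) // first traversal forces the elements
    val afterFirst = computed

    stream.take(5).foreach(_ => ()) // second traversal reuses the cached cells
    (afterFirst, computed)
  }

  def main(args: Array[String]): Unit = {
    val (first, second) = evaluations()
    println(s"computed after first pass: $first, after second pass: $second")
  }
}
```

The count does not increase on the second pass, which means the Stream is storing every element it has produced for as long as its head is reachable. That stored data is what takes up the heap on a large input.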

Edit: OK, it's more subtle than that. By calling iterator on the result of process, you are doing the equivalent of Iterator.toSeq.iterator. This can cause an out-of-memory error.

    scala> Iterator.continually(1).toSeq.iterator.take(300*1024*1024).size
    java.lang.OutOfMemoryError: Java heap space

This may be the same issue as reported here: https://issues.scala-lang.org/browse/SI-4835. Note my comment at the end of the bug report; it comes from personal experience.


Source: https://habr.com/ru/post/908640/

