Why does my Akka application run out of memory when performing an NLP task?

I noticed that my program has a severe memory leak: memory consumption keeps spiraling upward. I had to parallelize this NLP task (using the StanfordNLP EnglishPCFG parser and the Tregex matcher), so I built an actor pipeline with 6 actors for each task:

   val listOfTregexActors = (0 to 5).map(m => system.actorOf(Props(new TregexActor(timer, filePrinter)), "TregexActor" + m)) 
   val listOfParsers = (0 to 5).map(n => system.actorOf(Props(new ParserActor(timer, listOfTregexActors(n), lp)), "ParserActor" + n)) 
   val listOfSentenceSplitters  = (0 to 5).map(j => system.actorOf(Props(new SentenceSplitterActor(listOfParsers(j), timer)), "SplitActor" + j)) 

My actors are pretty standard. They need to stay alive to process all the data, so there is no point at which I can send them a PoisonPill. Memory consumption keeps growing, and I have no idea why. If I run the same work on a single thread, memory consumption is fine. I read somewhere that if actors never die, nothing inside them is released. Should I release things manually?

There are two heavy actors:

https://github.com/windweller/parallelAkka/blob/master/src/main/scala/blogParallel/ParserActor.scala
https://github.com/windweller/parallelAkka/blob/master/src/main/scala/blogParallel/TregexActor.scala

I wonder whether a Scala closure or some other mechanism is retaining too much data, so that the GC somehow cannot collect it.

Here is the TregexActor part:

def receive = {
  case Match(rows, sen) =>
    println("Entering Pattern matching: " + rows(0))
    val result = patternSearching(sen)
    filePrinter ! Print(rows :+ sen.toString, result)
}

def patternSearching(tree: Tree): List[Array[Int]] = {
  val statsFuture = search(patternFuture, tree)
  val statsPast = search(patternsPast, tree)

  List(statsFuture, statsPast)
}

def search(patterns: List[String], tree: Tree) = {
  val stats = Array.fill[Int](patterns.size)(0)

  for (i <- 0 to patterns.size - 1) {
    val searchPattern = TregexPattern.compile(patterns(i))
    val matcher = searchPattern.matcher(tree)
    if (matcher.find()) {
      stats(i) = stats(i) + 1
    }
    timer ! PatternAddOne
  }
  stats
}
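One cost that stands out in `search` is that every pattern string is passed to `TregexPattern.compile` again for every sentence. That is more likely to cause allocation churn than a true leak, but it is cheap to avoid. Below is a minimal sketch of compiling once and reusing the compiled patterns. To keep it runnable without StanfordNLP, it swaps in `java.util.regex.Pattern` for `TregexPattern` and a `String` for the `Tree`, and it leaves out the `timer ! PatternAddOne` messages. The class name `PrecompiledSearch` is hypothetical. In the actor itself, the same idea would mean holding the compiled `TregexPattern`s in a `val` of `TregexActor`.

```scala
import java.util.regex.Pattern

// Hypothetical stand-in for TregexActor's search logic. java.util.regex.Pattern
// replaces TregexPattern so the sketch runs without StanfordNLP on the classpath.
final class PrecompiledSearch(patterns: List[String]) {
  // Compiled once, when this object (or the actor) is constructed,
  // instead of once per pattern per sentence.
  private val compiled: Vector[Pattern] =
    patterns.map(p => Pattern.compile(p)).toVector

  // Same shape as the original search: one slot per pattern,
  // set to 1 if that pattern matches anywhere in the input.
  def search(text: String): Array[Int] =
    compiled.map(p => if (p.matcher(text).find()) 1 else 0).toArray
}
```

For example, `new PrecompiledSearch(List("will \\w+", "\\w+ed\\b")).search("I walked home")` returns `Array(0, 1)`.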

Or, if my code checks out, could the StanfordNLP parser or the Tregex matcher be leaking memory? Is there a strategy for freeing memory manually, or do I need to kill those actors after a while and hand their mailbox tasks to a new actor to release the memory? (If so, how?)
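On the last question: in classic Akka, a restarted actor gets a fresh instance but keeps its mailbox, and the default supervisor strategy restarts an actor that throws an ordinary `Exception`. Below is a minimal sketch of recycling a worker every `maxPerInstance` messages under those defaults. `RecyclingWorker`, `RecycleMe` and the shared counters are hypothetical names for illustration. This only releases memory held in the actor's own fields. Anything referenced from elsewhere, such as a shared parser or a static cache, survives the restart, so it will not help if the leak lives outside the actor.

```scala
import java.util.concurrent.atomic.AtomicInteger
import akka.actor.{Actor, Props}

// Hypothetical exception used only to trigger a restart.
final class RecycleMe extends Exception("recycling actor instance")

object RecyclingWorker {
  // Counters shared across instances, for demonstration only.
  val processed = new AtomicInteger(0)
  val restarts = new AtomicInteger(0)

  def props(maxPerInstance: Int): Props = Props(new RecyclingWorker(maxPerInstance))
}

class RecyclingWorker(maxPerInstance: Int) extends Actor {
  // Per-instance state: discarded when the actor is restarted.
  private var handledByThisInstance = 0

  // Runs on the fresh instance after a restart.
  override def postRestart(reason: Throwable): Unit = {
    RecyclingWorker.restarts.incrementAndGet()
    super.postRestart(reason)
  }

  def receive: Receive = {
    case _ =>
      RecyclingWorker.processed.incrementAndGet() // real work would go here
      handledByThisInstance += 1
      // Default supervision restarts on an ordinary Exception: a new instance
      // is created and continues draining the same mailbox.
      if (handledByThisInstance >= maxPerInstance) throw new RecycleMe
  }
}
```

The message that was being handled when the exception is thrown is not re-processed after the restart, which is why the sketch throws only after the work is done. Each recycle is also logged as an error by the default strategy, so this approach is noisy.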



After some struggling with the profiling tools, I finally got VisualVM working with IntelliJ. Here are the screenshots. The GC never kicked in.


Pipeline: SentenceSplitter (6) → Parser (6) → Tregex matcher (6) → filePrinter

Entry.scala: https://github.com/windweller/parallelAkka/blob/master/src/main/scala/blogParallel/Entry.scala



, Companion.

  val listOfTregexActors = (0 to 5).map(m => system.actorOf(Props(new TregexActor(timer, filePrinter)), "TregexActor" + m))
  val listOfParsers = (0 to 5).map(n => system.actorOf(Props(new ParserActor(timer, listOfTregexActors(n), lp)), "ParserActor" + n))
  val listOfSentenceSplitters = (0 to 5).map(j => system.actorOf(Props(new SentenceSplitterActor(listOfParsers(j), timer)), "SplitActor" + j))

new .
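Akka's documentation recommends defining a `props` factory in each actor's companion object. The by-name argument of `Props(new ...)` can retain a reference to its enclosing scope when it is evaluated inside another actor, whereas inside a companion object there is no such instance to capture. Below is a minimal sketch for `TregexActor`, assuming the constructor shown in the question. The placeholder `receive` and the `PipelineSetup.createTregexActors` helper are hypothetical. Whether this capture is actually what leaks here is not established.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Simplified TregexActor with the question's constructor; receive is a placeholder.
class TregexActor(timer: ActorRef, filePrinter: ActorRef) extends Actor {
  def receive: Receive = {
    case msg => filePrinter ! msg // the real matching logic would go here
  }
}

object TregexActor {
  // Factory in the companion object: the by-name `new TregexActor(...)`
  // cannot capture an enclosing actor instance from here.
  def props(timer: ActorRef, filePrinter: ActorRef): Props =
    Props(new TregexActor(timer, filePrinter))
}

object PipelineSetup {
  // Same naming scheme as the question, but going through the factory.
  def createTregexActors(system: ActorSystem, timer: ActorRef, filePrinter: ActorRef): IndexedSeq[ActorRef] =
    (0 to 5).map(m => system.actorOf(TregexActor.props(timer, filePrinter), "TregexActor" + m))
}
```

The parser and sentence-splitter actors would get the same kind of `props` method in their own companion objects.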

, App, GC .


, , max. , JVM?



 val cleanedSentences = new java.util.ArrayList[java.util.List[HasWord]]()

- ? , , . , (, cleanSentence).

UPDATE: , ( ), . (, Redis), " " , DB.

, (, java.util.List), , , , , , .

cleanSentence could look like this:

def cleanSentence(sentences: List[HasWord]): List[HasWord] = {
  import TwitterRegex._

  // braces are required: the lambda body has more than one statement
  sentences.filter { ref =>
    val word = ref.word() // do not call the same function several times
    !word.contains("#") &&
    !word.contains("@") &&
    !word.matches(searchPattern.toString())
  }
}

To convert a java.util.List to a Scala collection:

import scala.collection.JavaConverters._

val javaList:java.util.List[HasWord] = ...
javaList.asScala
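A small self-contained round trip, assuming plain `String`s in place of `HasWord` (the `ConvertDemo` object and its values are hypothetical):

```scala
import scala.collection.JavaConverters._

object ConvertDemo {
  // Hypothetical data; in the question the elements would be HasWord instances.
  val javaList: java.util.List[String] =
    java.util.Arrays.asList("hello", "#scala", "@akka", "world")

  // asScala wraps the Java list without copying; toList makes an immutable Scala List.
  val scalaList: List[String] = javaList.asScala.toList

  // Same filtering idea as cleanSentence, on plain strings.
  val cleaned: List[String] = scalaList.filter(w => !w.contains("#") && !w.contains("@"))

  // asJava converts back when a Java API expects a java.util.List.
  val backToJava: java.util.List[String] = cleaned.asJava
}
```

On Scala 2.13 and later, `scala.collection.JavaConverters` is deprecated, and the same `asScala`/`asJava` methods come from `import scala.jdk.CollectionConverters._`.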

Source: https://habr.com/ru/post/1569199/

