Scala immutable Map is slow

I have a piece of code where I create a map like this:

 val map = gtfLineArr(8).split(";").map(_ split "\"").collect { case Array(k, v) => (k, v) }.toMap

Then I use this map to create my object:

case class MyObject(val attribute1: String, val attribute2: Map[String, String])

I read millions of lines and convert them to MyObjects using an iterator, like this:

MyObject("1", map)

When I do this, it is very slow: more than 1 hour for 2,000,000 entries.

If I remove the map from the creation of the object but still do the split step (section 1):

val map = gtfLineArr(8).split(";").map(_ split "\"").collect { case Array(k, v) => (k, v) }.toMap
MyObject("1", null)

Then the script runs in less than 1 minute for 2,000,000 records.

I did some profiling, and it looks like the slow part is assigning the val map to the object's map when the object is created. What am I missing?
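
For readers who want to try the parsing step in isolation, here is a minimal self-contained sketch. The sample attribute string and the `ParseAttributes` object are hypothetical, and the `k.trim` call is an addition that is not in the code above (without it the keys keep their surrounding spaces).

```scala
// Case class from the question, with the Map type written correctly.
final case class MyObject(attribute1: String, attribute2: Map[String, String])

object ParseAttributes {
  // Same pipeline as in the question: split on ';', then on '"',
  // keep only the pieces that yield exactly a key and a value.
  def parse(attrs: String): Map[String, String] =
    attrs.split(";")
      .map(_ split "\"")
      .collect { case Array(k, v) => (k.trim, v) } // trim is an assumption
      .toMap

  def main(args: Array[String]): Unit = {
    // Hypothetical sample of the attribute column; real data will differ.
    val sample = "gene_id \"G1\"; transcript_id \"T1\";"
    val obj = MyObject("1", parse(sample))
    println(obj.attribute2)
  }
}
```

Running `parse` on a single line like this is cheap, so any slowdown only becomes visible when millions of such maps are kept alive at once.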

Update to better explain the problem:

As my code shows, I iterate over 2,000,000 lines and convert each line to an internal object. To iterate, I do:

it.map(createNewObject).toList

In reply to dk14, this is the createNewObject method:

`createNewObject(line: String)`

`class MyObject(val attribute1:String, val attribute2:Map[String, String])` 

`val attributeArr = line.split("\t")`

That is step 1. Step 2 builds the map:

`val map = attributeArr(8).split(";").map(_ split "\"").collect { case Array(k, v) => (k, v) }.toMap`

If I pass the map from step 2 to MyObject(attribute1, map), the process is slow.


I compared (0 to 2000000).toList with (0 to 2000000).map(x => x -> x).toMap (with -Xmx4G, i.e. 4 GB of heap). The toMap version is the slower of the two. So most likely your GC is overactive.

(0 to 2000000).toList: 128 […]; (0 to 2000000).map(x => x -> x).toMap: 2 […], with 10% going to GC (according to VisualVM).
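
A minimal sketch for timing the two expressions above on your own machine. The `BuildCost` object and its `timeMs` helper are hypothetical names, and the measurement is deliberately crude (single run, no JIT warm-up), so treat the numbers as rough indications only.

```scala
object BuildCost {
  // Runs the block once and returns its result with the elapsed milliseconds.
  def timeMs[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    // Element count; defaults to the 2,000,000 used in the discussion.
    val n = args.headOption.map(_.toInt).getOrElse(2000000)
    val (list, listMs) = timeMs((0 to n).toList)
    val (map, mapMs) = timeMs((0 to n).map(x => x -> x).toMap)
    println(s"toList: ${list.size} elements in $listMs ms")
    println(s"toMap:  ${map.size} entries in $mapMs ms")
  }
}
```

Running it once with the default heap and once with -Xmx4G, while watching GC activity in VisualVM or jconsole, should show whether memory pressure is the cause on your setup.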

So the fix is to run with -Xmx4G.


P.S. This happens because toMap adds the elements one at a time, which involves a lot of copying (Array.copy): https://github.com/scala/scala/blob/99a82be91cbb85239f70508f6695c6b21fd3558c/src/library/scala/collection/immutable/HashMap.scala#L321.

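So toMap calls updated0 over and over (2000000 times), which in turn calls Array.copy, which needs a lot of memory allocation, and that (when memory is short) sends the MarkAndSweep GC (the slow collection) into overdrive (as seen in jconsole).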
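
To illustrate the one-at-a-time construction described above, here is a small sketch. `IncrementalBuild` is a hypothetical name, and the fold is a simplified model of what toMap does at the linked revision, not the library's actual code; newer Scala versions may build maps differently.

```scala
object IncrementalBuild {
  // Adds one pair at a time. Each `+` returns a new immutable Map and
  // leaves the previous one untouched, so intermediate structures are
  // allocated and later discarded.
  def build(n: Int): Map[Int, Int] =
    (0 to n).foldLeft(Map.empty[Int, Int])((acc, x) => acc + (x -> x))

  def main(args: Array[String]): Unit = {
    val before = Map(1 -> 1)
    val after = before + (2 -> 2)
    println(before.size) // the original map is unchanged
    println(after.size)
    // The fold yields the same map as toMap on the same pairs.
    println(build(1000) == (0 to 1000).map(x => x -> x).toMap)
  }
}
```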


Solution: either give the JVM more memory (the -Xmx/-Xms JVM options) or use something like Apache Spark to process the data.
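
When changing -Xmx, it can help to confirm that the setting actually reached the JVM that runs your code. A minimal sketch, where `HeapCheck` is a hypothetical name; how the flag is passed depends on your launcher or build tool.

```scala
object HeapCheck {
  // Maximum heap the JVM will attempt to use, in megabytes.
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"max heap: $maxHeapMb MB")
}
```

If the printed value is far below what you set with -Xmx, the option is probably being applied to a different process (for example a build tool rather than the forked application).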


Source: https://habr.com/ru/post/1651873/

