I have a piece of code where I create a map like this:
val map = gtfLineArr(8).split(";").map(_ split "\"").collect { case Array(k, v) => (k, v) }.toMap
Then I use this map to create my object:
case class MyObject(val attribute1: String, val attribute2: Map[String, String])
I read millions of lines and convert them to MyObjects using an iterator, like this:
MyObject("1", map)
When I do this, it runs very slowly: more than 1 hour for 2,000,000 entries.
Then I removed the map from the creation of the object, while still doing the split (section 1):
val map = gtfLineArr(8).split(";").map(_ split "\"").collect { case Array(k, v) => (k, v) }.toMap
MyObject("1", null)
With that change, the script runs in less than 1 minute for 2,000,000 records.
I did some profiling, and it looks like assigning the val map to the object's map field when the object is created is what makes the process slow. What am I missing?
Update to better explain the problem:
To explain my code: I iterate over 2,000,000 lines, converting each line to an internal object. For the iteration I do:
it.map(createNewObject).toList
Here is what createNewObject does (in response to dk14):
`def createNewObject(line: String)`
`class MyObject(val attribute1:String, val attribute2:Map[String, String])`
`val attributeArr = line.split("\t")`
- section 1
`val map = attributeArr(8).split(";").map(_ split "\"").collect { case Array(k, v) => (k, v) }.toMap`
- section 2, followed by `MyObject(attribute1, map)`.
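To make the setup easier to reproduce, here is a minimal self-contained sketch of the pieces above. It is only my reading of the fragments, not the exact production code: the object name `GtfSketch`, the sample line and its field values, and the choice of column 0 for `attribute1` are all assumptions made for illustration.

```scala
// Sketch only: GtfSketch, sampleLine and the use of column 0 are assumptions.
object GtfSketch {
  // Same shape as the class in the question.
  case class MyObject(attribute1: String, attribute2: Map[String, String])

  // Hypothetical createNewObject assembled from the fragments above.
  def createNewObject(line: String): MyObject = {
    // Section 1: split the tab-separated columns.
    val attributeArr = line.split("\t")
    // Section 2: split column 9 into key/value pairs on ';' and '"'.
    val map = attributeArr(8)
      .split(";")
      .map(_ split "\"")
      .collect { case Array(k, v) => (k, v) }
      .toMap
    // Assumption: attribute1 is taken from the first column.
    MyObject(attributeArr(0), map)
  }

  // Made-up sample line with nine tab-separated columns.
  val sampleLine: String = Seq(
    "1", "src", "gene", "100", "200", ".", "+", ".",
    "gene_id \"G1\"; gene_name \"ABC\";"
  ).mkString("\t")

  def main(args: Array[String]): Unit = {
    // Same iteration pattern as in the question, on three copies of the line.
    val objects = Iterator.fill(3)(sampleLine).map(createNewObject).toList
    objects.foreach(println)
  }
}
```

With this sample line the map keys keep their surrounding whitespace (`"gene_id "` and `" gene_name "`), because only `;` and `"` are used as delimiters.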