I am wondering whether Scala Native can be used to perform large in-memory jobs.
For example, imagine a Spark job that needs 150 GB of RAM, so you have to run 5 × 30 GB executors in the Spark cluster, because JVM garbage collectors can't keep up with a heap larger than that.
Imagine that 99% of the data being processed is Strings in collections.
Do you think Scala Native would help here? I mean, as an alternative to Spark?
How does it handle String? Does it also carry the overhead that comes from the JVM treating String as a class?
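To illustrate the overhead I mean: here is a back-of-envelope sketch (an approximation, not a measurement; the exact numbers depend on JVM version and flags) of the retained size of a `java.lang.String` on a 64-bit JVM with compressed oops and the classic char[]-backed layout:

```scala
object StringOverhead {
  // Rough retained size of a java.lang.String on a 64-bit JVM with
  // compressed oops, pre-Java-9 (char[]-backed) layout:
  //   String object: 12-byte header + 4-byte char[] ref + 4-byte hash
  //                  + padding => ~24 bytes
  //   char[] array:  16-byte header + 2 bytes per char,
  //                  padded up to an 8-byte multiple
  def approxBytes(length: Int): Long = {
    val stringObj = 24L
    val charArray = 16L + 2L * length
    stringObj + pad(charArray)
  }

  private def pad(n: Long): Long = (n + 7) / 8 * 8

  def main(args: Array[String]): Unit = {
    // A 10-char string retains roughly 64 bytes, i.e. ~3x the 20-byte
    // payload (and ~6x if the text is plain ASCII).
    println(approxBytes(10)) // prints 64
  }
}
```

So with 99% of the heap being small Strings, a large fraction of those 150 GB is headers and padding rather than payload, which is exactly what makes the GC's job so hard.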
What are its GC heap limits like, compared to the classic ~30 GB practical limit on the JVM? Would I run into a similar ~30 GB limit there too?
Or is it simply a bad idea to use Scala Native for in-memory data processing? I guess scala-offheap is the better way to go.
lisak