Spark executors use part of their memory for caching (spark.storage.memoryFraction) and part for shuffles (spark.shuffle.memoryFraction). The rest is available to application code, for example code that runs inside an RDD.map operation.
I would like to know how much of this working memory there is. (I want partitions that are as large as possible while still fitting in memory, so I would divide the total data size by the working memory available to each task to get the number of partitions.)
This is how I calculate it:
val numExecutors = sc.getExecutorStorageStatus.size - 1            // exclude the driver
val totalCores = numExecutors * numCoresPerExecutor                // numCoresPerExecutor is known from my cluster setup
val cacheMemory = sc.getExecutorMemoryStatus.values.map(_._1).sum  // max memory available for storage
val conf = sc.getConf
val cacheFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)
val shuffleFraction = conf.getDouble("spark.shuffle.memoryFraction", 0.2)
val workFraction = 1.0 - cacheFraction - shuffleFraction           // fraction left for application code
val workMemory = workFraction * cacheMemory / cacheFraction        // scale storage memory back up to the whole heap, then take the work fraction
val workMemoryPerCore = workMemory / totalCores
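For context, this is roughly how I intend to use the result. It is only a sketch: dataSizeBytes stands for however I estimate the input size, and rdd for the RDD I want to repartition; neither is defined above.

// Pick a partition count so that each partition fits in the per-core working memory.
def choosePartitions(dataSizeBytes: Long): Int =
  math.max(1, math.ceil(dataSizeBytes / workMemoryPerCore).toInt)

// e.g. val repartitioned = rdd.repartition(choosePartitions(dataSizeBytes))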
I am sure you will agree that this is terrible. Worst of all, if the defaults change in a future Spark version, my result will be wrong. The default values are hardcoded in Spark, and I have no way to get at them.
Is there a better way to get workMemoryPerCore?