Spark error: exceeds spark.akka.frameSize; Consider using broadcast variables

I have a large RDD of edges:

org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[(String, Int)]] = MappedRDD[27] at map at <console>:52 

When I was running in local mode, I was able to collect, count and save this RDD. Now, on the cluster, I get this error:

 edges.count
 ...
 Serialized task 28:0 was 12519797 bytes which exceeds spark.akka.frameSize (10485760 bytes). Consider using broadcast variables for large values.

The same thing happens with .saveAsTextFile("edge").

This is from spark-shell. I tried passing the option
--driver-java-options "-Dspark.akka.frameSize = 15"

But when I do this, it just hangs endlessly. Any help would be greatly appreciated.

** EDIT **

My local mode was running Spark 1.1.0, and my cluster is running Spark 1.0.1.

Also, the hang occurs when I .count, .collect, or .saveAs* the RDD, but defining it or running filters on it works fine.

1 answer

The error message "Consider using broadcast variables for large values" usually indicates that you have captured some large variables in function closures. For example, you might write something like

 val someBigObject = ...
 rdd.mapPartitions { x =>
   doSomething(someBigObject, x)
 }.count()

which causes someBigObject to be captured and serialized with your task. If you are doing something like this, you can use a broadcast variable instead, which causes only a reference to the object to be stored in the task itself, while the actual object data is sent separately.
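
For illustration, a minimal sketch of that pattern (someBigObject, doSomething and rdd are just the placeholders from the example above, not names from the question):

    // Send the large object to executors out-of-band instead of capturing it
    // in the task closure.
    val bigObjectBc = sc.broadcast(someBigObject)

    rdd.mapPartitions { iter =>
      // Only the small broadcast handle is serialized with the task;
      // .value retrieves the actual object on the executor.
      doSomething(bigObjectBc.value, iter)
    }.count()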

In Spark 1.1.0+, there is no need to use broadcast variables for this, since tasks are automatically broadcast (see SPARK-2521 for more details). There are still reasons to use broadcast variables (for example, sharing a large object across multiple actions / jobs), but you do not need them just to avoid frame size errors.

Another option is to increase the Akka frame size. In any version of Spark, you need to set the spark.akka.frameSize setting in your SparkConf before creating the SparkContext. As you may have noticed, this is a bit trickier in spark-shell, where the context is created for you. In newer versions of Spark (1.1.0 and higher), you can pass --conf spark.akka.frameSize=16 when you launch spark-shell. In Spark 1.0.1 or 1.0.2, you can instead pass --driver-java-options "-Dspark.akka.frameSize=16".
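
For a standalone driver program (where you build the context yourself rather than relying on spark-shell), the setting might look like the sketch below. The app name is hypothetical and the value 16 is just the example figure from above; spark.akka.frameSize is given in megabytes, with a default of 10.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: raise the Akka frame size before the SparkContext is created.
    val conf = new SparkConf()
      .setAppName("frame-size-example")     // hypothetical app name
      .set("spark.akka.frameSize", "16")    // default is 10 (MB)

    val sc = new SparkContext(conf)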


Source: https://habr.com/ru/post/1208000/

