I have a large RDD of edges:
org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[(String, Int)]] = MappedRDD[27] at map at <console>:52
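For context, here is roughly how an RDD of that type might be built in the shell. The input file, field layout, and column order below are made up for illustration and are not my actual code; only the resulting type matches the one above.

    import org.apache.spark.graphx.Edge
    import org.apache.spark.rdd.RDD

    // Hypothetical construction: each edge carries a (label, weight) attribute.
    val edges: RDD[Edge[(String, Int)]] = sc.textFile("edges.tsv").map { line =>
      val Array(src, dst, label, weight) = line.split("\t")
      Edge(src.toLong, dst.toLong, (label, weight.toInt))
    }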
When I was working in local mode, I was able to collect, count, and save this RDD. Now, on the cluster, I get this error:
edges.count ... Serialized task 28:0 was 12519797 bytes which exceeds spark.akka.frameSize (10485760 bytes). Consider using broadcast variables for large values.
The same thing happens with .saveAsTextFile("edge").
This is from the Spark shell. I tried passing the option
--driver-java-options "-Dspark.akka.frameSize=15"
But when I do this, it just hangs endlessly. Any help would be greatly appreciated.
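For what it's worth, my understanding is that the equivalent programmatic setting in a standalone application would look something like the sketch below; the app name and the 100 MB value are arbitrary placeholders, not what I actually run.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: set the Akka frame size (in MB) on the SparkConf
    // before the SparkContext is created; 100 is a placeholder value.
    val conf = new SparkConf()
      .setAppName("edges")
      .set("spark.akka.frameSize", "100")
    val sc = new SparkContext(conf)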
** EDIT **
My local setup was running Spark 1.1.0, while the cluster was running Spark 1.0.1.
In addition, the hang occurs whenever I count, collect, or save the RDD, but defining it or applying filters to it works fine.