I am using a Mesos cluster to deploy a Spark job (client mode). I have three servers, and I was able to launch the Spark job successfully. However, after running for some time (several days) I got this error:
15/11/03 19:55:50 ERROR Executor: Managed memory leak detected; size = 33554432 bytes, TID = 387939
15/11/03 19:55:50 ERROR Executor: Exception in task 2.1 in stage 6534.0 (TID 387939)
java.io.FileNotFoundException: /tmp/blockmgr-3acec504-4a55-4aa8-a3e5-dda97ce5d055/03/temp_shuffle_cb37f147-c055-4014-a6ae-fd505cb49f57 (Too many open files)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
After that, all subsequent streaming batches get queued and stay shown as "processing" on the Spark UI (:4042/streaming/). None of them can proceed until I manually kill the Spark job and resubmit it.
My Spark job just reads data from Kafka and performs some updates in MongoDB (there are quite a few update requests, but I set the streaming batch interval to about 5 minutes, so that shouldn't be the problem).
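For reference, here is roughly what the job looks like (simplified sketch, Spark 1.x direct-stream API; the topic name bid_inventory is taken from the error below, while the broker list and the updateMongo helper are stand-ins, not my real code):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToMongo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-mongo")
    // batch interval of about 5 minutes, as mentioned above
    val ssc = new StreamingContext(conf, Seconds(300))

    // stand-in broker addresses
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("bid_inventory"))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // one connection per partition; updateMongo stands in for the real update logic
        records.foreach { case (_, value) => updateMongo(value) }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // placeholder for the actual MongoDB update
  def updateMongo(json: String): Unit = {}
}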
Also, I occasionally see this spark-kafka error:
ERROR Executor: Exception in task 5.3 in stage 7561.0 (TID 392220)
org.apache.spark.SparkException: Couldn't connect to leader for topic bid_inventory 9: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator$$anonfun$connectLeader$1.apply(KafkaRDD.scala:164)
at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator$$anonfun$connectLeader$1.apply(KafkaRDD.scala:164)
...
Does anyone have an idea what is causing this and how to fix it? Thanks.