How to run a Python script on a Spark cluster?

I have Spark set up on 3 machines from a tar file. There is no other pre-configuration: I edited the slaves file and started the master and workers. I can see the Spark UI on port 8080. Now I want to run a simple Python script (pi.py) on the Spark cluster.
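For reference, the setup was roughly this (a sketch; worker1 and worker2 are placeholders for my machines' hostnames):

 # conf/slaves on the master: one worker hostname per line
 worker1
 worker2

 # start the master and every worker listed in conf/slaves
 sbin/start-all.sh

The script is: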

import sys
from random import random
from operator import add

from pyspark import SparkContext


if __name__ == "__main__":
    """
    Usage: pi [partitions]
    """
    sc = SparkContext(appName="PythonPi")
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(xrange(1, n + 1), partitions).map(f).reduce(add)
    print "Pi is roughly %f" % (4.0 * count / n)

    sc.stop()

I run this command:

spark-submit --master spark://IP:7077 pi.py 1
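The master URL has to match exactly what the master UI on port 8080 shows. With my master's actual address filled in (the logs below show spark://10.77.36.243:7077), the command expands to:

 spark-submit --master spark://10.77.36.243:7077 pi.py 1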

But I get the following error:

 14/12/22 18:31:23 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
 14/12/22 18:31:38 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
 14/12/22 18:31:43 INFO client.AppClient$ClientActor: Connecting to master spark://10.77.36.243:7077...
 14/12/22 18:31:53 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
 14/12/22 18:32:03 INFO client.AppClient$ClientActor: Connecting to master spark://10.77.36.243:7077...
 14/12/22 18:32:08 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
 14/12/22 18:32:23 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
 14/12/22 18:32:23 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
 14/12/22 18:32:23 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
 14/12/22 18:32:23 INFO scheduler.DAGScheduler: Failed to run reduce at /opt/pi.py:21
 Traceback (most recent call last):
   File "/opt/pi.py", line 21, in <module>
     count = sc.parallelize(xrange(1, n + 1), partitions).map(f).reduce(add)
   File "/usr/local/spark/python/pyspark/rdd.py", line 759, in reduce
     vals = self.mapPartitions(func).collect()
   File "/usr/local/spark/python/pyspark/rdd.py", line 723, in collect
     bytesInJava = self._jrdd.collect().iterator()
   File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
   File "/usr/local/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
 py4j.protocol.Py4JJavaError: An error occurred while calling o26.collect.
 : org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

Has anyone else faced the same problem? Please help with this.

1 answer

This:

 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 

suggests that the cluster has no available resources.

Check the status of your cluster and verify the cores and RAM (see http://www.datastax.com/dev/blog/common-spark-troubleshooting).
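If the workers are registered but the application asks for more than they offer, you can cap the request explicitly when submitting. These are standard spark-submit options for standalone mode; the values here are only examples:

 spark-submit --master spark://10.77.36.243:7077 \
   --total-executor-cores 2 \
   --executor-memory 512m \
   pi.py 1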

Also, double check your IP addresses.
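A common cause of "All masters are unresponsive!" in standalone mode is the master binding to a different address than the one you submit to. A sketch of pinning it down, assuming the master's address really is 10.77.36.243: set it explicitly in conf/spark-env.sh on the master and restart the cluster.

 # conf/spark-env.sh on the master machine
 export SPARK_MASTER_IP=10.77.36.243

 # restart so the master rebinds to that address
 sbin/stop-all.sh
 sbin/start-all.sh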

For additional ideas, see: Running a job on Spark 0.9.0 throws error

