NumPy exception when using MLlib, even though NumPy is installed

Here is the code I'm trying to execute:

    from pyspark.mllib.recommendation import ALS

    iterations = 5
    lambdaALS = 0.1
    seed = 5L
    rank = 8
    model = ALS.train(trainingRDD, rank, iterations, lambda_=lambdaALS, seed=seed)

When I run the line model = ALS.train(trainingRDD, rank, iterations, lambda_=lambdaALS, seed=seed), which depends on NumPy, the Py4J library that Spark uses reports the following error:

    Py4JJavaError: An error occurred while calling o587.trainALSModel.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 67.0 failed 4 times, most recent failure: Lost task 0.3 in stage 67.0 (TID 195, 192.168.161.55): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/home/platform/spark/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
        command = pickleSer._read_with_length(infile)
      File "/home/platform/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
        return self.loads(obj)
      File "/home/platform/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 421, in loads
        return pickle.loads(obj)
      File "/home/platform/spark/python/lib/pyspark.zip/pyspark/mllib/__init__.py", line 27, in <module>
    Exception: MLlib requires NumPy 1.4+

NumPy 1.10 is installed on the machine named in the error message. In addition, I get version 1.9.2 when I run the following directly in my Jupyter notebook:

    import numpy
    numpy.version.version

Apparently a NumPy version older than 1.4 is being used somewhere, but I don't know where. How can I find out on which machine I need to upgrade my copy of NumPy?

+5
2 answers

This is a bug in the MLlib initialization code:

    import numpy
    if numpy.version.version < '1.4':
        raise Exception("MLlib requires NumPy 1.4+")

The versions are compared as strings, and '1.10' sorts before '1.4', so the check wrongly rejects NumPy 1.10. You can work around it by using NumPy 1.9.2.

If you need to use NumPy 1.10 and do not want to upgrade to Spark 1.5.1, patch the check manually in https://github.com/apache/spark/blob/master/python/pyspark/mllib/__init__.py
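
For illustration, a minimal sketch of what such a local patch could look like, comparing version components numerically (via the standard-library LooseVersion) instead of as plain strings; this is a sketch of the idea, not necessarily the exact upstream fix:

    import numpy
    from distutils.version import LooseVersion

    # Compare version components numerically so that '1.10' is correctly
    # treated as newer than '1.4' (a plain string comparison gets this wrong).
    if LooseVersion(numpy.version.version) < LooseVersion('1.4'):
        raise Exception("MLlib requires NumPy 1.4+")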

+15

It looks like you have two versions of numpy installed, and pyspark imports the older one. To confirm this, you can do the following:

    import numpy
    print numpy.__version__
    print numpy.__path__

This will probably give you 1.9.2 and its path. Now do the following:

    import pyspark
    print pyspark.numpy.__version__
    print pyspark.numpy.__path__

Is it loading a different NumPy from another path? If so, removing that copy will most likely solve the problem.
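
If the driver side looks clean, the stale NumPy may live on an executor machine instead, which would also answer the original question of where to upgrade. A minimal sketch for sampling which NumPy each worker imports (an assumption for illustration: an active SparkContext named `sc` is available, as in the asker's session):

    def numpy_info(_):
        # Runs inside the Python worker process on an executor, so it reports
        # the NumPy that the executors actually import.
        import socket
        import numpy
        return (socket.gethostname(), numpy.version.version, numpy.__path__[0])

    # Spreading a small job over many partitions usually touches every executor
    # in a small cluster; dedupe the results before printing.
    for host, version, path in set(sc.parallelize(range(100), 20).map(numpy_info).collect()):
        print("%s: numpy %s at %s" % (host, version, path))

Each distinct (host, version, path) triple points at a machine whose NumPy differs from the others and may need upgrading.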

0
