(null) entry in command string exception in saveAsTextFile() on PySpark

I am working with PySpark in a Jupyter notebook (Python 2.7) on Windows 7. I have an RDD of type pyspark.rdd.PipelinedRDD called idSums. When I try to execute idSums.saveAsTextFile("Output"), I get the following error:

Py4JJavaError: An error occurred while calling o834.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001

In my opinion there should not be any problem with the RDD object, because I can execute other actions without error; for example, idSums.collect() produces the correct output.

Furthermore, the Output directory is created (with all subdirectories) and the file part-00001 is created, but it is 0 bytes in size.

1 answer

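You are missing winutils.exe, a Hadoop binary. Depending on whether your system is x64 or x32, download the matching winutils.exe file and set your Hadoop home to point to it.

First way:

1. Download the file.

2. Create a hadoop folder on your system, e.g. on "C:"

3. Create a bin folder in the hadoop directory, e.g. C:\hadoop\bin

4. Paste winutils.exe into bin, e.g. C:\hadoop\bin\winutils.exe

5. Add a user environment variable (System Properties → Advanced System Settings):

Name: HADOOP_HOME, Path: C:\hadoop\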
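
Before starting Spark again, it may help to confirm that the folder layout from the steps above is actually in place. The following is only a minimal sketch: the helper name find_winutils is hypothetical (it is not part of Spark or Hadoop), and it assumes the HADOOP_HOME\bin\winutils.exe layout described in the steps.

```python
import os


def find_winutils(hadoop_home=None):
    """Return the path to winutils.exe under hadoop_home, or None if absent.

    find_winutils is a hypothetical helper for illustration only.
    """
    # Fall back to the HADOOP_HOME environment variable when no path is given.
    home = hadoop_home or os.environ.get("HADOOP_HOME")
    if not home:
        return None
    # The steps above place the binary in the "bin" subfolder.
    candidate = os.path.join(home, "bin", "winutils.exe")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    path = find_winutils()
    print(path if path else "winutils.exe not found; check HADOOP_HOME")
```

If this prints the "not found" message in a freshly opened console, the variable was probably not saved or the file is in a different folder. Consoles and notebooks that were already open usually need to be restarted to see a newly added variable.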

Second way:

You can set the Hadoop home directly in your Java program with a system property, like this:

System.setProperty("hadoop.home.dir", "C:\\hadoop");
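
The question uses PySpark rather than Java, so a rough Python counterpart may be more convenient. The sketch below rests on two assumptions: Hadoop falls back to the HADOOP_HOME environment variable when hadoop.home.dir is not set, and the JVM that PySpark launches inherits the environment of the Python process. The helper names set_hadoop_home and save_example are hypothetical, the output folder name Output_check is made up for this sketch, and C:\hadoop is just the example path from the answer.

```python
import os


def set_hadoop_home(hadoop_home, env=None):
    """Set HADOOP_HOME in env (os.environ by default) and return env.

    set_hadoop_home is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    env["HADOOP_HOME"] = hadoop_home
    return env


def save_example():
    """Set HADOOP_HOME first, then start Spark and write a small RDD."""
    # Raw string so the backslash is kept literally.
    set_hadoop_home(r"C:\hadoop")
    # Import and create the context only after the variable is set,
    # so that the JVM started by PySpark can inherit it (assumption).
    from pyspark import SparkContext
    sc = SparkContext("local[*]", "winutils-check")
    try:
        # The target directory must not exist yet, or the save fails.
        sc.parallelize([1, 2, 3]).saveAsTextFile("Output_check")
    finally:
        sc.stop()
```

In a notebook where a SparkContext is already running, setting the variable afterwards most likely has no effect; restarting the kernel and calling save_example() first is the safer order.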


Source: https://habr.com/ru/post/1661704/

