Pyspark 'PipelinedRDD' does not have the 'show' attribute

Question

I want to find all elements of df that are not in df1, and also all elements of df1 that are not in df.

 df = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9])
 df1 = sc.parallelize([4, 5, 6, 7, 8, 9, 10])
 df2 = df.subtract(df1)
 df2.show()
 df3 = df1.subtract(df)
 df3.show()

I just want to inspect the result to check that I understand this function correctly, but I got this error: 'PipelinedRDD' object has no attribute 'show'. Any suggestions?

attributes pyspark

newleaf Dec 15 '16 at 0:56

2 answers

Zhang Tong · Answer 1 · 2016-12-15T02:51:58+0000

 print(df2.take(10))

show() is only available on Spark DataFrames, not on RDDs.
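A minimal sketch of the options, assuming a local SparkSession is acceptable: take() or collect() returns the RDD's elements as a Python list, and wrapping each element in a one-element tuple lets Spark build a single-column DataFrame on which show() works. The helper names peek, to_dataframe and demo, the column name "value" and the app name are made up for illustration; they are not part of PySpark.

```python
def peek(rdd, n=10):
    """Return up to n elements of an RDD as a plain Python list (runs an action)."""
    return rdd.take(n)


def to_dataframe(spark, rdd, column="value"):
    """Wrap each element in a 1-tuple so Spark can build a single-column DataFrame."""
    return spark.createDataFrame(rdd.map(lambda x: (x,)), [column])


def demo():
    # Imported here so the helpers above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("rdd-show-demo")  # hypothetical app name
        .getOrCreate()
    )
    sc = spark.sparkContext

    rdd = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9])
    rdd1 = sc.parallelize([4, 5, 6, 7, 8, 9, 10])

    diff = rdd.subtract(rdd1)                    # transformation: nothing runs yet
    print(sorted(peek(diff)))                    # action: elements only in rdd
    print(sorted(rdd1.subtract(rdd).collect()))  # action: elements only in rdd1

    to_dataframe(spark, diff).show()             # show() exists on the DataFrame
    spark.stop()
```

Calling demo() in an environment with PySpark installed prints the two differences and then a small table. The results are sorted before printing because Spark does not guarantee the order of elements returned by subtract.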

Marco visibelli · Answer 2 · 2017-04-20T13:20:56+0000

Printing df2 only reports that it is an RDD of type PipelinedRDD, not the list of values we might expect. That is because no action has been performed yet; only transformations have been applied.

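Run an action to force evaluation, for example

 df3.count()

Note that an RDD still has no show() method after an action has run. To see the elements, use

 df3.collect()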
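To check your understanding of subtract without starting Spark, the expected contents can be modeled in plain Python. This is only a sketch of the semantics (keep every element of the left collection that does not appear in the right one); the function name subtract here is a stand-in, not the Spark implementation, and a real RDD does not guarantee element order.

```python
def subtract(left, right):
    """Plain-Python model of left.subtract(right): keep items of left absent from right."""
    exclude = set(right)
    return [x for x in left if x not in exclude]


df = [1, 2, 3, 4, 5, 6, 7, 8, 9]
df1 = [4, 5, 6, 7, 8, 9, 10]

only_in_df = subtract(df, df1)   # [1, 2, 3]
only_in_df1 = subtract(df1, df)  # [10]

print(only_in_df, only_in_df1)
```

Comparing these lists with the sorted output of df2.collect() and df3.collect() is a quick way to confirm the Spark result.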
