PySpark: 'PipelinedRDD' object has no attribute 'show'

I want to find all elements of df that are not in df1, and also all elements of df1 that are not in df.

    df = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9])
    df1 = sc.parallelize([4, 5, 6, 7, 8, 9, 10])
    df2 = df.subtract(df1)
    df2.show()
    df3 = df1.subtract(df)
    df3.show()

I just want to check the result to see whether I understand this function correctly, but I get this error: 'PipelinedRDD' object has no attribute 'show'. Any suggestions?

+5
2 answers
    print(df2.take(10))

show() is defined only for Spark DataFrames. df2 here is an RDD, so use an action such as take() or collect() to look at its elements.

+7

Printing df3 only tells you that it is an RDD of type PipelinedRDD, not the list of values you might expect. That is because so far only transformations have been applied and no action has been performed yet.

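Run an action to trigger the computation, for example:

    df3.count()

Note that the RDD still has no show() method after that. To see its elements, use:

    df3.collect()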
-1

Source: https://habr.com/ru/post/1261302/

