I have a dataframe, and I apply a function to it. This function returns a numpy array. The code is as follows:
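from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, FloatType
# create_vector returns a numpy array for each value of the 'text' column
create_vector_udf = udf(create_vector, ArrayType(FloatType()))
dmoz_spark_df = dmoz_spark_df.withColumn('vector', create_vector_udf('text'))
dmoz_spark_df.select('lang','url','vector').show(20)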
Spark does not seem to be happy with this and does not accept ArrayType(FloatType()) as the return type.
The following error message appears:
net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
I could just call numpyarray.tolist() and return a list version of it, but obviously I would then always have to recreate the array if I want to use it with numpy.
So is there a way to store a numpy array in a dataframe column?
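For reference, a minimal sketch of that tolist() workaround. The create_vector body below is only a hypothetical stand-in (the real function is not shown here), and create_vector_as_list, add_vector_column and row_to_array are illustrative names, not part of any library. The idea is that the UDF hands Spark a plain Python list of floats, which ArrayType(FloatType()) can take, and the array is rebuilt on the way out.

```python
import numpy as np


def create_vector(text):
    # Hypothetical stand-in for the real create_vector: one float per word.
    return np.array([len(word) for word in text.split()], dtype=np.float32)


def create_vector_as_list(text):
    # ndarray.tolist() yields plain Python floats, which Spark can pickle.
    return create_vector(text).tolist()


def add_vector_column(df):
    # Assumes df is a Spark DataFrame with a string column named 'text'.
    # pyspark is imported here so the helpers above work without it.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, FloatType

    to_list_udf = udf(create_vector_as_list, ArrayType(FloatType()))
    return df.withColumn('vector', to_list_udf('text'))


def row_to_array(row):
    # Rebuild the numpy array from the list stored in the 'vector' column.
    return np.asarray(row['vector'], dtype=np.float32)
```

With this, add_vector_column(dmoz_spark_df) would add the column, and row_to_array would be applied to each collected row whenever numpy operations are needed again, which is exactly the extra conversion step described above.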