Unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'> when casting to Int in Apache Spark
I get an error while trying to cast a StringType column to IntegerType in PySpark:
joint = aggregates.join(df_data_3, aggregates.year == df_data_3.year)
joint2 = joint.filter(joint.CountyCode == 999).filter(joint.CropName == 'WOOL')\
    .select(aggregates.year, 'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")
I get:
TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
      1 joint = aggregates.join(df_data_3, aggregates.year == df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode == 999).filter(joint.CropName == 'WOOL')\
      3     .select(aggregates.year, 'Production')\
      4     .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
      5     .drop("Production")\
      6     .withColumnRenamed("ProductionTmp", "Production")

/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)
    339

TypeError: unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'>
PySpark SQL data types are no longer singletons (this changed in 1.3). You have to create an instance:
>>> from pyspark.sql.types import IntegerType
>>> from pyspark.sql.functions import col
>>> col("foo").cast(IntegerType())
Column<b'CAST(foo AS INT)'>
In contrast to:
>>> col("foo").cast(IntegerType)
TypeError
...
TypeError: unexpected type: <class 'type'>
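
The exact type named in the error depends on the Spark version: in Spark 2.x the atomic types use a DataTypeSingleton metaclass, which is why the traceback in the question reports <class 'pyspark.sql.types.DataTypeSingleton'> instead of <class 'type'>. A quick check (a sketch, assuming a PySpark 2.x shell):

>>> from pyspark.sql.types import IntegerType
>>> type(IntegerType)    # the class object itself; cast() rejects it
<class 'pyspark.sql.types.DataTypeSingleton'>
>>> type(IntegerType())  # a DataType instance; cast() accepts it
<class 'pyspark.sql.types.IntegerType'>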
The cast method can also be used with string descriptions of the type:
>>> col("foo").cast("integer")
Column<b'CAST(foo AS INT)'>
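
Applied to the code from the question, the fix is just the missing parentheses. A sketch, assuming joint, aggregates, and df_data_3 are the DataFrames defined above:

from pyspark.sql.types import IntegerType

# Pass an IntegerType() instance (or the string "integer"), not the bare class.
joint2 = joint.filter(joint.CountyCode == 999).filter(joint.CropName == 'WOOL')\
    .select(aggregates.year, 'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType()))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")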