Unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'> when casting to int in Apache Spark

I got an error while trying to cast a StringType column to IntegerType in PySpark:

joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
    .select(aggregates.year,'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")

I get:

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      1 joint = aggregates.join(df_data_3, aggregates.year == df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode == 999).filter(joint.CropName == 'WOOL')\
      3     .select(aggregates.year, 'Production')\
      4     .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
      5     .drop("Production")\
      6     .withColumnRenamed("ProductionTmp", "Production")

/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)
    339

TypeError: unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'>

This is expected behavior. PySpark SQL (since at least version 1.3) expects an instance of the data type, not the class itself:

from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col

col("foo").cast(IntegerType())
Column<b'CAST(foo AS INT)'>

Passing the class itself fails:

col("foo").cast(IntegerType)
TypeError                                 Traceback (most recent call last)
   ...
TypeError: unexpected type: <class 'type'>

Alternatively, cast also accepts the type name as a string:

col("foo").cast("integer")
Column<b'CAST(foo AS INT)'>

Source: https://habr.com/ru/post/1661327/
