Get column data type using pyspark

We are reading data from a MongoDB collection. A column of the collection can hold values of two different types (for example: (bson.Int64, int) or (int, float)).

I am trying to get the data type of a column using pyspark.

My problem is that some columns have different data types.

Suppose quantity and weight are the columns:

quantity           weight
---------          --------
12300              656
123566000000       789.6767
1238               56.22
345                23
345566677777789    21

In fact, we did not define a data type for any column of the Mongo collection.

When I request a count from the pyspark dataframe

dataframe.count()

I got an exception like this

"Cannot cast STRING into a DoubleType (value: BsonString{value='200.0'})"
4 answers

Your question is broad, so my answer will also be broad.

To get the data types of your DataFrame columns, you can use dtypes, i.e.:

>>> df.dtypes
[('age', 'int'), ('name', 'string')]

This means your column age is of type int and your column name is of type string.
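
If you need the type of a single column rather than the whole list, you can pull it out of dtypes or the schema. A minimal sketch, assuming the same df with the age and name columns from the example above:

# dtypes is a list of (column name, type name) pairs, so it converts
# cleanly to a dict for lookup by name
age_type = dict(df.dtypes)["age"]            # 'int'

# the schema gives the full Spark type object instead of a string
age_spark_type = df.schema["age"].dataType   # IntegerType()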


Since you are reading from MongoDB without defining an explicit schema, Spark infers the SQL schema from the data. You can inspect the inferred schema with:

df.schema
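
If the inferred schema is wrong because a field mixes types (as in the question), one possible workaround is to skip inference and supply an explicit schema, reading the ambiguous fields as strings and casting them afterwards. A rough sketch, assuming the MongoDB Spark connector is registered under the "mongo" format and the connection URI is already configured (both of these depend on your connector version), and using the quantity and weight columns from the question:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# read the ambiguous fields as plain strings to avoid cast errors
explicit_schema = StructType([
    StructField("quantity", StringType(), True),
    StructField("weight", StringType(), True),
])

df = spark.read.format("mongo").schema(explicit_schema).load()

# cast to the types you actually want once the data is in Spark
df = df.withColumn("quantity", df["quantity"].cast("long")) \
       .withColumn("weight", df["weight"].cast("double"))

df.printSchema()
df.count()   # the action that previously raised the cast error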


Suppose you have read the data into a variable:

input_data = [Read from Mongo DB operation]

You can then use

type(input_data) 

to check its data type.
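
Note that if input_data is a Spark DataFrame, this reports the Python wrapper class rather than the per-column types, e.g. something like:

>>> type(input_data)
<class 'pyspark.sql.dataframe.DataFrame'>

For column-level types you still need df.dtypes or df.schema, as in the answers above.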


Source: https://habr.com/ru/post/1681187/

