Get column data type using pyspark

We are reading data from a MongoDB collection. A column of the collection can hold values of two different types (for example: (bson.Int64, int) or (int, float)).

I am trying to get the data type of a column using pyspark.

My problem is that some columns have different data types.

Suppose quantity and weight are the columns:

quantity           weight
---------          --------
12300              656
123566000000       789.6767
1238               56.22
345                23
345566677777789    21

In fact, we did not define a data type for any column of the Mongo collection.

When I request a count from the pyspark dataframe

dataframe.count()

I got an exception like this

"Cannot cast STRING into a DoubleType (value: BsonString{value='200.0'})"
4 answers

Your question is broad, so my answer will also be broad.

To get the data types of your DataFrame columns, you can use dtypes, i.e.:

>>> df.dtypes
[('age', 'int'), ('name', 'string')]

This means your column age is of type int and your column name is of type string.
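
If you need the type of a single column rather than the whole list, you can pull it out of dtypes or the schema. A minimal sketch, assuming the same df with the age and name columns from the example above:

# dtypes is a list of (column name, type name) pairs, so it converts
# cleanly to a dict for lookup by name
age_type = dict(df.dtypes)["age"]            # 'int'

# the schema gives the full Spark type object instead of a string
age_spark_type = df.schema["age"].dataType   # IntegerType()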


Since you are reading from MongoDB without defining an explicit schema, Spark infers the SQL schema from the data. You can inspect the inferred schema with:

df.schema
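
If the inferred schema is wrong because a field mixes types (as in the question), one possible workaround is to skip inference and supply an explicit schema, reading the ambiguous fields as strings and casting them afterwards. A rough sketch, assuming the MongoDB Spark connector is registered under the "mongo" format and the connection URI is already configured (both of these depend on your connector version), and using the quantity and weight columns from the question:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# read the ambiguous fields as plain strings to avoid cast errors
explicit_schema = StructType([
    StructField("quantity", StringType(), True),
    StructField("weight", StringType(), True),
])

df = spark.read.format("mongo").schema(explicit_schema).load()

# cast to the types you actually want once the data is in Spark
df = df.withColumn("quantity", df["quantity"].cast("long")) \
       .withColumn("weight", df["weight"].cast("double"))

df.printSchema()
df.count()   # the action that previously raised the cast error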


Suppose you have read the data into a variable:

input_data = [Read from Mongo DB operation]

You can then use

type(input_data) 

to check its data type.
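
Note that if input_data is a Spark DataFrame, this reports the Python wrapper class rather than the per-column types, e.g. something like:

>>> type(input_data)
<class 'pyspark.sql.dataframe.DataFrame'>

For column-level types you still need df.dtypes or df.schema, as in the answers above.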


Source: https://habr.com/ru/post/1681187/

