Why does createDataFrame's schema inference not treat the columns of this data as strings?

Question

The following code shows how the DataFrame is created. It has two columns, and each column holds integers plus a string in the last row.

As I understand it, createDataFrame should inspect the types of the values in each column and then infer a single data type that can hold all of them. In this case I would expect both columns to be inferred as strings, because a string type can represent both the numbers and the strings.

So why do the columns of the resulting DataFrame have a long data type, with the strings replaced by null?

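# DataFrame construction (sqlContext is assumed to be an existing SQLContext,
# such as the one the Spark 1.x PySpark shell provides):
b = sqlContext.createDataFrame(
    [(1, 2), (2, 3), (3, 3), ('test0', 'test1')],
    ['pepe', 'pepa'],
    samplingRatio=1,
)
b.show()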


#+----+----+
#|pepe|pepa|
#+----+----+
#|   1|   2|
#|   2|   3|
#|   3|   3|
#|null|null|
#+----+----+

python apache-spark pyspark apache-spark-sql

Hugo Reyes 15 . '16 14:45

zero323 · Accepted Answer · 2016-02-15T15:19:07+0000

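As far as I can tell, samplingRatio makes no difference to this DataFrame. It is only taken into account when data is an RDD. Here the data is a local list, and when the rows are converted to a Java RDD, values that do not match the inferred schema become NULL.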
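The null row in the question's output is what one would expect if the column types were fixed from the leading integer row and later values that do not fit were dropped. The sketch below is a toy model in plain Python under that assumption. It is not Spark's code, and the names infer_types and apply_types are made up for illustration.

```python
# Toy model only: NOT Spark's implementation. It illustrates the idea
# "fix the column types from the first row, null out what does not fit".

def infer_types(first_row):
    """Take each column's type from the first row alone (an assumption)."""
    return [type(value) for value in first_row]


def apply_types(rows, types):
    """Keep values that match the column type; replace the rest with None."""
    return [
        tuple(value if isinstance(value, t) else None
              for value, t in zip(row, types))
        for row in rows
    ]


rows = [(1, 2), (2, 3), (3, 3), ('test0', 'test1')]
types = infer_types(rows[0])       # both columns look like integers
result = apply_types(rows, types)  # the string row no longer fits
print(result)
```

Under this model the last row comes out as (None, None), which mirrors the null|null line printed by b.show() in the question.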

RDD , Spark , , , , Scala. , Spark .

inferSchema ? RDDs, Python. .
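If string columns are what you want, one option is to skip inference and pass an explicit schema. The following is a minimal sketch rather than the only way to do it. The helper names stringify_rows and strings_frame are hypothetical, and spark is assumed to be an existing SparkSession or SQLContext. Values are converted to str first because an integer may not be accepted for a column declared as StringType.

```python
def stringify_rows(rows):
    """Convert every value to str so that all rows fit a string schema."""
    return [tuple(str(value) for value in row) for row in rows]


def strings_frame(spark, rows, columns):
    """Build a DataFrame whose columns are all declared as strings.

    `spark` is assumed to be an existing SparkSession or SQLContext.
    The import is deferred so that stringify_rows works without PySpark.
    """
    from pyspark.sql.types import StructType, StructField, StringType

    # One nullable string field per column name; no inference is involved.
    schema = StructType(
        [StructField(name, StringType(), True) for name in columns]
    )
    return spark.createDataFrame(stringify_rows(rows), schema)
```

With the data from the question, strings_frame(sqlContext, [(1, 2), (2, 3), (3, 3), ('test0', 'test1')], ['pepe', 'pepa']) should keep the last row instead of turning it into nulls, at the cost of storing the numbers as text.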
