Why does the schema inferrer of createDataFrame not make the columns of this data strings?

The following code shows how a DataFrame is created. As you can see, the data consists of two columns. Each column holds integers, with a string in the last row.

As I understand it, createDataFrame should inspect the data types of the columns (and therefore of the values in each row), and then choose one data type that can hold all of the rows. In this case, I believe the columns should be of string type, because that type can represent both numbers and strings.

So why does the resulting DataFrame have columns of long type, with the strings turned into null?

# DataFrame construction:
b = sqlContext.createDataFrame([(1, 2), (2, 3), (3, 3), ('test0', 'test1')], ['pepe', 'pepa'], samplingRatio=1)
b.show()

#+----+----+
#|pepe|pepa|
#+----+----+
#|   1|   2|
#|   2|   3|
#|   3|   3|
#|null|null|
#+----+----+

@ccheneson , samplingRatio, . , ?


, samplingRatio , DataFrame . , data RDD. Java RDD, NULL.

RDD , Spark , , , , Scala. , Spark .
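
Whatever the inferrer does with mixed types, one workaround is to make the column types uniform yourself before calling createDataFrame. The following is only a minimal sketch under that assumption: `stringify_rows` is a hypothetical helper written for this example, not part of Spark, and the conversion step itself is plain Python.

```python
# Hypothetical helper (not a Spark API): convert every value to str so that
# each column holds a single type before the data is handed to Spark.
def stringify_rows(rows):
    """Return a new list of tuples with all non-None values cast to str."""
    return [tuple(None if v is None else str(v) for v in row) for row in rows]


# Same data as in the question.
data = [(1, 2), (2, 3), (3, 3), ('test0', 'test1')]
string_rows = stringify_rows(data)
print(string_rows)
# [('1', '2'), ('2', '3'), ('3', '3'), ('test0', 'test1')]

# The converted rows can then be passed on as in the question, e.g.:
# b = sqlContext.createDataFrame(string_rows, ['pepe', 'pepa'])
```

Since every value is already a string at that point, there is nothing left for the inferrer to reconcile, and the last row should no longer come out as null.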

inferSchema ? RDDs, Python. .


Source: https://habr.com/ru/post/1628855/

