The following code shows how a DataFrame is created. The DataFrame has two columns. Each column holds integers, except in the last row, which holds strings.
As I understand it, createDataFrame should infer the data type of each column (and therefore of the values in every row), then pick a single data type that can hold all rows. In this case, I would expect the columns to be of string type, because a string can represent both numbers and text.
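To make the expectation concrete, here is a minimal pure-Python sketch of the "widen on conflict" rule described above: infer a type for every value, then merge the types row by row, falling back to string when two types cannot be reconciled. The helper names (`infer_type`, `merge_types`, `infer_column_types`) and the merge rules are hypothetical and only model the question's reasoning. This is not Spark's implementation.

```python
# Hypothetical sketch of the inference rule the question expects.
# NOT Spark's code; all names and rules here are illustrative assumptions.

def infer_type(value):
    """Map a Python value to a simple type name (hypothetical helper)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def merge_types(a, b):
    """Return a type able to hold values of both a and b (assumed rule)."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    # Any other conflict widens to string, as the question proposes.
    return "string"


def infer_column_types(rows):
    """Infer one type per column by merging the types seen in every row."""
    types = [infer_type(v) for v in rows[0]]
    for row in rows[1:]:
        types = [merge_types(t, infer_type(v)) for t, v in zip(types, row)]
    return types


data = [(1, 2), (2, 3), (3, 3), ("test0", "test1")]
print(infer_column_types(data))  # ['string', 'string']
```

Under this assumed rule both columns come out as string, which is the result the question anticipates.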
So why does the resulting DataFrame have columns of type long, with the strings in the last row turned into null?
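The observed output is consistent with a different model: the column types are fixed from the first row alone, and any later value that does not match becomes null. The pure-Python sketch below reproduces that behavior. It is an assumption used only to illustrate the symptom, not a description of Spark's internals, and the helper names `first_row_types` and `coerce_rows` are hypothetical.

```python
# Hypothetical model of the observed result: column types come from the
# first row only, and mismatching values are replaced with None (null).
# This is an illustrative assumption, not Spark's actual code.

def first_row_types(rows):
    """Python type of each column as seen in the first row (hypothetical)."""
    return [type(v) for v in rows[0]]


def coerce_rows(rows):
    """Keep values matching the first row's types; turn the rest into None."""
    types = first_row_types(rows)
    return [
        tuple(v if isinstance(v, t) else None for v, t in zip(row, types))
        for row in rows
    ]


data = [(1, 2), (2, 3), (3, 3), ("test0", "test1")]
for row in coerce_rows(data):
    print(row)
# (1, 2)
# (2, 3)
# (3, 3)
# (None, None)
```

Comparing this with the previous sketch shows the difference: merging types across all rows yields string columns, whereas fixing the types from the first row yields integer columns with nulls in the last row.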
b = sqlContext.createDataFrame([(1, 2),(2, 3), (3, 3), ('test0', 'test1')], ['pepe', 'pepa'], samplingRatio=1)
b.show()
@ccheneson, samplingRatio …?