I have a DataFrame with two columns, x and y:
x y
1 false
1 false
1 true
2 true
2 false
3 null
3 true
I am trying to create a contingency table with the following code, and I expect the result below:
myDataFrame.stat.crosstab("x", "y")
x_y true false null
1 1 2 0
2 1 1 0
3 1 0 1
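To make the expected result concrete, here is a minimal plain-Python sketch of the counting I have in mind. It does not use Spark at all; `rows`, `label`, and `crosstab` are names made up for this illustration, and `None` stands in for null.

```python
from collections import Counter

# Sample rows copied from the table above; None stands in for null.
rows = [(1, False), (1, False), (1, True), (2, True),
        (2, False), (3, None), (3, True)]

def label(value):
    """Spell a y value the way the expected header does."""
    if value is None:
        return "null"
    return "true" if value else "false"

def crosstab(pairs, col_labels=("true", "false", "null")):
    """Count (x, y) pairs and lay them out as one row of counts per x."""
    counts = Counter((x, label(y)) for x, y in pairs)
    xs = sorted({x for x, _ in pairs})
    # Counter returns 0 for combinations that never occur.
    return {x: [counts[(x, c)] for c in col_labels] for x in xs}

table = crosstab(rows)
print("x_y", "true", "false", "null")
for x, cells in table.items():
    print(x, *cells)
```

The column labels are only known after the distinct values of y have been seen, which is the same property that makes the result columns of a crosstab data-dependent.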
However, I get the following exception: AnalysisException cannot resolve 'true' given input columns [x, y]
The column "true" (as well as "false" and "null") is created automatically by stat.crosstab at run time. Static analysis cannot know about a new column name without first making a full pass through the data.
I am using Spark 1.6.1.5. Is this a bug? Is there any way to turn off the analyzer?