I get NaN when calculating standard deviation (stddev). This is a very simple use case, as described below:
val df = Seq(("1",19603176695L),("2", 26438904194L),("3",29640527990L),("4",21034972928L),("5", 23975L)).toDF("v","data")
I have stddev defined as a helper function on Column:
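import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, mean, sqrt}
// Population standard deviation: sqrt(E[X^2] - E[X]^2)
def stddev(col: Column) = {
  sqrt(mean(col * col) - mean(col) * mean(col))
}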
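For reference, the function above is meant to compute the population standard deviation, sqrt(E[X^2] - E[X]^2). Below is a minimal plain-Scala sketch of the same formula without Spark, assuming the five values are converted to Double before squaring (the column in my DataFrame is a Long). The object name StddevReference and its helpers are hypothetical and only serve as a sanity check on what a finite result should look like.

```scala
// Plain-Scala reference for sqrt(E[X^2] - E[X]^2).
// Assumption: values are converted to Double before any arithmetic.
object StddevReference {
  // Arithmetic mean of a non-empty sequence
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Population standard deviation via the mean-of-squares formula
  def stddev(xs: Seq[Double]): Double = {
    val m = mean(xs)
    math.sqrt(mean(xs.map(x => x * x)) - m * m)
  }

  def main(args: Array[String]): Unit = {
    // Same five values as in the DataFrame, widened to Double
    val data = Seq(19603176695L, 26438904194L, 29640527990L, 21034972928L, 23975L)
      .map(_.toDouble)
    println(stddev(data))
  }
}
```

Comparing this against the Spark result might help narrow down whether the formula itself or the column type is the issue.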
I get NaN when I call this function, as shown below:
df.agg(stddev(col("data")).as("stddev")).show()
This produces the following:
+------+
|stddev|
+------+
|   NaN|
+------+
What am I doing wrong?