This problem is actually a bit complicated, due to the implementation of tbl_spark and an incompatibility between the semantics of Spark and R. Even if colSums could be applied, Spark SQL does not allow implicit conversions between Boolean and numeric types. This means that you must explicitly apply as.numeric:
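library(sparklyr)
library(dplyr)

# `sc` is assumed to be an existing Spark connection, e.g. from spark_connect()
sampledata <- copy_to(sc, data.frame(x = c(1, NA, 2), y = c(NA, 2, NA), z = 42))

sampledata %>%
  mutate_all(is.na) %>%       # flag missing values in every column
  mutate_all(as.numeric) %>%  # explicit cast, since Spark SQL will not sum Booleans
  summarize_all(sum)          # count the missing values per column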
# Source: lazy query [?? x 3]
# Database: spark_connection
      x     y     z
  <dbl> <dbl> <dbl>
1     1     2     0
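For contrast, here is a minimal sketch of the same count on an ordinary local data frame, where R coerces logical values to numbers implicitly and colSums works directly. The name `local_data` is hypothetical and not part of the original answer; nothing here touches Spark, so it only illustrates the R side of the semantic difference.

```r
library(dplyr)

# Hypothetical local copy of the sample data (no Spark connection involved)
local_data <- data.frame(x = c(1, NA, 2), y = c(NA, 2, NA), z = 42)

# In plain R, TRUE/FALSE are coerced to 1/0 implicitly,
# so colSums can be applied to the logical matrix from is.na()
na_counts_base <- colSums(is.na(local_data))

# The same pipeline as used on the Spark table; locally the as.numeric
# step is redundant but harmless
na_counts_dplyr <- local_data %>%
  mutate_all(is.na) %>%
  mutate_all(as.numeric) %>%
  summarize_all(sum)

print(na_counts_base)
print(na_counts_dplyr)
```

Both approaches give the same per-column counts locally. Keeping the explicit as.numeric step means the same pipeline should also work when the input is a Spark table instead of a data frame.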