Say I have a data frame like this:
name age city
abc 20 A
def 30 B
I want to add a summary row at the end of the data frame, so the result looks like this:
name age city
abc 20 A
def 30 B
All 50 All
I can easily set the string 'All', but how do I get sum(df['age'])? Trying it fails with "Column is not iterable".
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data = spark.createDataFrame([("abc", 20, "A"), ("def", 30, "B")], ["name", "age", "city"])
data.printSchema()
#root
#|-- name: string (nullable = true)
#|-- age: long (nullable = true)
#|-- city: string (nullable = true)
res = data.union(spark.createDataFrame([('All',sum(data['age']),'All')], data.columns)) ## TypeError: Column is not iterable
# Also tried data['age'].sum() and got an error. With [('All', 50, 'All')] it works fine.
I usually work with pandas DataFrames and am new to Spark, so this may just be a gap in my understanding of Spark DataFrames.
Please suggest how to get the sum of a DataFrame column in PySpark, and whether there is a better way to append a row to the end of a DataFrame. Thanks.
Satya