First, consider whether you really need to save the data frame as text. A DataFrame stores data column-wise (rather than row-wise, as an RDD does), so the .rdd operation is expensive because it requires converting the data. Parquet is a columnar format and is much more efficient to use.
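For instance, saving and reloading as Parquet is straightforward with the Spark 1.4+ DataFrameWriter API. A minimal sketch, where the paths and the `df` variable are placeholders:

```scala
// Write the DataFrame in Parquet format; "cars.parquet" is a placeholder path
df.write.parquet("cars.parquet")

// Reading it back preserves the schema, unlike a plain text round-trip
val dfBack = sqlContext.read.parquet("cars.parquet")
```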
That said, sometimes you really do need to save as a text file.
As far as I know, a DataFrame out of the box will not let you save as a text file. If you look at the source code, you will see that four formats are supported:
- jdbc
- json
- parquet
- orc
So your options are either to use df.rdd.saveAsTextFile, as suggested earlier, or to use spark-csv, which allows you to do something like:
Spark 1.4+:
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("cars.csv")

df.select("year", "model")
  .write
  .format("com.databricks.spark.csv")
  .save("newcars.csv")
Spark 1.3:
val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> "cars.csv", "header" -> "true"))

df.select("year", "model").save("newcars.csv", "com.databricks.spark.csv")
with the added benefit of handling the annoying parts of quoting and escaping strings for you.
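If you take the df.rdd.saveAsTextFile route instead, formatting each row is up to you. A minimal sketch, where mkString(",") is a deliberately naive choice that does not handle quoting or embedded commas (which is exactly what spark-csv saves you from):

```scala
// Convert the DataFrame to an RDD[Row], render each row as a comma-joined
// string, and write the result as plain text; "out.txt" is a placeholder path
df.select("year", "model")
  .rdd
  .map(row => row.mkString(","))
  .saveAsTextFile("out.txt")
```

Note that saveAsTextFile writes a directory of part files, one per partition, not a single text file.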