Here are read and write examples showing how to read from and write to Excel with the full set of options.
Source: spark-excel from crealytics.
Scala API, Spark 2.0+:
Create a DataFrame from an Excel file
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
    .format("com.crealytics.spark.excel")
    .option("sheetName", "Daily")
    .option("useHeader", "true")
    .option("treatEmptyValuesAsNulls", "false")
    .option("inferSchema", "false")
    .option("addColorColumns", "true")
    .option("startColumn", 0)
    .option("endColumn", 99)
    .option("timestampFormat", "MM-dd-yyyy HH:mm:ss")
    .option("maxRowsInMemory", 20)
    .option("excerptSize", 10)
    .schema(myCustomSchema)
    .load("Worktime.xlsx")
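The myCustomSchema value used above is not defined in the example. One way to declare it is shown below; the column names and types are illustrative assumptions, not taken from the original, so adjust them to match your sheet:

```scala
import org.apache.spark.sql.types._

// Hypothetical schema for Worktime.xlsx; replace field names/types with your own columns.
val myCustomSchema = StructType(Seq(
  StructField("Day", StringType, nullable = true),
  StructField("Hours", DoubleType, nullable = true),
  StructField("Timestamp", TimestampType, nullable = true)
))
```

Passing an explicit schema like this lets you keep inferSchema set to "false" while still getting typed columns.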
Write a DataFrame to an Excel file
df.write
    .format("com.crealytics.spark.excel")
    .option("sheetName", "Daily")
    .option("useHeader", "true")
    .option("dateFormat", "yy-mmm-d")
    .option("timestampFormat", "mm-dd-yyyy hh:mm:ss")
    .mode("overwrite")
    .save("Worktime2.xlsx")
Note: instead of generic names like Sheet1 or Sheet2, you can refer to sheets by their actual names; in the examples above, Daily is the sheet name.
To use it from the spark shell, the package can be added to Spark with the --packages command line option, for example:
$SPARK_HOME/bin/spark-shell --packages com.crealytics:spark-excel_2.11:0.9.8
Or add the dependency to your build (e.g. for maven):
groupId: com.crealytics
artifactId: spark-excel_2.11
version: 0.9.8
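For sbt users, the same coordinates translate to the line below (a sketch, assuming Scala 2.11 as the artifact suffix indicates, so the %% operator resolves to spark-excel_2.11):

```scala
// build.sbt
libraryDependencies += "com.crealytics" %% "spark-excel" % "0.9.8"
```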
Tip: this is a very useful approach, particularly for writing maven test cases, where you can place Excel sheets with sample data under src/main/resources and access them from your unit tests (scala/java), i.e. creating DataFrame[s] out of Excel sheets...
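A sketch of that testing approach is below; the resource file name, sheet name, and the use of ScalaTest are all assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class WorktimeSpec extends AnyFunSuite {
  test("loads the sample worktime sheet from resources") {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    // Resolve the sample workbook bundled under src/main/resources.
    val path = getClass.getResource("/Worktime.xlsx").getPath
    val df = spark.read
      .format("com.crealytics.spark.excel")
      .option("sheetName", "Daily")
      .option("useHeader", "true")
      .load(path)
    assert(df.columns.nonEmpty)
    spark.stop()
  }
}
```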
Another option you could consider is spark-hadoopoffice-ds, a Spark datasource for the HadoopOffice library. This datasource assumes at least Spark 2.0.1; however, the HadoopOffice library can also be used directly from Spark 1.x. Currently this datasource supports the following formats of the HadoopOffice library:
Excel datasource format: org.zuinnote.spark.office.Excel, which loads and saves both old Excel (.xls) and new Excel (.xlsx) files. This datasource is available on Spark-packages.org and on Maven Central.
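A minimal read sketch with that datasource is shown below; the locale option value and the file name are illustrative, following the patterns in the HadoopOffice examples rather than the original answer:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read an Excel workbook via the HadoopOffice Spark datasource.
val df = spark.read
  .format("org.zuinnote.spark.office.excel")
  .option("read.locale.bcp47", "en") // locale used to interpret formatted cell values
  .load("Worktime.xlsx")
```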