You can use org.zuinnote.spark.office.excel to read and write an Excel file with a Dataset. Examples are provided at https://github.com/ZuInnoTe/spark-hadoopoffice-ds/ . However, there is one problem when you read an Excel file into a Dataset and try to write it back out to another Excel file. The problem and a workaround (in Scala) are described at https://github.com/ZuInnoTe/hadoopoffice/issues/12 .
I wrote a sample program in Java using org.zuinnote.spark.office.excel and the workaround given at that link. See if this helps you.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SparkExcel {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("SparkExcel")
                .master("local[*]")
                .getOrCreate();

        // Read the Excel file; each Dataset row holds an array of cell structs.
        Dataset<Row> df = spark
                .read()
                .format("org.zuinnote.spark.office.excel")
                .option("read.locale.bcp47", "de")
                .load("c:\\temp\\test1.xlsx");

        df.show();
        df.printSchema();

        // Workaround from the issue above: flatten each row's array of cell
        // structs (five string fields per cell in this data source) into a
        // plain String[] so the data source can write it back out.
        FlatMapFunction<Row, String[]> flatMapFunc = new FlatMapFunction<Row, String[]>() {
            @Override
            public Iterator<String[]> call(Row row) throws Exception {
                ArrayList<String[]> rowList = new ArrayList<String[]>();
                List<Row> spreadSheetRows = row.getList(0);
                for (Row srow : spreadSheetRows) {
                    ArrayList<String> arr = new ArrayList<String>();
                    arr.add(srow.getString(0));
                    arr.add(srow.getString(1));
                    arr.add(srow.getString(2));
                    arr.add(srow.getString(3));
                    arr.add(srow.getString(4));
                    rowList.add(arr.toArray(new String[] {}));
                }
                return rowList.iterator();
            }
        };

        Dataset<String[]> df2 = df.flatMap(flatMapFunc, spark.implicits().newStringArrayEncoder());

        // Write the flattened rows to a new Excel file.
        df2.write()
                .mode(SaveMode.Overwrite)
                .format("org.zuinnote.spark.office.excel")
                .option("write.locale.bcp47", "de")
                .save("c:\\temp\\test2.xlsx");
    }
}
I tested this code with Java 8 and Spark 2.1.0. I use Maven and added a dependency for org.zuinnote.spark.office.excel from https://mvnrepository.com/artifact/com.github.zuinnote/spark-hadoopoffice-ds_2.11/1.0.3
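If you also use Maven, the dependency can be declared in your pom.xml roughly like this (coordinates taken from the Maven Central link above; adjust the version if you need a newer release):

```xml
<dependency>
    <groupId>com.github.zuinnote</groupId>
    <artifactId>spark-hadoopoffice-ds_2.11</artifactId>
    <version>1.0.3</version>
</dependency>
```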