CSVFileFormat seems to read and write empty values as null for string columns. I searched around but could not find clear information about this, so I put together a simple test.
    val df = session.createDataFrame(Seq(
      (0, "a"),
      (1, "b"),
      (2, "c"),
      (3, ""),
      (4, null)
    ))

    df.coalesce(1).write.mode("overwrite").format("csv")
      .option("delimiter", ",")
      .option("nullValue", "unknown")
      .option("treatEmptyValuesAsNulls", "false")
      .save(s"$path/test")
This is the output:
    0,a
    1,b
    2,c
    3,unknown
    4,unknown
So it seems to treat both empty strings and null values as null. The same thing happens when reading a CSV file that contains quoted empty strings and nulls. Is there any way to treat them differently?
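For anyone trying to reproduce this, here is a minimal sketch of one thing that may be worth trying. It assumes Spark 2.4 or later, where, as far as I can tell, the built-in CSV source accepts an `emptyValue` option alongside `nullValue`; on older versions the option is probably ignored. The local `SparkSession` setup, the temporary directory used for `path`, and the marker strings `"empty"` and `"unknown"` are my own choices for illustration and are not part of the test above.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object EmptyVsNullCsv {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for this experiment.
    val session = SparkSession.builder()
      .master("local[1]")
      .appName("empty-vs-null-csv")
      .getOrCreate()

    // Assumption: a temporary directory stands in for `path`.
    val path = Files.createTempDirectory("csv-test").toString

    // Same shape of data as in the test above: one empty string, one null.
    val rows: Seq[(Int, String)] = Seq((0, "a"), (3, ""), (4, null))
    val df = session.createDataFrame(rows)

    df.coalesce(1).write.mode("overwrite").format("csv")
      .option("delimiter", ",")
      .option("nullValue", "unknown") // marker written for null
      .option("emptyValue", "empty")  // marker written for "" (assumed Spark 2.4+)
      .save(s"$path/test")

    // Print the raw lines so the two markers can be compared by eye.
    session.read.textFile(s"$path/test").collect().foreach(println)

    session.stop()
  }
}
```

If the option is honored, rows 3 and 4 should come out with different markers. I have not checked how `emptyValue` behaves on the read side, so treat that part as untested.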