So the actual error that is emitted here is:
java.lang.IllegalArgumentException: Delimiter cannot be more than one character: ¦¦
The docs confirm this limitation, and I checked the Spark 2.0 CSV reader as well; it has the same requirement.
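For reference, here is a minimal sketch of the kind of read that trips over this (example.txt is a placeholder path; with the spark-csv package the option key is "delimiter", and as far as I can tell the Spark 2.0 built-in reader accepts "sep" or "delimiter"):

sqlContext.read
  .format("com.databricks.spark.csv")
  .option("delimiter", "¦¦")  // two characters, so this throws the IllegalArgumentException above
  .load("example.txt")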
Given all this, if your data is simple enough, that is, if no entries themselves contain ¦¦, I would load it like this:
scala> :pa
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.types._  // brings in StructType, StructField, StringType

val customSchema_1 = StructType(Array(
  StructField("ID", StringType, true),
  StructField("FILLER", StringType, true),
  StructField("CODE", StringType, true)))

// Exiting paste mode, now interpreting.

import org.apache.spark.sql.types._
customSchema_1: org.apache.spark.sql.types.StructType = StructType(StructField(ID,StringType,true), StructField(FILLER,StringType,true), StructField(CODE,StringType,true))

scala> val rawData = sc.textFile("example.txt")
rawData: org.apache.spark.rdd.RDD[String] = example.txt MapPartitionsRDD[1] at textFile at <console>:31

scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala> val rowRDD = rawData.map(line => Row.fromSeq(line.split("¦¦")))  // split each line on the two-character delimiter
rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[3] at map at <console>:34

scala> val df = sqlContext.createDataFrame(rowRDD, customSchema_1)
df: org.apache.spark.sql.DataFrame = [ID: string, FILLER: string, CODE: string]

scala> df.show
+-----+------+----+
|   ID|FILLER|CODE|
+-----+------+----+
|12345|      |  10|
+-----+------+----+
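One caveat with the split-based approach: Java's String.split drops trailing empty fields by default, so a line ending in ¦¦ would produce fewer columns than the schema declares and can fail once the DataFrame is evaluated. If your data may have trailing blanks, pass a negative limit to keep them (a small tweak to the rowRDD step above):

scala> val rowRDD = rawData.map(line => Row.fromSeq(line.split("¦¦", -1)))  // limit -1 preserves trailing empty fields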