The problem here is the implementation used by VectorAssembler, not the columns as such: column names containing dots have to be escaped with backticks when they are resolved by Spark SQL, and VectorAssembler does not do that. You can, for example, skip the header:
val df = spark.read.format("csv") .options(Map("inferSchema" -> "true", "comment" -> "\"")) .load(path) new VectorAssembler() .setInputCols(df.columns) .setOutputCol("vs") .transform(df)
or rename columns before moving on to VectorAssembler:
val renamed = df.toDF(df.columns.map(_.replace(".", "_")): _*)

new VectorAssembler()
  .setInputCols(renamed.columns)
  .setOutputCol("vs")
  .transform(renamed)
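If the original dotted names are needed again later, it can help to keep the mapping used for the rename. A minimal sketch; nameMap and sanitized are illustrative names, not part of any API:

// Keep the original-to-sanitized name mapping so the dotted names
// can be restored or reported after feature assembly.
val nameMap = df.columns.map(c => c -> c.replace(".", "_")).toMap
val sanitized = df.toDF(df.columns.map(nameMap): _*)
// nameMap.map(_.swap) gives the reverse mapping if needed.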
Finally, a better approach is to explicitly provide a schema:
import org.apache.spark.sql.types._

val schema = StructType((0 until 4).map(i => StructField(s"_$i", DoubleType)))

val dfExplicit = spark.read.format("csv")
  .options(Map("header" -> "true"))
  .schema(schema)
  .load(path)

new VectorAssembler()
  .setInputCols(dfExplicit.columns)
  .setOutputCol("vs")
  .transform(dfExplicit)
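The hard-coded (0 until 4) assumes a four-column file. If the column count is not known in advance, the schema can be derived from the file's own header; a sketch under the assumptions that every column is a double and the header contains no embedded commas:

// Read the raw header line once and turn it into dot-free field names.
val headerLine = spark.read.text(path).first().getString(0)
val derivedSchema = StructType(
  headerLine.split(",").map { raw =>
    StructField(raw.replaceAll("\"", "").trim.replace(".", "_"), DoubleType)
  }
)

Providing the schema up front also avoids the extra pass over the data that inferSchema requires.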