I need to create a new Spark DataFrame with a MapType column based on existing columns, where each column name becomes a key and the column value becomes the value.
As an example, I have this DataFrame:
```python
from pyspark.sql.types import StructType, StructField, StringType, FloatType

rdd = sc.parallelize([('123k', 1.3, 6.3, 7.6),
                      ('d23d', 1.5, 2.0, 2.2),
                      ('as3d', 2.2, 4.3, 9.0)])
schema = StructType([StructField('key', StringType(), True),
                     StructField('metric1', FloatType(), True),
                     StructField('metric2', FloatType(), True),
                     StructField('metric3', FloatType(), True)])
df = sqlContext.createDataFrame(rdd, schema)
df.show()
```

```
+----+-------+-------+-------+
| key|metric1|metric2|metric3|
+----+-------+-------+-------+
|123k|    1.3|    6.3|    7.6|
|d23d|    1.5|    2.0|    2.2|
|as3d|    2.2|    4.3|    9.0|
+----+-------+-------+-------+
```
I can at least create a StructType column from this:
```python
from pyspark.sql.functions import struct

nameCol = struct([name for name in df.columns if "metric" in name]).alias("metric")
df2 = df.select("key", nameCol)
df2.show()
```

```
+----+-------------+
| key|       metric|
+----+-------------+
|123k|[1.3,6.3,7.6]|
|d23d|[1.5,2.0,2.2]|
|as3d|[2.2,4.3,9.0]|
+----+-------------+
```
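The struct version bakes the metric names into the schema as fields rather than into the data itself. A rough sketch of what `df2.printSchema()` should show (the nullability flags are from memory and may differ):

```python
df2.printSchema()
# root
#  |-- key: string (nullable = true)
#  |-- metric: struct (nullable = false)
#  |    |-- metric1: float (nullable = true)
#  |    |-- metric2: float (nullable = true)
#  |    |-- metric3: float (nullable = true)
```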
But I need a metric column of MapType, where the key is the name of the column:
```
+----+-------------------------+
| key|                   metric|
+----+-------------------------+
|123k|Map(metric1 -> 1.3, me...|
|d23d|Map(metric1 -> 1.5, me...|
|as3d|Map(metric1 -> 2.2, me...|
+----+-------------------------+
```
Any hints on how I can convert the data?
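For reference, the direction I've been exploring is `pyspark.sql.functions.create_map` (available from Spark 2.0 onward), which takes alternating key and value expressions. This is an untested sketch; interleaving literal column names with the column values via `itertools.chain` is my own guess at how to build the argument list dynamically:

```python
from itertools import chain
from pyspark.sql.functions import create_map, lit

# create_map expects alternating key, value, key, value... expressions,
# so pair each metric column name (as a literal key) with the column
# itself, then flatten the pairs into a single argument list.
metric_names = [name for name in df.columns if "metric" in name]
mapCol = create_map(
    *chain.from_iterable((lit(name), df[name]) for name in metric_names)
).alias("metric")

df3 = df.select("key", mapCol)
df3.show(truncate=False)
```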
Thanks!