I have a Parameters column of MapType in the following form:
>>> from pyspark.sql import SQLContext
>>> sqlContext = SQLContext(sc)
>>> d = [{'Parameters': {'foo': '1', 'bar': '2', 'baz': 'aaa'}}]
>>> df = sqlContext.createDataFrame(d)
>>> df.collect()
[Row(Parameters={'foo': '1', 'bar': '2', 'baz': 'aaa'})]
I want to reshape it in PySpark so that all the keys (foo, bar, etc.) become columns, namely:
[Row(foo='1', bar='2', baz='aaa')]
Using withColumn works:
(df
 .withColumn('foo', df.Parameters['foo'])
 .withColumn('bar', df.Parameters['bar'])
 .withColumn('baz', df.Parameters['baz'])
 .drop('Parameters')
).collect()
But I need a solution that does not explicitly mention the column names, since I have dozens of them.
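For context, a minimal sketch of one possible approach (not part of the original question): collect the distinct map keys across all rows via the RDD API, then build the select expressions programmatically. It assumes the keys are the same kind of string values shown in the schema below.

# Collect the distinct keys that occur in the map across all rows,
# since different rows may carry different key sets.
keys = sorted(df.rdd.flatMap(lambda row: row['Parameters'].keys()).distinct().collect())

# Build one column expression per key and select them all at once.
exprs = [df['Parameters'].getItem(k).alias(k) for k in keys]
df.select(*exprs).collect()
# e.g. [Row(bar='2', baz='aaa', foo='1')] -- column order follows the sorted keys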
Schema:
>>> df.printSchema()
root
 |-- Parameters: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)