Adding a constant column to a Spark DataFrame

I am using Spark 2.1 on Databricks. I have a DataFrame named wamp to which I want to add a column named region that should take the constant value NE. However, I get the error NameError: name 'lit' is not defined when I run the following command:

wamp = wamp.withColumn('region', lit('NE'))

What am I doing wrong?

2 answers

You need to import lit:

from pyspark.sql.functions import lit

or

from pyspark.sql.functions import *

will make lit available

or something like

import pyspark.sql.functions as sf
wamp = wamp.withColumn('region', sf.lit('NE'))

@muon provided the correct answer above; here is just a quick runnable version for clarity.

>>> from pyspark.sql.functions import lit
>>> df = spark.createDataFrame([(1, 4, 3)], ['a', 'b', 'c'])
>>> df.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  4|  3|
+---+---+---+

>>> df = df.withColumn("d", lit(5))
>>> df.show()
+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+
|  1|  4|  3|  5|
+---+---+---+---+

Source: https://habr.com/ru/post/1677386/

