I have a data block in sparks, something like this:
ID | Column ------ | ---- 1 | STRINGOFLETTERS 2 | SOMEOTHERCHARACTERS 3 | ANOTHERSTRING 4 | EXAMPLEEXAMPLE
What I would like to do is extract the first 5 characters from the column plus the 8th character and create a new column, something like this:
ID | New Column ------ | ------ 1 | STRIN_F 2 | SOMEO_E 3 | ANOTH_S 4 | EXAMP_E
I cannot use the following code because the values ββin the columns are different, and I do not want to divide by a certain character, but by the 6th character:
import pyspark split_col = pyspark.sql.functions.split(DF['column'], ' ') newDF = DF.withColumn('new_column', split_col.getItem(0))
Thanks everyone!
source share