Extracting characters from a PySpark DataFrame column

I have a DataFrame in Spark, something like this:

ID | Column
------ | ----
1 | STRINGOFLETTERS
2 | SOMEOTHERCHARACTERS
3 | ANOTHERSTRING
4 | EXAMPLEEXAMPLE

What I would like to do is extract the first 5 characters from the column plus the 8th character and create a new column, something like this:

ID | New Column
------ | ------
1 | STRIN_F
2 | SOMEO_E
3 | ANOTH_S
4 | EXAMP_E

I cannot use the following code because the values in the column differ, and I do not want to split on a particular character but at a fixed position (the 6th character):

    import pyspark
    split_col = pyspark.sql.functions.split(DF['column'], ' ')
    newDF = DF.withColumn('new_column', split_col.getItem(0))

Thanks everyone!

1 answer

Use something like this:

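    from pyspark.sql.functions import concat, lit

    # First 5 characters, a literal underscore, then the 8th character
    df.withColumn('new_column', concat(df.Column.substr(1, 5), lit('_'), df.Column.substr(8, 1)))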

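This uses the substr method of the column together with the concat function. substr(startPos, length) counts positions from 1, so substr(1, 5) takes the first five characters and substr(8, 1) takes only the 8th. lit('_') supplies the literal underscore, and concat joins the three pieces into one string.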
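To check the positions without starting a Spark session, here is a minimal plain-Python sketch of the same logic. The helper names `substr` and `new_column` are made up for this illustration and are not part of PySpark. The sketch only assumes that positions are 1-based, as they are in Column.substr.

```python
def substr(value: str, start_pos: int, length: int) -> str:
    """Plain-Python stand-in for a 1-based substring (hypothetical helper)."""
    begin = start_pos - 1  # convert the 1-based position to a 0-based index
    return value[begin:begin + length]


def new_column(value: str) -> str:
    # First five characters, an underscore, then the 8th character.
    return substr(value, 1, 5) + "_" + substr(value, 8, 1)


# Sample values taken from the question.
rows = ["STRINGOFLETTERS", "SOMEOTHERCHARACTERS", "ANOTHERSTRING", "EXAMPLEEXAMPLE"]
results = [new_column(v) for v in rows]
print(results)  # ['STRIN_F', 'SOMEO_E', 'ANOTH_S', 'EXAMP_E']
```

The output matches the New Column values requested in the question, which suggests that substr(1, 5) and substr(8, 1) are the right arguments for the Spark version.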


Source: https://habr.com/ru/post/1260663/

