Extracting characters from a data frame in Python Spark

Question

I have a data frame in Spark, something like this:

ID | Column
------ | ----
1 | STRINGOFLETTERS
2 | SOMEOTHERCHARACTERS
3 | ANOTHERSTRING
4 | EXAMPLEEXAMPLE

What I would like to do is extract the first 5 characters from the column plus the 8th character and create a new column, something like this:

ID | New Column
------ | ------
1 | STRIN_F
2 | SOMEO_E
3 | ANOTH_S
4 | EXAMP_E

I cannot use the following code, because the values in the column differ from row to row and I do not want to split on a specific character, but at a fixed position (after the 5th character):

    import pyspark
    split_col = pyspark.sql.functions.split(DF['column'], ' ')
    newDF = DF.withColumn('new_column', split_col.getItem(0))

Thanks everyone!

+5

python-2.7 apache-spark pyspark

Amanda c Dec 01 '16 at 17:10

1 answer

Thiago baldim · Accepted Answer · 2016-12-01T17:44:16+0000

Use something like this:

    from pyspark.sql.functions import concat, lit
    df.withColumn('new_column', concat(df.Column.substr(1, 5), lit('_'), df.Column.substr(8, 1)))

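Use the substr and concat functions. Column.substr(startPos, length) counts positions from 1, so substr(1, 5) takes the first five characters and substr(8, 1) takes the 8th character; concat joins the two pieces around the literal '_' created with lit.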
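To sanity-check the positions without starting a Spark session, the same logic can be mirrored in plain Python. This is only a sketch: the helper name `first_five_plus_eighth` is hypothetical and not part of PySpark, and it uses 0-based Python slices to stand in for the 1-based `substr` calls.

```python
def first_five_plus_eighth(value, sep="_"):
    """Plain-Python mirror of concat(substr(1, 5), lit('_'), substr(8, 1)).

    Column.substr(startPos, length) counts from 1, while Python slices
    count from 0, so substr(1, 5) corresponds to value[0:5] and
    substr(8, 1) corresponds to value[7:8].
    """
    return value[0:5] + sep + value[7:8]


# Sample values taken from the question's data frame.
samples = ["STRINGOFLETTERS", "SOMEOTHERCHARACTERS", "ANOTHERSTRING", "EXAMPLEEXAMPLE"]
results = [first_five_plus_eighth(s) for s in samples]
print(results)  # ['STRIN_F', 'SOMEO_E', 'ANOTH_S', 'EXAMP_E']
```

The results match the New Column values requested in the question, which confirms that positions 1 to 5 and position 8 are the right arguments for substr.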
