Remove a column from a Spark DataFrame

Question


I have a Spark DataFrame with a very large number of columns. I want to remove two columns from it in order to get a new data frame.

If there were fewer columns, I could use the select method in the API as follows:

pcomments = pcomments.select(
    pcomments.col("post_id"),
    pcomments.col("comment_id"),
    pcomments.col("comment_message"),
    pcomments.col("user_name"),
    pcomments.col("comment_createdtime"));

But since selecting columns from a long list is a tedious task, is there a workaround?

+18

java dataframe apache-spark apache-spark-sql spark-dataframe

Count Jan 20 '17 at 12:02


1 answer

SanthoshPrasad · Accepted Answer · 2017-01-20T12:05:36+0000

Use the drop method to remove the columns. If you also need to rename a column, use the withColumnRenamed method.

Example:

  val initialDf = ....
  val dfAfterDrop = initialDf.drop("column1").drop("column2")
  val dfAfterColRename = dfAfterDrop.withColumnRenamed("oldColumnName", "newColumnName")
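
The snippet above is Scala, while the question uses the Java API. As a rough sketch of the same idea in Java: in Spark 2.x, Dataset also exposes drop(String...), so both columns can probably be removed in one call. If select has to be used instead, the list of columns to keep can be computed from columns() rather than typed out. The helper below is plain Java with no Spark dependency; the class name ColumnFilter, the method without, and the column names column1 and column2 are hypothetical, and the Spark calls appear only in comments.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ColumnFilter {
    // Returns the names in allColumns that are not listed in toDrop,
    // preserving the original column order.
    static String[] without(String[] allColumns, String... toDrop) {
        Set<String> dropped = new HashSet<>(Arrays.asList(toDrop));
        return Arrays.stream(allColumns)
                .filter(name -> !dropped.contains(name))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Stand-in for pcomments.columns(); "column1" and "column2" are
        // hypothetical names for the two unwanted columns.
        String[] all = {
            "post_id", "comment_id", "comment_message",
            "user_name", "comment_createdtime", "column1", "column2"
        };
        String[] kept = without(all, "column1", "column2");
        System.out.println(String.join(",", kept));
        // prints: post_id,comment_id,comment_message,user_name,comment_createdtime

        // With Spark on the classpath (assumption: Spark 2.x Dataset<Row>),
        // the kept names could be passed to select:
        //   pcomments.select(kept[0], Arrays.copyOfRange(kept, 1, kept.length));
        // or the columns could be dropped directly:
        //   pcomments.drop("column1", "column2");
    }
}
```

In most cases the direct drop call is the simpler choice. Computing the kept names is mainly useful when the columns to remove are decided at runtime.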
