Replace DataFrame values with Scala API

Question

I need to replace some values in a DataFrame column (zeros, with the mode; I know this approach is not very accurate, but I am just practicing). I have been using the Python Apache Spark documentation, whose examples are usually more explanatory. So I decided to look there first, in addition to the Scala documentation, and I noticed that I can achieve what I need using replace from DataFrames.

In this example, I replace every 2 with 20 in the column col.

df = df.replace("2", "20", subset="col")

After gaining some confidence with the Python API, I decided to replicate this in Scala, and I noticed some strange things in the Scala documentation. First, DataFrame apparently does not have a replace method. Second, after some research, I found that I should use the replace function of DataFrameNaFunctions. Here is the odd part: if you look at the details of this method, you will notice that it is used in the same way as in the Python implementation (see the figure below).

After that, I tried to run this in Scala and it blew up with the following error:

Name: Compile Error
Message: <console>:108: error: value replace is not a member of org.apache.spark.sql.DataFrame
                  val dx = df.replace(column, Map(0.0 -> doubleValue))
                              ^
StackTrace:

Then I tried to apply replace using DataFrameNaFunctions, but I can't get it to work as easily as in Python, because I get an error and I don't understand why.

val dx = df.na.replace(column, Map(0.0 -> doubleValue))

The error is:

Name: Compile Error
Message: <console>:108: error: overloaded method value replace with alternatives:
  [T](cols: Seq[String], replacement: scala.collection.immutable.Map[T,T])org.apache.spark.sql.DataFrame <and>
  [T](col: String, replacement: scala.collection.immutable.Map[T,T])org.apache.spark.sql.DataFrame <and>
  [T](cols: Array[String], replacement: java.util.Map[T,T])org.apache.spark.sql.DataFrame <and>
  [T](col: String, replacement: java.util.Map[T,T])org.apache.spark.sql.DataFrame
 cannot be applied to (String, scala.collection.mutable.Map[Double,Double])
                  val dx = df.na.replace(column, Map(0.0 -> doubleValue))
                                 ^

python scala apache-spark

Alberto Bonsanto 19 . '16 19:21

Alberto Bonsanto · Accepted Answer · 2016-03-01T16:36:39+0000

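I found the error: the Map I was passing was a scala.collection.mutable.Map, and replace only accepts an immutable one, so calling .toMap on it converts it to immutable.

val dx = df.na.replace(column, Map(0.0 -> doubleValue).toMap)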
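To illustrate the mutable versus immutable distinction without needing a Spark session, here is a minimal sketch in plain Scala. The object ReplaceMapSketch, the method describe, and the value 3.5 are hypothetical stand-ins chosen for this example. describe only mimics the shape of the replace overload that takes a scala.collection.immutable.Map; it is not part of Spark.

```scala
import scala.collection.mutable

object ReplaceMapSketch {
  // Hypothetical stand-in for a method that, like the replace overload in the
  // error message, accepts only scala.collection.immutable.Map.
  def describe[T](col: String, replacement: scala.collection.immutable.Map[T, T]): String =
    replacement.map { case (from, to) => s"$col: $from -> $to" }.mkString(", ")

  // Hypothetical replacement value (the question's doubleValue is not shown).
  val doubleValue: Double = 3.5

  // With `import scala.collection.mutable.Map` in scope, a bare Map(...) is mutable.
  // The mutable map is built explicitly here so the default Map is not shadowed.
  val mutableReplacement: mutable.Map[Double, Double] = mutable.Map(0.0 -> doubleValue)

  // describe("col", mutableReplacement) // does not compile: type mismatch

  // .toMap copies the entries into an immutable Map, which the method accepts.
  val immutableReplacement: Map[Double, Double] = mutableReplacement.toMap

  val result: String = describe("col", immutableReplacement)
}
```

Assuming df and column are defined as in the question, the same immutableReplacement could then be passed as df.na.replace(column, immutableReplacement). Another option is to avoid importing scala.collection.mutable.Map unqualified, so that a bare Map(...) stays immutable.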
