I need to group by the "KEY" column and check whether the "TYPE_CODE" values for each key include both "PL" and "JL". If they do, I need to add an indicator column with the value "Y", otherwise "N".
Example:
// Input values
val values = List(
  List("66", "PL"),
  List("67", "JL"), List("67", "PL"), List("67", "PO"),
  List("68", "JL"), List("68", "PO")
).map(x => (x(0), x(1)))

import spark.implicits._

// Create a DataFrame from the pairs
val cmc = values.toDF("KEY", "TYPE_CODE")
cmc.show(false)
KEY | TYPE_CODE
66  | PL
67  | JL
67  | PL
67  | PO
68  | JL
68  | PO
Expected Result:
For each "KEY", if its "TYPE_CODE" values include both PL and JL, the indicator is Y for every row of that key, otherwise N.
-----------------------------
KEY | TYPE_CODE | Indicator
-----------------------------
66  | PL        | N
67  | JL        | Y
67  | PL        | Y
67  | PO        | Y
68  | JL        | N
68  | PO        | N
-----------------------------
For example:
67 has both PL and JL, so "Y".
66 has only PL, so "N".
68 has JL but no PL, so "N".
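
One possible way to get this result is a window partitioned by KEY, sketched below. This is only a sketch under some assumptions: it builds its own local SparkSession (in spark-shell the existing `spark` can be used instead), and the names `byKey`, `codes` and `result` are made up for illustration. It collects the distinct TYPE_CODE values per KEY onto every row, then checks that both PL and JL are present.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{array_contains, col, collect_set, lit, when}

// Assumption: a local session for a standalone run; skip this in spark-shell.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("indicator-sketch")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question
val cmc = Seq(
  ("66", "PL"),
  ("67", "JL"), ("67", "PL"), ("67", "PO"),
  ("68", "JL"), ("68", "PO")
).toDF("KEY", "TYPE_CODE")

// Window covering all rows that share the same KEY
val byKey = Window.partitionBy("KEY")

val result = cmc
  // Temporary column: the set of distinct TYPE_CODE values for this row's KEY
  .withColumn("codes", collect_set(col("TYPE_CODE")).over(byKey))
  // "Y" only when that set contains both PL and JL
  .withColumn(
    "Indicator",
    when(array_contains(col("codes"), "PL") && array_contains(col("codes"), "JL"), lit("Y"))
      .otherwise(lit("N"))
  )
  .drop("codes")

result.orderBy("KEY", "TYPE_CODE").show(false)
```

A window keeps every input row, which matches the expected table above. A groupBy on KEY followed by a join back to `cmc` would be another option, at the cost of an extra join.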
raam