Assume the pid|recom-pid records have been written to temp.txt, and therefore
import scala.io.Source
val xs = Source.fromFile("temp.txt").getLines().toArray.map(_.split("\\|"))
Convert xs to tuples, for example
val pairs = for (Array(pid, recom) <- xs) yield (pid,recom)
Array((1,1), (1,2), (1,3), (2,1), (2,2), (2,4), (2,5))
and group by the first element,
val g = pairs.groupBy(_._1)
Map(2 -> Array((2,1), (2,2), (2,4), (2,5)), 1 -> Array((1,1), (1,2), (1,3)))
Then we filter out the tuples whose two elements are identical. Every pid still keeps an entry in the Map; an empty array means the pid only ever appeared paired with itself (for example, a single occurrence of 3|3 results in 3 -> Array()),
val res = g.mapValues(_.filter { case (a,b) => a != b } )
Map(2 -> Array((2,1), (2,4), (2,5)), 1 -> Array((1,2), (1,3)))
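For reference, the steps above can be combined into one self-contained sketch that runs without temp.txt. The in-memory `lines` list and its sample values (including the extra 3|3 record) are assumptions made for illustration. A plain `map` stands in for `mapValues`, which is deprecated on Map as of Scala 2.13, and each field is trimmed in case the file has spaces around the pipe.

```scala
// Sample records in pid|recom-pid form; these values are illustrative only.
val lines = List("1|1", "1|2", "1|3", "2|1", "2|2", "2|4", "2|5", "3|3")

// Split on the literal pipe and trim stray whitespace around each field.
// collect keeps only rows that split into exactly two fields.
val pairs = lines.map(_.split("\\|").map(_.trim)).collect {
  case Array(pid, recom) => (pid, recom)
}

// Group by pid, then drop self-recommendations within each group.
// A pid that only recommended itself keeps an entry with an empty list.
val res: Map[String, List[(String, String)]] =
  pairs.groupBy(_._1).map { case (pid, ps) =>
    pid -> ps.filter { case (a, b) => a != b }
  }
```

Lists are used here instead of arrays so that the results compare structurally with `==`; the same calls work on Array if that is what the rest of the code expects.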