(Scalding) groupBy foldLeft: using the group-by value in the fold

I have data like this:

    pid  recom-pid
    1    1
    1    2
    1    3
    2    1
    2    2
    2    4
    2    5

The desired output is:

    pid, recommendations
    1    2,3
    2    1,4,5

That is, ignore a pid's own value in the second column and join the remaining values into a single comma-separated string. The data is tab-separated.

I tried the following, but I'm not sure how to access productId inside foldLeft:

    .groupBy('productId) {
      _.foldLeft(('prodReco) -> 'prodsR)("") { (s: String, s2: String) =>
        {
          println(" s " + s + ", s2 :" + s2 + "; pid :" + productId + ".")
          if (productId.equals(s2)) {
            s
          } else {
            s + "," + s2;
          }
        }
      }
    }

I'm using Scala 2.10 with Scalding 0.10.0 and Cascading 2.5.3. I need a Scalding answer; I already know how to manipulate the data in plain Scala. What I want to know is how to get at the grouping column inside a Scalding groupBy and use it in a conditional foldLeft, or some other way to get the filtered output.

For a complete working example, see https://github.com/tgkprog/scaldingEx2/tree/master/Q1

+5
4 answers

Instead of groupBy followed by foldLeft, use only foldLeft.
Here is a solution using Scala collections, but it should work with Scalding as well:

    val source = List((1,1), (1,2), (1,3), (2,1), (2,2), (2,4), (2,5))
    source.foldLeft(Map[Int, List[Int]]())((m,e) =>
      if (e._1 == e._2) m
      else m + (e._1 -> (e._2 :: m.getOrElse(e._1, List()))))
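
The fold above prepends with `::`, so each list comes out in reverse input order, and the result is a map rather than the comma-separated lines the question asks for. A minimal sketch of one way to close that gap with plain Scala collections follows; the object name `FoldRecommendations` and the `lines` formatting step are illustrative assumptions, not part of the original answer, and this is not Scalding code.

```scala
object FoldRecommendations {
  // Sample (pid, recom-pid) pairs from the question.
  val source = List((1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4), (2, 5))

  // Fold into pid -> recommendations, skipping rows where pid == recom-pid.
  // Appending with :+ keeps input order; prepending with :: would reverse it.
  val grouped: Map[Int, List[Int]] =
    source.foldLeft(Map.empty[Int, List[Int]]) { case (m, (pid, reco)) =>
      if (pid == reco) m
      else m + (pid -> (m.getOrElse(pid, Nil) :+ reco))
    }

  // Hypothetical formatting step: one tab-separated line per pid.
  val lines: List[String] =
    grouped.toList.sortBy(_._1).map { case (pid, recos) =>
      pid + "\t" + recos.mkString(",")
    }
}
```

Appending with `:+` is linear in the list length, which is fine for short recommendation lists; for long ones, prepending and reversing once at the end would be the cheaper choice.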
+2

Just groupBy and map should be enough to accomplish what you want.

    // Input data formatted as a list of tuples.
    val tt = Seq((1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4), (2, 5))
    tt
      .groupBy(_._1) // Map(2 -> List((2, 1), ...), 1 -> List((1, 1), ...))
      .toSeq // for easier mapping
      .map({
        case (pid, recomPids) => {
          val pids = recomPids.collect({
            case recomPid if recomPid._2 != pid => recomPid._2
          })
          (pid, pids)
        }
      })
    // List((2, List(1, 4, 5)), (1, List(2, 3)))

I simplified the input/output format to focus on getting the collections into the right shape.

+1

Assume the data is stored in temp.txt as pid|recom-pid lines, so that

    import scala.io.Source
    val xs = Source.fromFile("temp.txt").getLines.toArray.map(_.split("\\|"))

Convert xs to tuples, for example:

    val pairs = for (Array(pid, recom) <- xs) yield (pid,recom)
    // Array((1,1), (1,2), (1,3), (2,1), (2,2), (2,4), (2,5))

and group by the first element:

    val g = pairs.groupBy(_._1)
    // Map(2 -> Array((2,1), (2,2), (2,4), (2,5)), 1 -> Array((1,1), (1,2), (1,3)))

Then we remove the tuples whose two elements are identical. Every key keeps its entry in the map; an empty array means the key had only the identical pair (that is, a lone occurrence of 3|3 results in 3 -> Array()):

    val res = g.mapValues(_.filter { case (a,b) => a != b } )
    // Map(2 -> Array((2,1), (2,4), (2,5)), 1 -> Array((1,2), (1,3)))
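
The steps above stop at a map of tuples, while the question wants one comma-separated string per pid. The sketch below runs the same parse, group and filter pipeline end to end and adds that last join. It assumes an in-memory `lines` sequence standing in for the file, and the names `PipeSeparated`, `lines` and `joined` are made up for illustration.

```scala
object PipeSeparated {
  // In-memory stand-in for the lines of temp.txt (hypothetical sample data,
  // including a lone 3|3 row to show the empty-result case).
  val lines = Seq("1|1", "1|2", "1|3", "2|1", "2|2", "2|4", "2|5", "3|3")

  // Parse each line into a (pid, recom) tuple; trim tolerates input like "1| 2".
  val pairs: Seq[(String, String)] =
    lines.map(_.split("\\|")).collect {
      case Array(pid, recom) => (pid.trim, recom.trim)
    }

  // Group by pid, drop self-references, and join the rest with commas.
  val joined: Map[String, String] =
    pairs.groupBy(_._1).map { case (pid, ps) =>
      pid -> ps.collect { case (_, recom) if recom != pid => recom }.mkString(",")
    }
}
```

A pid that only ever recommends itself, such as 3 here, keeps its key and maps to an empty string.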
+1

Assuming the input is a string s in the correct format, this returns a Map[String, Array[String]]:

    s.split('\n')
      .map(_.split("\\|"))
      .groupBy(_(0))
      .mapValues(_.flatten)
      .transform {case (k, v) ⇒ v.filter(_ != k)}
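
For a quick check, the same chain can be run against the sample data from the question. This is only a sketch: the object name `SplitAndGroup` and the literal `s` are assumptions, and the `mapValues`/`transform` pair is folded into a single `map` that also joins the values with commas.

```scala
object SplitAndGroup {
  // Pipe-separated sample input, one pid|recom-pid pair per line (assumed format).
  val s = "1|1\n1|2\n1|3\n2|1\n2|2\n2|4\n2|5"

  val result: Map[String, String] =
    s.split('\n')
      .map(_.split("\\|"))
      .groupBy(_(0))
      .map { case (k, rows) =>
        // Flatten the rows of a group, then drop every occurrence of the key:
        // that removes both the first-column copies and the self-recommendation.
        k -> rows.flatten.filter(_ != k).mkString(",")
      }
}
```

Filtering on the key works because the first column of every row in a group equals that key, so only the other recommendations survive.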
+1

Source: https://habr.com/ru/post/1232917/

