A groupBy-like operation that drops some elements of the collection

I would like to group a sequence into a map of sequences based on a discriminator of type Option, similar to the result of the groupBy method, but where the values that map to None are discarded. Or perhaps group by a PartialFunction discriminator and drop the values for which the partial function is not defined.

Here is a concrete example:

I have a collection of names and a collection of namespaces. Some, but not all, of the names belong to a valid namespace, and I want to group those that do into a Map, discarding those that don't.

Currently my solution is equivalent to:

    val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
    val namespaces = List("ns1", "ns2")

    def findNamespace(n: String): Option[String] = namespaces.find(n.startsWith)

    val groupedNames = names.groupBy(findNamespace).collect {
      case (Some(ns), name) => (ns, name)
    }
    // Map((ns1,List(ns1.foo)), (ns2,List(ns2.bar, ns2.baz)))

My concern with this solution is that, by using names.groupBy(findNamespace), I create an intermediate map that contains all the names I don't need, under the None key. If the number of names that I drop becomes large, this solution becomes less attractive.
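
To make the concern concrete, here is a small sketch of what the intermediate map looks like. The wrapper object `IntermediateMapDemo` is only an illustrative name, and `findNamespace` is written with an explicit lambda; everything else is taken from the snippet above.

```scala
object IntermediateMapDemo {
  val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
  val namespaces = List("ns1", "ns2")

  def findNamespace(n: String): Option[String] =
    namespaces.find(ns => n.startsWith(ns))

  // The keys are Option[String], so every name without a namespace
  // is kept in the map under the None key until collect removes it.
  val intermediate: Map[Option[String], List[String]] =
    names.groupBy(findNamespace)
}
```

Here `IntermediateMapDemo.intermediate` has three keys (`Some("ns1")`, `Some("ns2")` and `None`), and the `None` entry holds every dropped name.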

My attempt to avoid this is rather clumsy, though:

    val groupedNames =
      names.
        map(n => (findNamespace(n), n)).
        collect({ case (Some(ns), n) => (ns, n) }).
        groupBy(_._1).
        map({ case (ns, names) => (ns, names.map(_._2)) })

If you were to solve this in a smarter way, what would it be?


Edit: ideally, the solution should only call findNamespace(name) once for each name and build the map using only the Option[String] values, without calling a separate hasNamespace(name) predicate.

+6
5 answers

You can use foldLeft:

    val gn = names.foldLeft(Map[String, List[String]]()) {
      case (acc, name) =>
        findNamespace(name) match {
          case Some(ns) => acc + (ns -> (name :: acc.get(ns).getOrElse(Nil)))
          case _ => acc
        }
    }

This assumes the order doesn't matter, since each name is prepended to its list. If the order does matter, you can reverse the values with gn.mapValues(_.reverse).
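
The same fold can be wrapped in a reusable helper. This is only a sketch: the names `GroupByOption` and `groupByOption` are made up for illustration, and it appends to a Vector instead of prepending to a List, so the original order is kept without a separate reverse step.

```scala
object GroupByOption {
  // Hypothetical helper: groups xs by key(x) and skips every element
  // for which key(x) is None. key is called exactly once per element.
  def groupByOption[A, K](xs: Iterable[A])(key: A => Option[K]): Map[K, Vector[A]] =
    xs.foldLeft(Map.empty[K, Vector[A]]) { (acc, x) =>
      key(x) match {
        case Some(k) => acc.updated(k, acc.getOrElse(k, Vector.empty[A]) :+ x)
        case None    => acc
      }
    }

  val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
  val namespaces = List("ns1", "ns2")

  def findNamespace(n: String): Option[String] =
    namespaces.find(ns => n.startsWith(ns))

  // No None key is ever stored in the accumulator.
  val grouped: Map[String, Vector[String]] =
    groupByOption(names)(n => findNamespace(n))
}
```

With the sample data, `GroupByOption.grouped` maps `ns1` to `Vector(ns1.foo)` and `ns2` to `Vector(ns2.bar, ns2.baz)`.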

+3

One way to avoid collecting the dropped names is to use flatMap:

    names.flatMap(n => findNamespace(n) map (ns => (ns, n)))
      .groupBy(_._1)
      .map { case (ns, pairs) => (ns, pairs map (_._2)) }

You can achieve the same thing with a for-comprehension:

    (for (n <- names; ns <- findNamespace(n)) yield (ns, n))
      .groupBy(_._1)
      .map { case (ns, pairs) => (ns, pairs map (_._2)) }
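
On newer Scala versions the last two steps can probably be merged. The sketch below assumes Scala 2.13 or later, where the standard collections provide `groupMap`; the wrapper object `GroupMapSketch` is just an illustrative name.

```scala
object GroupMapSketch {
  val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
  val namespaces = List("ns1", "ns2")

  def findNamespace(n: String): Option[String] =
    namespaces.find(ns => n.startsWith(ns))

  // flatMap drops the names whose namespace is None;
  // groupMap (Scala 2.13+) groups by the first element of each pair
  // and keeps only the second element as the value.
  val grouped: Map[String, List[String]] =
    names
      .flatMap(n => findNamespace(n).map(ns => ns -> n))
      .groupMap(_._1)(_._2)
}
```

As with the versions above, findNamespace is called once per name and no None key is ever created.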
+6

I'm not sure how efficient toMap is, but by including the Option in the for-comprehension, this at least avoids collecting the None results:

    scala> val m = (for { n <- names; ns <- findNamespace(n) } yield n -> ns).toMap
    m: scala.collection.immutable.Map[java.lang.String,String] = Map(ns1.foo -> ns1, ns2.bar -> ns2, ns2.baz -> ns2)

    scala> val groupedNames = m.keys.groupBy(m)
    groupedNames: scala.collection.immutable.Map[String,Iterable[java.lang.String]] = Map(ns1 -> Set(ns1.foo), ns2 -> Set(ns2.bar, ns2.baz))
+4

I came up with a variation on huynhjl's answer, replacing the match with map:

    val gn = (Map[String, List[String]]() /: names) { (acc, name) =>
      acc ++ findNamespace(name).map(ns => ns -> (name :: acc.getOrElse(ns, Nil)))
    }
+2

I would suggest the "filter first, then groupBy" approach, for example:

    scala> val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
    names: List[java.lang.String] = List(ns1.foo, ns2.bar, ns2.baz, froznit)

    scala> val namespaces = List("ns1", "ns2")
    namespaces: List[java.lang.String] = List(ns1, ns2)

    scala> names filter { n => namespaces exists { n startsWith _ } } groupBy { _ take 3 }
    res1: scala.collection.immutable.Map[String,List[java.lang.String]] = Map(ns1 -> List(ns1.foo), ns2 -> List(ns2.bar, ns2.baz))
0

Source: https://habr.com/ru/post/888155/

