Like groupBy when dropping some collection items

Question

I would like to group a sequence into a map of sequences based on a discriminator of type Option, similar to the result of the groupBy method, but with the values that map to None discarded. Alternatively, I could group with a PartialFunction discriminator and discard the values for which the partial function is not defined.

Here is a concrete example:

I have a collection of names and a collection of namespaces. Some, but not all, names belong to a valid namespace, and I want to group those that do into a Map, discarding those that don't.

Currently my solution is equivalent to:

    val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
    val namespaces = List("ns1", "ns2")

    def findNamespace(n: String): Option[String] = namespaces.find(n.startsWith)

    val groupedNames = names.groupBy(findNamespace).collect {
      case (Some(ns), name) => (ns, name)
    }
    // Map((ns1,List(ns1.foo)), (ns2,List(ns2.bar, ns2.baz)))

My concern with this solution is that names.groupBy(findNamespace) creates an intermediate map that holds, under the None key, all the names I don't need. If the number of discarded names becomes large, this solution becomes less attractive.

My attempt to avoid this is rather unwieldy, though:

    val groupedNames =
      names.
        map(n => (findNamespace(n), n)).
        collect({ case (Some(ns), n) => (ns, n) }).
        groupBy(_._1).
        map({ case (ns, names) => (ns, names.map(_._2)) })

If you were to solve this in a smarter way, what would it be?

Edit: ideally, the solution should only call findNamespace(name) once for each name and build the map using only the Option[String] values, without calling a separate hasNamespace(name) predicate.

+6

scala

Ben james May 15, '11 at 17:27

5 answers

One way to avoid collecting dropped names is to use flatMap:

    names.flatMap(n => findNamespace(n) map (ns => (ns, n)))
      .groupBy(_._1)
      .map { case (ns, pairs) => (ns, pairs map (_._2)) }

You can achieve the same thing with a for-comprehension:

    (for (n <- names; ns <- findNamespace(n)) yield (ns, n))
      .groupBy(_._1)
      .map { case (ns, pairs) => (ns, pairs map (_._2)) }
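On Scala 2.13 or later, the groupBy followed by map can be collapsed into a single groupMap call. The following is a minimal sketch under that version assumption; the wrapper object GroupMapSketch is only an illustrative name, and findNamespace is restated from the question with an explicit lambda:

```scala
object GroupMapSketch {
  val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
  val namespaces = List("ns1", "ns2")

  // Same lookup as in the question: the first namespace that prefixes the name.
  def findNamespace(n: String): Option[String] =
    namespaces.find(ns => n.startsWith(ns))

  // flatMap drops names without a namespace; groupMap (Scala 2.13+) then
  // groups the pairs by namespace and keeps only the name in each group.
  val groupedNames: Map[String, List[String]] =
    names
      .flatMap(n => findNamespace(n).map(ns => ns -> n))
      .groupMap(_._1)(_._2)
}
```

As in the versions above, findNamespace is called once per name and no None entry is ever stored.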

+6

Aaron Novstrup May 15, '11 at 20:54

I'm not sure how efficient toMap is, but by including the Option in the for-comprehension, this at least avoids collecting the None results:

    scala> val m = (for { n <- names; ns <- findNamespace(n) } yield n -> ns).toMap
    m: scala.collection.immutable.Map[java.lang.String,String] = Map(ns1.foo -> ns1, ns2.bar -> ns2, ns2.baz -> ns2)

    scala> val groupedNames = m.keys.groupBy(m)
    groupedNames: scala.collection.immutable.Map[String,Iterable[java.lang.String]] = Map(ns1 -> Set(ns1.foo), ns2 -> Set(ns2.bar, ns2.baz))

+4

Nicolas payette May 15, '11 at 19:59

I came up with a variant of huynhjl's answer, replacing the match with map:

    val gn = (Map[String, List[String]]() /: names) { (acc, name) =>
      acc ++ findNamespace(name).map(ns => ns -> (name :: acc.getOrElse(ns, Nil)))
    }

+2

Ben james May 15, '11 at 23:01

I would suggest filtering first and then applying groupBy, for example:

    scala> val names = List("ns1.foo", "ns2.bar", "ns2.baz", "froznit")
    names: List[java.lang.String] = List(ns1.foo, ns2.bar, ns2.baz, froznit)

    scala> val namespaces = List("ns1", "ns2")
    namespaces: List[java.lang.String] = List(ns1, ns2)

    scala> names filter { n => namespaces exists { n startsWith _ } } groupBy { _ take 3 }
    res1: scala.collection.immutable.Map[String,List[java.lang.String]] = Map(ns1 -> List(ns1.foo), ns2 -> List(ns2.bar, ns2.baz))

0

Antonin Brettsnajdr May 15, '11 at 17:43

huynhjl · Accepted Answer · 2011-05-15T20:11:21+0000

You can use foldLeft:

    val gn = names.foldLeft(Map[String, List[String]]()){ case (acc, name) =>
      findNamespace(name) match {
        case Some(ns) => acc + (ns -> (name :: acc.get(ns).getOrElse(Nil)))
        case _ => acc
      }
    }

Each list is built by prepending, so its names end up in reverse order. If the order matters, you can restore it with gn.mapValues(_.reverse).
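The fold can also be wrapped in a reusable helper that covers both discriminators mentioned in the question. This is only a sketch: groupByOption and groupByPartial are hypothetical names, not standard library methods, and the final reverse is done with a plain map so that it behaves the same across Scala versions:

```scala
object GroupByOptionSketch {
  // Hypothetical helper: a single pass that calls `f` once per element
  // and never stores the elements for which `f` returns None.
  def groupByOption[A, K](xs: Iterable[A])(f: A => Option[K]): Map[K, List[A]] = {
    val reversed = xs.foldLeft(Map.empty[K, List[A]]) { (acc, x) =>
      f(x) match {
        case Some(k) => acc.updated(k, x :: acc.getOrElse(k, Nil))
        case None    => acc
      }
    }
    // Prepending reversed each group, so restore the encounter order.
    reversed.map { case (k, group) => k -> group.reverse }
  }

  // PartialFunction variant: `lift` turns the partial function into
  // A => Option[K], so undefined inputs are discarded the same way.
  def groupByPartial[A, K](xs: Iterable[A])(pf: PartialFunction[A, K]): Map[K, List[A]] =
    groupByOption(xs)(pf.lift)
}
```

With the definitions from the question, GroupByOptionSketch.groupByOption(names)(findNamespace) should yield the same grouping as gn, with each list already in its original order.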
