Scala: remove duplicates in object list

I have a list of List[Object] objects that are all created from the same class. This class has a field that must be unique Object.property . What is the cleanest way to iterate over a list of objects and delete all objects (but the first) with the same property?

+49
list scala duplicates
Oct 12 '10 at 8:22
source share
8 answers
 list.groupBy(_.property).map(_._2.head) 

Explanation: The groupBy method accepts a function that converts an element into a key for grouping. _.property is just a shorthand for elem: Object => elem.property (the compiler generates a unique name, something like x$1 ). So now we have a map Map[Property, List[Object]] . A Map[K,V] continues to Traversable[(K,V)] . Thus, it can be moved like a list, but the elements are a tuple. This is similar to Java Map#entrySet() . The map method creates a new collection by iterating over each item and applying a function to it. In this case, the _._2.head function, which is short for elem: (Property, List[Object]) => elem._2.head . _2 is just a Tuple method that returns the second element. The second element is List [Object], and head returns the first element

To get the desired result:

 import collection.breakOut val l2: List[Object] = list.groupBy(_.property).map(_._2.head)(breakOut) 

To explain briefly, map actually expects two arguments: a function and an object, which are used to construct the result. In the first code fragment, you do not see the second value, because it is marked as implicit and provided by the compiler from the list of predefined values ​​in the field. The result is usually obtained from the associated container. This is usually good. the map in the list will return the list, the map on the array will return Array, etc. In this case, we want to express the container that we want as the result. The breakOut method is used here. He creates the builder (what creates the results), only looking at the desired type of result. This is a general method, and the compiler passes its generic types because we explicitly typed l2 as List[Object] or, to preserve order (assuming Object#property is of type Property ):

 list.foldRight((List[Object](), Set[Property]())) { case (o, cum@(objects, props)) => if (props(o.property)) cum else (o :: objects, props + o.property)) }._1 

foldRight is a method that takes an original result and a function that takes an element and returns an updated result. The method iterates over each element, updates the result in accordance with the application of the function to each element, and returns the final result. We go from right to left (and not from left to right with foldLeft ), because we add O (1) to objects , but O - N is the addition. Also watch the good style here, we use pattern matching to extract elements.

In this case, the initial result is a pair (tuple) of the empty list and set. The list is the result that interests us, and this set is used to track the properties that we have already encountered. At each iteration, we check whether the props set already props property (in Scala, obj(x) translated to obj.apply(x) . In Set the apply method is def apply(a: A): Boolean . That is, it accepts an element and returns true / false if it exists or not). If the property exists (already occurs), the result is returned as-is. Otherwise, the result is updated to contain the object ( o :: objects ), and the property is written ( props + o.property )

Update: @andreypopp wanted to use a generic method:

 import scala.collection.IterableLike import scala.collection.generic.CanBuildFrom class RichCollection[A, Repr](xs: IterableLike[A, Repr]){ def distinctBy[B, That](f: A => B)(implicit cbf: CanBuildFrom[Repr, A, That]) = { val builder = cbf(xs.repr) val i = xs.iterator var set = Set[B]() while (i.hasNext) { val o = i.next val b = f(o) if (!set(b)) { set += b builder += o } } builder.result } } implicit def toRich[A, Repr](xs: IterableLike[A, Repr]) = new RichCollection(xs) 

for use:

 scala> list.distinctBy(_.property) res7: List[Obj] = List(Obj(1), Obj(2), Obj(3)) 

Also note that this is pretty effective as we use the builder. If you have really large lists, you can use the modified HashSet instead of the usual set and evaluate performance.

+117
Oct 12 '10 at 8:37
source share

Here's a slightly insightful but quick fix that keeps order:

 list.filterNot{ var set = Set[Property]() obj => val b = set(obj.property); set += obj.property; b} 

Although it uses var internally, I think it is easier to understand and read than the foldLeft solution.

+13
Oct 12 '10 at 9:00
source share

Another solution

 @tailrec def collectUnique(l: List[Object], s: Set[Property], u: List[Object]): List[Object] = l match { case Nil => u.reverse case (h :: t) => if (s(h.property)) collectUnique(t, s, u) else collectUnique(t, s + h.prop, h :: u) } 
+6
12 Oct '10 at 9:54
source share

With save order:

 def distinctBy[L, E](list: List[L])(f: L => E): List[L] = list.foldLeft((Vector.empty[L], Set.empty[E])) { case ((acc, set), item) => val key = f(item) if (set.contains(key)) (acc, set) else (acc :+ item, set + key) }._1.toList distinctBy(list)(_.property) 
+4
Dec 24 '15 at 18:20
source share

Starting Scala 2.13 , most collections are now equipped with B): C rel = "nofollow noreferrer"> distinctBy methods, which returns all elements of the sequence, ignoring duplicates after applying this transform function:

 list.distinctBy(_.property) 

For example:

 List(("a", 2), ("b", 2), ("a", 5)).distinctBy(_._1) // List((a,2), (b,2)) List(("a", 2.7), ("b", 2.1), ("a", 5.4)).distinctBy(_._2.floor) // List((a,2.7), (a,5.4)) 
+3
Oct 02 '18 at 0:18
source share

I found a way to make it work with groupBy with one intermediate step:

 def distinctBy[T, P, From[X] <: TraversableLike[X, From[X]]](collection: From[T])(property: T => P): From[T] = { val uniqueValues: Set[T] = collection.groupBy(property).map(_._2.head)(breakOut) collection.filter(uniqueValues) } 

Use it as follows:

 scala> distinctBy(List(redVolvo, bluePrius, redLeon))(_.color) res0: List[Car] = List(redVolvo, bluePrius) 

Similar to IttayD's first solution, but it filters the original collection based on a set of unique values. If my expectations are true, this makes three workarounds: one for groupBy , one for map and one for filter . It maintains the order of the original collection, but does not necessarily take the first value for each property. For example, it could return List(bluePrius, redLeon) .

Of course, IttayD's solution is still faster because it only performs one crawl.

My solution also has the disadvantage that if the collection has Car , which are actually the same, both will be in the output list. This can be uniqueValues removing the filter and returning uniqueValues directly, with type From[T] . However, it seems that CanBuildFrom[Map[P, From[T]], T, From[T]] does not exist ... suggestions are welcome!

+2
Jan 18 '14 at
source share

Lots of good answers above. However, distinctBy already in Scala, but in a less obvious place. Perhaps you can use it as

 def distinctBy[A, B](xs: List[A])(f: A => B): List[A] = scala.reflect.internal.util.Collections.distinctBy(xs)(f) 
+1
Mar 20 '18 at 13:10
source share

I don't know which version of Scala you are using, but 2.8.2 definitely has

 list.distinct 

Change (lock down voices)

 list.distinctBy 
-3
Jan 22 '13 at 18:34
source share



All Articles