Groovy: Difference between a.intersect(b) and b.intersect(a)

Question


Why is there a difference in Groovy between a.intersect(b) and b.intersect(a) when I create two lists?

    def list1 = ["hello", "world", "world"]
    def list2 = ["world", "world", "world"]
    println( "Intersect list1 with list2: " + list1.intersect( list2 ) )
    println( "Intersect list2 with list1: " + list2.intersect( list1 ) )

prints:

    Intersect list1 with list2: [world, world, world]
    Intersect list2 with list1: [world, world]

(you can copy it here: http://groovyconsole.appspot.com/ if you want to test it)

If all the lists contain unique elements, this works fine. Once you start adding duplicates, it gets weird:

    def list1 = ["hello", "world", "test", "test"]
    def list2 = ["world", "world", "world", "test"]
    println( "Intersect list1 with list2: " + list1.intersect( list2 ) )
    println( "Intersect list2 with list1: " + list2.intersect( list1 ) )

prints:

    Intersect list1 with list2: [world, world, world, test]
    Intersect list2 with list1: [world, test, test]

I thought the whole point of intersect() was to give you the common elements, so it shouldn't matter which list you call it on?

If this is not the case, how can I get only the common elements (expecting duplicates in the lists)? For instance, example one should return ["world", "world"], and example two should return ["world", "test"].

Edit

To clarify a bit, this code should check that the user's data is still the same (in case they were disconnected in the middle of something, and we want to make sure that the data has not been changed and is in the same state as before).

The order of the lists cannot be guaranteed (the user can reorder them, but the data is still technically "the same"), and duplicates are possible.

So, something like ["one", "one", "two"] should match ["two", "one", "one"], while any addition to the lists or change to the data should not match.


arrays groovy intersect

divillysausages 12 sept '11 at 11:14


1 answer

tim_yates · Accepted Answer · 2011-09-12T11:34:53+0000

If you look at the source for Collection.intersect, you will see that the logic of the method follows these steps:

For two collections, left and right:

Swap left and right if left is smaller than right
Add all of left to a Set (which removes duplicates)
For each element in right, if it exists in leftSet, add it to the result
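The steps above can be sketched as a standalone function. This is only an illustration of the described logic, not the library source: the name sketchIntersect is hypothetical, and a plain HashSet stands in for whatever set type the real method uses.

```groovy
// Hypothetical helper that mirrors the three steps described above.
def sketchIntersect(Collection left, Collection right) {
    // Step 1: swap so that the larger collection ends up as "left"
    if (left.size() < right.size()) {
        def tmp = left
        left = right
        right = tmp
    }
    // Step 2: put all of left into a set, which drops duplicates
    def leftSet = new HashSet(left)
    // Step 3: keep every element of right that is present in the set
    right.findAll { leftSet.contains(it) }
}

def list1 = ["hello", "world", "world"]
def list2 = ["world", "world", "world"]
println sketchIntersect(list1, list2)
println sketchIntersect(list2, list1)
```

Because duplicates are only removed from the left side, the duplicates of whichever collection ends up on the right survive into the result. That is why equal-sized lists give different answers depending on the call order.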

So, for your last two lists:

    def array1 = ["hello", "world", "test", "test"]
    def array2 = ["world", "world", "world", "test"]

array1.intersect( array2 ) will give (if we wrote the same algorithm in Groovy):

    leftSet = new TreeSet( array1 )
    // both same size, so no swap
    // leftSet = [ 'hello', 'world', 'test' ]
    right = array2
    result = right.findAll { e -> leftSet.contains( e ) }

If you run it, you can see that result has the value [world, world, world, test] (as you found). This is because every element in right can be found in leftSet.

I don’t know why the first example should return ["world","world"] , though ...
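One reading under which the results expected in the question do come out is a multiset intersection, where each element is kept as many times as it occurs in both lists. The sketch below assumes that reading; multisetIntersect is a hypothetical helper, not something Groovy provides.

```groovy
// Hypothetical helper: keeps each element min(count in a, count in b) times.
def multisetIntersect(List a, List b) {
    // element -> number of occurrences of it still "available" in b
    def remaining = b.countBy { it }
    a.findAll { e ->
        if ((remaining[e] ?: 0) > 0) {
            remaining[e] = remaining[e] - 1   // use up one occurrence from b
            return true
        }
        false
    }
}

println multisetIntersect(["hello", "world", "world"], ["world", "world", "world"])
println multisetIntersect(["hello", "world", "test", "test"], ["world", "world", "world", "test"])
```

With this approach the element counts in the result do not depend on the argument order, although the ordering of the elements follows the first list.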

later...

So, what I think you are looking for would be something like this:

    def array1 = ["hello", "world", "test", "test"]
    def array2 = ["world", "world", "world", "test"]
    def intersect1 = array1.intersect( array2 ) as TreeSet
    def intersect2 = array2.intersect( array1 ) as TreeSet
    assert intersect1 == intersect2

This discards the duplicates in the results, so both intersect1 and intersect2 will be equal to

 [test, world]

later

I believe this does what you want:

 [array1,array2]*.groupBy{it}.with { a, b -> assert a == b }
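The same idea can be wrapped in a small boolean check that compares how often each element occurs, which is what the "same data, any order, duplicates allowed" requirement from the question amounts to. This is a sketch: sameElements is a hypothetical name, and it uses countBy (available since Groovy 1.8) instead of groupBy.

```groovy
// Hypothetical helper: true when both lists hold the same elements
// the same number of times, regardless of order.
def sameElements(List a, List b) {
    // countBy builds a map of element -> occurrence count;
    // map equality ignores the order of the entries
    a.countBy { it } == b.countBy { it }
}

println sameElements(["one", "one", "two"], ["two", "one", "one"])
println sameElements(["one", "two"], ["one", "one", "two"])
```

Unlike the one-liner above, which throws an assertion error on a mismatch, this returns false, so it can be used directly in an if condition.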
