Run some tests with a hosted approach and build a new HashSet. That is, let A be the smallest of the sets, and B be large, and then for each element in A , if it also exists in B, then add it to C (new HashSet) - for a simple calculation, the intermediate set C can be skipped.
Like the hosted approach, this should be O(|A|) in value, since the iteration is O(|A|) , and the probe in B is O(1) . I have no idea how this will be compared to the clone and delete method.
Happy coding - and post some results; -)
Actually, upon further reflection, I think this is slightly better than the method in the message: O(|A|) vs O(|A| + |B|) . I donβt know if this will really make any difference (or improvement), and I would only expect it to be relevant when |A| <<< |B| |A| <<< |B| .
Okay, I was very bored. At least on JDK 7 (Windows 7 x64), the method presented in the post is slower than the above approach - a good (albeit mostly constant) factor. My guesstimate eyeball says it's about four times slower than the suggestion above, which just uses a counter and twice as slow when creating a new HashSet. It seems "roughly matched" in different initial sizes.
(Please keep in mind that, as Voo pointed out, the numbers above and this micro benchmark suggest using a HashSet! And, as always, there are dangers with micro benchmarks. YMMV.)
Here are the ugly results (times in milliseconds):
Running tests for 1x1
IntersectTest $ PostMethod @ 6cc2060e took 13.9808544 count = 1000000
IntersectTest $ MyMethod1 @ 7d38847d took 2.9893732 count = 1000000
IntersectTest $ MyMethod2 @ 9826ac5 took 7.775945 count = 1000000
Running tests for 1x10
IntersectTest $ PostMethod @ 67fc9fee took 12.4647712 count = 734000
IntersectTest $ MyMethod1 @ 7a67f797 took 3.1567252 count = 734000
IntersectTest $ MyMethod2 @ 3fb01949 took 6.483941 count = 734000
Running tests for 1x100
IntersectTest $ PostMethod @ 16675039 took 11.3069326 count = 706000
IntersectTest $ MyMethod1 @ 58c3d9ac took 2.3482693 count = 706000
IntersectTest $ MyMethod2 @ 2207d8bb took 4.8687103 count = 706000
Running tests for 1x1000
IntersectTest $ PostMethod @ 33d626a4 took 10.28656 count = 729000
IntersectTest $ MyMethod1 @ 3082f392 took 2.3478658 count = 729000
IntersectTest $ MyMethod2 @ 65450f1f took 4.109205 count = 729000
Running tests for 10x2
IntersectTest $ PostMethod @ 55c4d594 took 10.4137618 count = 736000
IntersectTest $ MyMethod1 @ 6da21389 took 2.374206 count = 736000
IntersectTest $ MyMethod2 @ 2bb0bf9a took 4.9802039 count = 736000
Running tests for 10x10
IntersectTest $ PostMethod @ 7930ebb took 25.811083 count = 4370000
IntersectTest $ MyMethod1 @ 47ac1adf took 6.9409306 count = 4370000
IntersectTest $ MyMethod2 @ 74184b3b took 14.2603248 count = 4370000
Running tests for 10x100
IntersectTest $ PostMethod @ 7f423820 took 25.0577691 count = 4251000
IntersectTest $ MyMethod1 @ 5472fe25 took 6.1376042 count = 4251000
IntersectTest $ MyMethod2 @ 498b5a73 took 13.9880385 count = 4251000
Running tests for 10x1000
IntersectTest $ PostMethod @ 3033b503 took 25.0312716 count = 4138000
IntersectTest $ MyMethod1 @ 12b0f0ae took 6.0932898 count = 4138000
IntersectTest $ MyMethod2 @ 1e893918 took 13.8332505 count = 4138000
Running tests for 100x1
IntersectTest $ PostMethod @ 6366de01 took 9.4531628 count = 700000
IntersectTest $ MyMethod1 @ 767946a2 took 2.4284762 count = 700000
IntersectTest $ MyMethod2 @ 140c7272 took 4.7580235 count = 700000
Running tests for 100x10
IntersectTest $ PostMethod @ 3351e824 took 24.9788668 count = 4192000
IntersectTest $ MyMethod1 @ 465fadce took 6.1462852 count = 4192000
IntersectTest $ MyMethod2 @ 338bd37a took 13.1742654 count = 4192000
Running tests for 100x100
IntersectTest $ PostMethod @ 297630d5 took 193.0121077 count = 41047000
IntersectTest $ MyMethod1 @ e800537 took 45.2652397 count = 41047000
IntersectTest $ MyMethod2 @ 76d66550 took 120.8494766 count = 41047000
Running tests for 100x1000
IntersectTest $ PostMethod @ 33576738 took 199.6269531 count = 40966000
IntersectTest $ MyMethod1 @ 2f39a7dd took 45.5255814 count = 40966000
IntersectTest $ MyMethod2 @ 723bb663 took 122.1704975 count = 40966000
Running tests for 1x1
IntersectTest $ PostMethod @ 35e3bdb5 took 9.5598373 count = 1000000
IntersectTest $ MyMethod1 @ 7abbd1b6 took 2.6359174 count = 1000000
IntersectTest $ MyMethod2 @ 40c542ad took 6.1091794 count = 1000000
Running tests for 1x10
IntersectTest $ PostMethod @ 3c33a0c5 took 9.4648528 count = 733000
IntersectTest $ MyMethod1 @ 61800463 took 2.302116 count = 733000
IntersectTest $ MyMethod2 @ 1ba03197 took 5.4803628 count = 733000
Running tests for 1x100
IntersectTest $ PostMethod @ 71b8da5 took 9.4971057 count = 719000
IntersectTest $ MyMethod1 @ 21f04f48 took 2.2983538 count = 719000
IntersectTest $ MyMethod2 @ 27e51160 took 5.3926902 count = 719000
Running tests for 1x1000
IntersectTest $ PostMethod @ 2fe7106a took 9.4702331 count = 692000
IntersectTest $ MyMethod1 @ 6ae6b7b7 took 2.3013066 count = 692000
IntersectTest $ MyMethod2 @ 51278635 took 5.4488882 count = 692000
Running tests for 10x2
IntersectTest $ PostMethod @ 223b2d85 took 9.5660879 count = 743000
IntersectTest $ MyMethod1 @ 5b298851 took 2.3481445 count = 743000
IntersectTest $ MyMethod2 @ 3b4ac99 took 4.8268489 count = 743000
Running tests for 10x10
IntersectTest $ PostMethod @ 51c700a0 took 23.0709476 count = 4326000
IntersectTest $ MyMethod1 @ 5ffa3251 took 5.5460785 count = 4326000
IntersectTest $ MyMethod2 @ 22fd9511 took 13.4853948 count = 4326000
Running tests for 10x100
IntersectTest $ PostMethod @ 46b49793 took 25.1295491 count = 4256000
IntersectTest $ MyMethod1 @ 7a4b5828 took 5.8520418 count = 4256000
IntersectTest $ MyMethod2 @ 6888e8d1 took 14.0856942 count = 4256000
Running tests for 10x1000
IntersectTest $ PostMethod @ 5339af0d took 25.1752685 count = 4158000
IntersectTest $ MyMethod1 @ 7013a92a took 5.7978328 count = 4158000
IntersectTest $ MyMethod2 @ 1ac73de2 took 13.8914112 count = 4158000
Running tests for 100x1
IntersectTest $ PostMethod @ 3d1344c8 took 9.5123442 count = 717000
IntersectTest $ MyMethod1 @ 3c08c5cb took 2.34665 count = 717000
IntersectTest $ MyMethod2 @ 63f1b137 took 4.907277 count = 717000
Running tests for 100x10
IntersectTest $ PostMethod @ 71695341 took 24.9830339 count = 4180000
IntersectTest $ MyMethod1 @ 39d90a92 took 5.8467864 count = 4180000
IntersectTest $ MyMethod2 @ 584514e9 took 13.2197964 count = 4180000
Running tests for 100x100
IntersectTest $ PostMethod @ 21b8dc91 took 195.1796213 count = 41060000
IntersectTest $ MyMethod1 @ 6f98c4e2 took 44.5775162 count = 41060000
IntersectTest $ MyMethod2 @ 16a60aab took 121.1754402 count = 41060000
Running tests for 100x1000
IntersectTest $ PostMethod @ 20b37d62 took 200.973133 count = 40940000
IntersectTest $ MyMethod1 @ 67ecbdb3 took 45.4832226 count = 40940000
IntersectTest $ MyMethod2 @ 679a6812 took 121.791293 count = 40940000
Running tests for 1x1
IntersectTest $ PostMethod @ 237aa07b took 9.2210288 count = 1000000
IntersectTest $ MyMethod1 @ 47bdfd6f took 2.3394042 count = 1000000
IntersectTest $ MyMethod2 @ a49a735 took 6.1688936 count = 1000000
Running tests for 1x10
IntersectTest $ PostMethod @ 2b25ffba took 9.4103967 count = 736000
IntersectTest $ MyMethod1 @ 4bb82277 took 2.2976994 count = 736000
IntersectTest $ MyMethod2 @ 25ded977 took 5.3310813 count = 736000
Running tests for 1x100
IntersectTest $ PostMethod @ 7154a6d5 took 9.3818786 count = 704000
IntersectTest $ MyMethod1 @ 6c952413 took 2.3014931 count = 704000
IntersectTest $ MyMethod2 @ 33739316 took 5.3307998 count = 704000
Running tests for 1x1000
IntersectTest $ PostMethod @ 58334198 took 9.3831841 count = 736000
IntersectTest $ MyMethod1 @ d178f65 took 2.3071236 count = 736000
IntersectTest $ MyMethod2 @ 5c7369a took 5.4062184 count = 736000
Running tests for 10x2
IntersectTest $ PostMethod @ 7c4bdf7c took 9.4040537 count = 735000
IntersectTest $ MyMethod1 @ 593d85a4 took 2.3584088 count = 735000
IntersectTest $ MyMethod2 @ 5610ffc1 took 4.8318229 count = 735000
Running tests for 10x10
IntersectTest $ PostMethod @ 46bd9fb8 took 23.004925 count = 4331000
IntersectTest $ MyMethod1 @ 4b410d50 took 5.5678172 count = 4331000
IntersectTest $ MyMethod2 @ 1bd125c9 took 14.6517184 count = 4331000
Running tests for 10x100
IntersectTest $ PostMethod @ 75d6ecff took 25.0114913 count = 4223000
IntersectTest $ MyMethod1 @ 716195c9 took 5.798676 count = 4223000
IntersectTest $ MyMethod2 @ 3db0f946 took 13.8064737 count = 4223000
Running tests for 10x1000
IntersectTest $ PostMethod @ 761d8e2a took 25.1910652 count = 4292000
IntersectTest $ MyMethod1 @ e60a3fb took 5.8621189 count = 4292000
IntersectTest $ MyMethod2 @ 6aadbb1c took 13.8150282 count = 4292000
Running tests for 100x1
IntersectTest $ PostMethod @ 48a50a6e took 9.4141906 count = 736000
IntersectTest $ MyMethod1 @ 4b4fe104 took 2.3507252 count = 736000
IntersectTest $ MyMethod2 @ 693df43c took 4.7506854 count = 736000
Running tests for 100x10
IntersectTest $ PostMethod @ 4f7d29df took 24.9574096 count = 4219000
IntersectTest $ MyMethod1 @ 2248183e took 5.8628954 count = 4219000
IntersectTest $ MyMethod2 @ 2b2fa007 took 12.9836817 count = 4219000
Running tests for 100x100
IntersectTest $ PostMethod @ 545d7b6a took 193.2436192 count = 40987000
IntersectTest $ MyMethod1 @ 4551976b took 44.634367 count = 40987000
IntersectTest $ MyMethod2 @ 6fac155a took 119.2478037 count = 40987000
Running tests for 100x1000
IntersectTest $ PostMethod @ 7b6c238d took 200.4385174 count = 40817000
IntersectTest $ MyMethod1 @ 78923d48 took 45.6225227 count = 40817000
IntersectTest $ MyMethod2 @ 48f57fcf took 121.0602757 count = 40817000
Running tests for 1x1
IntersectTest $ PostMethod @ 102c79f4 took 9.0931408 count = 1000000
IntersectTest $ MyMethod1 @ 57fa8a77 took 2.3309466 count = 1000000
IntersectTest $ MyMethod2 @ 198b7c1 took 5.7627226 count = 1000000
Running tests for 1x10
IntersectTest $ PostMethod @ 8c646d0 took 9.3208571 count = 726000
IntersectTest $ MyMethod1 @ 11530630 took 2.3123797 count = 726000
IntersectTest $ MyMethod2 @ 61bb4232 took 5.405318 count = 726000
Running tests for 1x100
IntersectTest $ PostMethod @ 1a137105 took 9.387384 count = 710000
IntersectTest $ MyMethod1 @ 72610ca2 took 2.2938749 count = 710000
IntersectTest $ MyMethod2 @ 41849a58 took 5.3865938 count = 710000
Running tests for 1x1000
IntersectTest $ PostMethod @ 100001c8 took 9.4289031 count = 696000
IntersectTest $ MyMethod1 @ 7074f9ac took 2.2977923 count = 696000
IntersectTest $ MyMethod2 @ fb3c4e2 took 5.3724119 count = 696000
Running tests for 10x2
IntersectTest $ PostMethod @ 52c638d6 took 9.4074124 count = 775000
IntersectTest $ MyMethod1 @ 53bd940e took 2.3544881 count = 775000
IntersectTest $ MyMethod2 @ 43434e15 took 4.9228549 count = 775000
Running tests for 10x10
IntersectTest $ PostMethod @ 73b7628d took 23.2110252 count = 4374000
IntersectTest $ MyMethod1 @ ca75255 took 5.5877838 count = 4374000
IntersectTest $ MyMethod2 @ 3d0e50f0 took 13.5902641 count = 4374000
Running tests for 10x100
IntersectTest $ PostMethod @ 6d6bbbaba9 took 25.1999918 count = 4227000
IntersectTest $ MyMethod1 @ 3bed8c5e took 5.7879144 count = 4227000
IntersectTest $ MyMethod2 @ 689a8e0e took 13.9617882 count = 4227000
Running tests for 10x1000
IntersectTest $ PostMethod @ 3da3b0a2 took 25.1627329 count = 4222000
IntersectTest $ MyMethod1 @ 45a17b4b took 5.8319523 count = 4222000
IntersectTest $ MyMethod2 @ 6ca59ca3 took 13.8885479 count = 4222000
Running tests for 100x1
IntersectTest $ PostMethod @ 360202a5 took 9.5115367 count = 705000
IntersectTest $ MyMethod1 @ 3dfbba56 took 2.3470254 count = 705000
IntersectTest $ MyMethod2 @ 598683e4 took 4.8955489 count = 705000
Running tests for 100x10
IntersectTest $ PostMethod @ 21426d0d took 25.8234298 count = 4231000
IntersectTest $ MyMethod1 @ 1005818a took 5.8832067 count = 4231000
IntersectTest $ MyMethod2 @ 597b933d took 13.3676148 count = 4231000
Running tests for 100x100
IntersectTest $ PostMethod @ 6d59b81a took 193.676662 count = 41015000
IntersectTest $ MyMethod1 @ 1d45eb0c took 44.6519088 count = 41015000
IntersectTest $ MyMethod2 @ 594a6fd7 took 119.1646115 count = 41015000
Running tests for 100x1000
IntersectTest $ PostMethod @ 6d57d9ac took 200.1651432 count = 40803000
IntersectTest $ MyMethod1 @ 2293e349 took 45.5311168 count = 40803000
IntersectTest $ MyMethod2 @ 1b2edf5b took 120.1697135 count = 40803000
And here's an ugly (and possibly erroneous) micro-test:
import java.util.*; public class IntersectTest { static Random rng = new Random(); static abstract class RunIt { public long count; public long nsTime; abstract int Run (Set<Long> s1, Set<Long> s2); } // As presented in the post static class PostMethod extends RunIt { public int Run(Set<Long> set1, Set<Long> set2) { boolean set1IsLarger = set1.size() > set2.size(); Set<Long> cloneSet = new HashSet<Long>(set1IsLarger ? set2 : set1); cloneSet.retainAll(set1IsLarger ? set1 : set2); return cloneSet.size(); } } // No intermediate HashSet static class MyMethod1 extends RunIt { public int Run (Set<Long> set1, Set<Long> set2) { Set<Long> a; Set<Long> b; if (set1.size() <= set2.size()) { a = set1; b = set2; } else { a = set2; b = set1; } int count = 0; for (Long e : a) { if (b.contains(e)) { count++; } } return count; } } // With intermediate HashSet static class MyMethod2 extends RunIt { public int Run (Set<Long> set1, Set<Long> set2) { Set<Long> a; Set<Long> b; Set<Long> res = new HashSet<Long>(); if (set1.size() <= set2.size()) { a = set1; b = set2; } else { a = set2; b = set1; } for (Long e : a) { if (b.contains(e)) { res.add(e); } } return res.size(); } } static Set<Long> makeSet (int count, float load) { Set<Long> s = new HashSet<Long>(); for (int i = 0; i < count; i++) { s.add((long)rng.nextInt(Math.max(1, (int)(count * load)))); } return s; } // really crummy ubench stuff public static void main(String[] args) { int[][] bounds = { {1, 1}, {1, 10}, {1, 100}, {1, 1000}, {10, 2}, {10, 10}, {10, 100}, {10, 1000}, {100, 1}, {100, 10}, {100, 100}, {100, 1000}, }; int totalReps = 4; int cycleReps = 1000; int subReps = 1000; float load = 0.8f; for (int tc = 0; tc < totalReps; tc++) { for (int[] bound : bounds) { int set1size = bound[0]; int set2size = bound[1]; System.out.println("Running tests for " + set1size + "x" + set2size); ArrayList<RunIt> allRuns = new ArrayList<RunIt>( Arrays.asList( new PostMethod(), new MyMethod1(), new MyMethod2())); for (int r = 0; r < cycleReps; r++) { ArrayList<RunIt> runs = new ArrayList<RunIt>(allRuns); Set<Long> set1 = makeSet(set1size, load); Set<Long> set2 = makeSet(set2size, load); while (runs.size() > 0) { int runIdx = rng.nextInt(runs.size()); RunIt run = runs.remove(runIdx); long start = System.nanoTime(); int count = 0; for (int s = 0; s < subReps; s++) { count += run.Run(set1, set2); } long time = System.nanoTime() - start; run.nsTime += time; run.count += count; } } for (RunIt run : allRuns) { double sec = run.nsTime / (10e6); System.out.println(run + " took " + sec + " count=" + run.count); } } } } }