What would be the right approach when you need to compare two very large ArrayLists with each other?
Each list holds 100,000 items and, of course, the program crashes when simply comparing them item by item:
    for (CItem c : cItems) {
        for (CItem r : rItems) {
            if (c.getID().equals(r.getID())) {
                Mismatch m = compareItems(c, r);
                if (m != null) {
                    mismatches.add(m);
                }
            }
        }
    }
Now I'm not 100% sure how garbage collection works in this situation, but we get errors:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664) ~[na:1.8.0_73]
at java.lang.String.<init>(String.java:207) ~[na:1.8.0_73]
at java.lang.StringBuilder.toString(StringBuilder.java:407) ~[na:1.8.0_73]
and
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3181) ~[na:1.8.0_73]
at java.util.ArrayList.grow(ArrayList.java:261) ~[na:1.8.0_73]
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235) ~[na:1.8.0_73]
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227) ~[na:1.8.0_73]
at java.util.ArrayList.add(ArrayList.java:458) ~[na:1.8.0_73]
Possible solutions so far:
- Split each list into chunks of at most x elements and compare the chunks with each other (somewhat complex)
- Create a new database and query each item (which will be very slow and is not feasible right now)
- Buy 200 GB of RAM
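Another option that might avoid the nested loop entirely is to index one list by ID in a HashMap and then walk the other list once. That would turn roughly 100,000 x 100,000 ID comparisons into two passes of 100,000 each. Below is a minimal sketch of the idea, not a tested fix for the real code. The CItem and Mismatch classes, the getValue() accessor and the body of compareItems are hypothetical stand-ins, since only getID() and the compareItems call appear above. It also assumes IDs are Strings and are unique within rItems.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Objects;

    class ListCompareSketch {

        // Hypothetical stand-in for the real CItem; only getID() is known from the question.
        static class CItem {
            private final String id;
            private final String value;

            CItem(String id, String value) {
                this.id = id;
                this.value = value;
            }

            String getID() { return id; }

            String getValue() { return value; }
        }

        // Hypothetical stand-in for the real Mismatch class.
        static class Mismatch {
            final String id;

            Mismatch(String id) { this.id = id; }
        }

        // Placeholder for the real compareItems: returns null when the two items agree.
        static Mismatch compareItems(CItem c, CItem r) {
            return Objects.equals(c.getValue(), r.getValue()) ? null : new Mismatch(c.getID());
        }

        static List<Mismatch> findMismatches(List<CItem> cItems, List<CItem> rItems) {
            // Index rItems by ID once. Assumption: IDs are unique within rItems;
            // if an ID occurs more than once, only the last item with that ID is kept.
            Map<String, CItem> rById = new HashMap<>();
            for (CItem r : rItems) {
                rById.put(r.getID(), r);
            }

            // Single pass over cItems with a constant-time lookup per item.
            List<Mismatch> mismatches = new ArrayList<>();
            for (CItem c : cItems) {
                CItem r = rById.get(c.getID());
                if (r != null) {
                    Mismatch m = compareItems(c, r);
                    if (m != null) {
                        mismatches.add(m);
                    }
                }
            }
            return mismatches;
        }
    }

Calling ListCompareSketch.findMismatches(cItems, rItems) would replace the nested loop. This only reduces the number of comparisons and the temporary objects they create. If the mismatches list itself becomes too large to hold in memory, the results would still need to be written out in batches instead of being collected in one list.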
Any input on this would be appreciated.