Comparing 2 very large arraylists in java

What would be the right approach when you need to compare 2 very large arraists with each other?

These arraylist are 100,000 items in size and, of course, crash when simply comparing item by item.

for (CItem c : cItems) {
        for (CItem r : rItems) {
            if (c.getID().equals(r.getID())) {
                Mismatch m = compareItems(c, r);
                if (m != null) {
                    mismatches.add(m);
                }
            }
        }
    }

Now I'm not 100% sure how garbage collection works in this situation, but we get errors:

java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664) ~[na:1.8.0_73]
at java.lang.String.<init>(String.java:207) ~[na:1.8.0_73]
at java.lang.StringBuilder.toString(StringBuilder.java:407) ~[na:1.8.0_73]

and

java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3181) ~[na:1.8.0_73]
at java.util.ArrayList.grow(ArrayList.java:261) ~[na:1.8.0_73]
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235) ~[na:1.8.0_73]
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227) ~[na:1.8.0_73]
at java.util.ArrayList.add(ArrayList.java:458) ~[na:1.8.0_73]

Possible solutions so far

  • Divide each list by a maximum of x elements and compare these multiple lists (something like complex)
  • Create a new database and query each item (which will be very slow and not feasible right now)
  • Buy 200 GB RAM

Any data on this will be appreciated.

+1
source share
5

, Map rItems ID as.

Map<Long, CItem> rItemMap = new HashMap<>(rItems.size());
for (CItem r : rItems) {
    rItemMap.put(r.getID(), r);
}

rItems ID:

for (CItem c : cItems) {
    CItem r = rItemMap.get(c.getID());
    if (r != null) {
        Mismatch m = compareItems(c, r);
        if (m != null) {
            mismatches.add(m);
        }
    }
}

, , , Map.Entry, .

OutOfMemory

Exception, ArrayList. LinkedList , ArrayList ( ), , .

, 1000 , , . 2000. 3000 ( 1000 ).

LinkedList ( ).

+3

, , 2 ID , .

, 100.000 x 100.000 . , ...

1) , ArrayList(). , . ( , )

2) 2 ArrayList() , , , . (, ID), . ( ). .

+3

removeAll :)

rItems.removeAll(cItems);

, ...

, .

+1

2 , . O(n log n) O(n).

Comparator<CItem> idComparator = new Comparator<CItem>() {
    @Override
    public int compare(CItem i1, CItem i2) {
        // Implementation depends on the type of CItem ID:
        // if ID is an integer or double, maybe you need
        // return i1.getID() - i2.getID();
        return i1.getID().compareTo(i2.getID());
    }
});

Collections.sort(cItems, idComparator);
Collections.sort(rItems, idComparator);

int minLen = Math.min(cItems.size(), rItems.size());
for (int i = 0, j = 0; i < minLen && j < minLen; ) {

    CItem c = cItems.get(i);
    CItem r = rItems.get(j);

    // c.getID().equals(r.getID())
    if (idComparator.compare(c, r) == 0) {
        Mismatch m = compareItems(c, r);
        if (m != null) {
            mismatches.add(m);
        }
        i++;
        j++;

    // item c ID does not exist in list rItems
    } else if (idComparator.compare(c, r) < 0) {
        i++;

    // item r ID does not exist in list cItems
    } else {
        j++;
    }
}
+1

. LinkedList. 2 Linkedlist, 3,5 .

LinkedList diflist= (LinkedList) ListUtils.subtract(sourceList, targetList);

, .

, ?

0

Source: https://habr.com/ru/post/1667158/


All Articles