Remove ArrayList vs removeAll

What is better to use if I want to remove a collection from arraylist? I think the removeAll method in ArrayList was written for this task, but in the test I wrote, just repeating through the objects and removing their individual ones was several seconds faster.

What do you use for this purpose?

edit:

removeAll code I found in grepcode calls batchRemove (c, false):

private boolean Read more ... batchRemove (Collection c, boolean add-on) {

700 final Object[] elementData = this.elementData; 701 int r = 0, w = 0; 702 boolean modified = false; 703 try { 704 for (; r < size; r++) 705 if (c.contains(elementData[r]) == complement) 706 elementData[w++] = elementData[r]; 707 } finally { 708 // Preserve behavioral compatibility with AbstractCollection, 709 // even if c.contains() throws. 710 if (r != size) { 711 System.arraycopy(elementData, r, 712 elementData, w, 713 size - r); 714 w += size - r; 715 } 716 if (w != size) { 717 // clear to let GC do its work 718 for (int i = w; i < size; i++) 719 elementData[i] = null; 720 modCount += size - w; 721 size = w; 722 modified = true; 723 } 724 } 725 return modified; 726 } 

I really don't understand him.

my test code is:

 public class RemoveVsRemovall { public static void main(String[] args){ ArrayList<String> source = new ArrayList<>(); ArrayList<String> toRemove = new ArrayList<>(); for(int i = 0; i < 30000; i++){ String s = String.valueOf(System.nanoTime()); source.add(s); if(i % 2 == 0) toRemove.add(s); } long startTime = System.nanoTime(); removeList1(source, toRemove); long endTime = System.nanoTime(); System.out.println("diff: " + (endTime - startTime) * 1e-9); } static void removeList1(ArrayList<String> source, ArrayList<String> toRemove){ source.removeAll(toRemove); } static void removeList2(ArrayList<String> source, ArrayList<String> toRemove){ for(String s : toRemove){ source.remove(s); } } } 

calling it several times with different list sizes and switching between two ways.

+6
source share
2 answers

There are several reasons why it is difficult to give a general answer to this question.

First, you must understand that these performance characteristics are implementation dependent. It is possible that the implementation depends on the platform and version of the JDK.

Having said that, basically 2 strategies for implementing removeAll implemented:

  • For each element of your ArrayList check if it is in a different Collection ; if so, delete it.
  • For each Collection element, check if it is in an ArrayList ; if so, delete it.

If the Collection execution contains in constant time, strategy 1 wins (asymptotically). On the other hand, if contains is performed by scanning the entire connection and Collection iterates very slowly, strategy 2 usually has an edge because it repeats on Collection only once; but even so, if the Collection very large, and most ArrayList elements are among the first elements of the Collection , strategy 1 wins again ... there is no end to this.

You should probably trust the removeAll() implementation; if this fails, try changing the data structures; and if that fails, run your own method from empirical tests.

+4
source

Another thing:

Java code has been cracked for ages and written to adapt to many different and special cases (see Preserve behavioral compatibility with AbstractCollection comment).

So, in fact, probably you can write your own implementation of methods that will work faster. But, on the other hand, are you sure you can consider all the special cases that Java developers have faced since the birth of Java?

Also note that some Java functions may use some C implementation to speed things up. Apparently, this is not so, but it can.

+2
source

Source: https://habr.com/ru/post/983180/


All Articles