Recently, while working on the design of a high-performance data structure in
Java, I made a startling discovery about the performance of ArrayList.removeAll:
the average-case running time of this method under Sun's Java 6 JDK is
\(O(ns)\), where \(n\) is the size of the array list and \(s\) is the
number of items to remove! The worst-case running time is an even more dismal \(O(n^2)\).
This short article explains the problem in detail, and
illustrates two solutions with \(O(n)\) worst-case performance. Finally, a concrete
performance test is presented to justify the theoretical discussion.
The Problem
The problem with ArrayList.removeAll was discovered in Sun's Java 6 JDK.
ArrayList does not implement the removeAll method directly: rather, it
inherits the implementation from its superclass AbstractCollection. The listing below
shows that implementation.
ArrayList.removeAll
public boolean removeAll(Collection<?> c) {
    boolean modified = false;
    Iterator<?> e = iterator();
    while (e.hasNext()) {
        if (c.contains(e.next())) {
            e.remove();
            modified = true;
        }
    }
    return modified;
}
This raises red flags because remove operations on array lists are \(O(n)\):
each remove shifts the entire tail of the backing array forward by one slot.
The code above is therefore optimal for a linked list, but definitely not for an
array list. In particular, every iteration of the loop can trigger
a remove, hence the worst-case performance is \(O(n^2)\).
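To make the cost concrete, here is a minimal sketch of a micro-benchmark (the class name and sizes are mine, and the timing is deliberately crude): under a JDK whose ArrayList inherits this implementation, as Sun's Java 6 did, doubling the list size should roughly quadruple the time needed to remove the first half of its elements.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class QuadraticDemo {
    public static void main(String[] args) {
        for (int size = 10000; size <= 80000; size *= 2) {
            List<Integer> list = new ArrayList<Integer>(size);
            Set<Integer> firstHalf = new HashSet<Integer>();
            for (int i = 0; i < size; i++) {
                list.add(i);
                if (i < size / 2)
                    firstHalf.add(i); // the values stored in the first half
            }
            long start = System.nanoTime();
            list.removeAll(firstHalf); // every hit shifts the whole tail
            long elapsed = System.nanoTime() - start;
            System.out.println(size + " elements: " + elapsed + " ns");
        }
    }
}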
Two Solutions
Luckily, correcting the performance of removeAll happens to be rather easy.
Only a single pass through the array is required, and the algorithms are in-place to
boot. The listing below shows a simple re-implementation of removeAll within
the ArrayList class itself.
O(n) removeAll
public boolean removeAll(Collection<?> c) {
    int x = 0, i = 0;
    // compact in place: keep only the elements not contained in c
    for (i = 0; i < size; i++)
        if (!c.contains(elementData[i]))
            elementData[x++] = elementData[i];
    if (x != i) {
        // null out the unused tail so removed elements can be GC'd
        Arrays.fill(elementData, x, i, null);
        size = x;
        return true;
    } else {
        return false;
    }
}
Note that this implementation also avoids unnecessary re-computation of a modified
flag: the standard algorithm re-assigns its flag for every item removed. This
algorithm would be hard to beat were it not for the fact that most platforms provide
extremely fast native operations for moving blocks of memory around, and copying
one element at a time cannot exploit them.
To take advantage of fast memory copies, we need to use System.arraycopy to
move data around. This comes at the expense of a more complicated implementation.
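As a refresher, System.arraycopy(src, srcPos, dest, destPos, length) copies a block of length elements in a single native operation, and it is documented to behave correctly even when the source and destination ranges overlap within the same array, which the implementation below relies on. A tiny demonstration:

import java.util.Arrays;

public class ArraycopyDemo {
    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        // shift the block a[2..4] one slot left; overlapping source and
        // destination ranges behave as if copied via a temporary array
        System.arraycopy(a, 2, a, 1, 3);
        System.out.println(Arrays.toString(a)); // [1, 3, 4, 5, 5]
    }
}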
Tuned O(n) removeAll
public boolean removeAll(Collection<?> c) {
    int oldHi = 0, newHi = 0, top = 0;
    for (int i = 0; i < size; ++i) {
        if (c.contains(elementData[i])) {
            oldHi = newHi;
            newHi = i;

            // at the end of this loop, newHi will be the non-inclusive
            // upper limit of the range to delete
            while (++newHi < size && c.contains(elementData[newHi]));

            // shift the surviving block [oldHi, i) down to position top
            final int length = i - oldHi;
            System.arraycopy(elementData, oldHi, elementData, top, length);
            i = newHi;
            top += length;
        }
    }
    if (newHi > 0) {
        // move the final surviving block, then null out the unused tail
        final int k = size - newHi;
        System.arraycopy(elementData, newHi, elementData, top, k);
        final int n = top + k;
        Arrays.fill(elementData, n, size, null);
        size = n;
        return true;
    } else {
        return false;
    }
}
The algorithm above represents my best effort at speed. It copies blocks
rather than individual elements, and uses native code to move the data.
Like the previous function, it doesn't waste time recomputing a modification
flag.
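Before timing anything, it is worth a quick sanity check that the replacements behave exactly like the original. A randomized comparison along the following lines does the job; PatchedArrayList is a hypothetical stand-in for an ArrayList with one of the implementations above compiled in:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RemoveAllCheck {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int trial = 0; trial < 1000; trial++) {
            // same random contents in a stock list and a patched list
            List<Integer> expected = new ArrayList<Integer>();
            List<Integer> actual = new PatchedArrayList<Integer>(); // hypothetical
            for (int i = 0; i < 100; i++) {
                Integer v = rnd.nextInt(50);
                expected.add(v);
                actual.add(v);
            }
            Set<Integer> doomed = new HashSet<Integer>();
            for (int i = 0; i < 10; i++)
                doomed.add(rnd.nextInt(50));
            // both implementations must agree on the result and the flag
            boolean e = expected.removeAll(doomed);
            boolean a = actual.removeAll(doomed);
            if (e != a || !expected.equals(actual))
                throw new AssertionError("mismatch on trial " + trial);
        }
        System.out.println("all trials passed");
    }
}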
Quantifying Performance
To justify the theoretical arguments above, we conclude by running a series
of benchmarks to quantify the performance gains. You are invited to
download these benchmarks and verify the claims yourself. Should you
obtain significantly different results, or if you can provide results for
a different platform (either OS or JDK level), please contact me so I can
post them.
The benchmark is concerned with testing the speed of the default implementation
and the two alternate implementations presented in this article. Five different
tests constitute the benchmark (a sketch of the test harness follows the list):
Remove from head. Remove N items from the head of a list. This is
a "worst case scenario" for the default implementation.
Remove from tail. Remove N items from the tail of a list. The default
implementation should perform much better in this scenario.
Remove nothing. Remove 0 items from the list. In theory, all three
implementations should show comparable performance on this test since nothing
is being removed, and the entire operation consists of a single scan through
the list's elements.
Remove interleaved from head. Remove N items located at positions 0,
2, 4, ..., 2(N-1) in the list. This test is interesting because we would expect
System.arraycopy to provide minimal, if not negative, benefit since
it never copies more than a single element at a time.
Random. Remove N items at random from the list. The random items
are precomputed and reused for all the implementations. This test is of interest
as a simulation of the "average" real-world use case.
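The harness for these tests has roughly the following shape. This is a simplified sketch, not the actual downloadable benchmark: the class name, constants, and timing discipline are mine, and it exercises only the default implementation; the patched variants would be swapped in the same way.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RemoveAllBenchmark {
    static final int SIZE = 2500;  // list size used in the article
    static final int N = 150;      // items removed per test
    static final int RUNS = 2500;  // repetitions averaged per test

    // Times one scenario against a freshly built list; to compare
    // implementations, swap the ArrayList below for a patched variant.
    static long timeScenario(int[] positions) {
        long total = 0;
        for (int run = 0; run < RUNS; run++) {
            List<Integer> list = new ArrayList<Integer>(SIZE);
            for (int i = 0; i < SIZE; i++)
                list.add(i);
            Set<Integer> doomed = new HashSet<Integer>();
            for (int j = 0; j < positions.length; j++)
                doomed.add(positions[j]);
            long start = System.nanoTime();
            list.removeAll(doomed);
            total += System.nanoTime() - start;
        }
        return total / RUNS; // average over all runs
    }

    public static void main(String[] args) {
        int[] head = new int[N];
        int[] interleaved = new int[N];
        for (int i = 0; i < N; i++) {
            head[i] = i;            // positions 0 .. N-1
            interleaved[i] = i * 2; // positions 0, 2, 4, ..., 2(N-1)
        }
        System.out.println("head:        " + timeScenario(head) + " ns");
        System.out.println("interleaved: " + timeScenario(interleaved) + " ns");
    }
}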
For the test results presented below, N was 150 and the size of the list was 2500.
All tests were run 2500 times apiece, and the reported results are averages over
all runs (a potential source of inaccuracy, since garbage collection may skew the
results). The results are presented below:
Table 1: Results
Arrays Tester v1.0
Java(TM) 2 Runtime Environment, Standard Edition 1.6.0-rc-b63

Solution     Test                           Time (ns)
Default      Remove from head                 630,064
             Remove from tail                 302,617
             Remove nothing                   254,418
             Remove interleaved from head     623,793
             Remove random                    382,617
O(n)         Remove from head                 186,782
             Remove from tail                 187,307
             Remove nothing                   161,992
             Remove interleaved from head     186,808
             Remove random                    186,628
Tuned O(n)   Remove from head                 180,952
             Remove from tail                 185,900
             Remove nothing                   148,388
             Remove interleaved from head     188,290
             Remove random                    187,949
Several points in the results above are noteworthy. First, it would
be inaccurate to claim that the O(n) solutions are "X times faster"
than the default implementation since the difference in runtime
performance varies as a function of the input size.
Secondly, there is very little difference between the O(n) and tuned
O(n) solutions. In fact, for standard use cases (of which random and
interleaved are most representative), there is a slight performance
loss. Essentially, the problem with the tuned O(n) algorithm is that
it is tuned for a corner case: when a large, contiguous block of
values is to be removed. Therefore, the first solution is probably
preferable, especially when considering the added bonus of its
simplicity.
Corrections
Thanks to Tobias Mattsson for catching a flaw in the initial
implementation of the O(n) algorithms. In his own words:
"Don't forget to null out the remaining unused slots
of the underlying array. Neglecting to do so will cause memory
leaks."
Conclusion
Although it is surprising to see such an inefficient ArrayList.removeAll
implementation in a JDK as mature as Sun's Java 6 JDK, it is understandable
on reflection. J2EE business applications, the bread and butter of Java,
rarely work with very large lists, and even more rarely use removeAll;
and even when both conditions hold, performance is virtually guaranteed
to be dominated by remote calls, XML parsing, and other heavyweight
activities.
Comments
Curt writes:
The position appeared to be that optimizing the best case was more
important than optimizing the worst case. Hopefully that position has
changed, since people tend to choose data structures based upon their
worst-case performance.
- Curt
Amin responds:
Hi Curt,
There are a few other tricky things to consider re: the bug you filed,
especially when the collection implementation violates the standard collection
behavior. For example, imagine if I passed an identity hash set into
retainAll. If the retainAll implementation automatically placed the elements
into a HashSet, you would get unexpected behavior (potentially
too few elements removed).
I was focusing on the fact that the entire tail of the array is shifted
forward for each element that is deleted, which is also very inefficient.
retainAll suffers the same problem, as do ArrayDeque.removeAll and
ArrayDeque.retainAll.
amin
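The identity-set hazard Amin describes can be made concrete with a short sketch (the class name is mine): an implementation that silently copied the argument into an equals-based HashSet would change the outcome of retainAll.

import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class IdentitySemanticsDemo {
    public static void main(String[] args) {
        String a = new String("x"); // two distinct but equal instances
        String b = new String("x");

        List<String> list = new ArrayList<String>();
        list.add(a);
        list.add(b);

        // an identity-based set containing only the instance a
        Set<String> keep =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        keep.add(a);

        // retainAll asks keep.contains(...) for each element, so identity
        // semantics are honored: only the instance a survives
        list.retainAll(keep);
        System.out.println(list.size()); // 1

        // Had the implementation first copied 'keep' into an equals-based
        // HashSet, b would have matched as well and wrongly survived.
    }
}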