Recently, while working on the design of a high-performance data structure in
Java, I made a startling discovery about the performance of ArrayList.removeAll:
the average-case running time of this method under Sun's Java 6 JDK is
\(O(ns)\), where \(n\) is the size of the array list and \(s\) is the
number of items to remove! The worst-case running time is an even more dismal \(O(n^2)\).
This short article explains the problem in detail, and
illustrates two solutions with \(O(n)\) worst-case performance. Finally, a concrete
performance test is presented to justify the theoretical discussion.
The Problem
The problem with ArrayList.removeAll was discovered in Sun's Java 6 JDK.
ArrayList does not implement the removeAll method directly: rather, it
inherits the implementation from its superclass AbstractCollection. The listing below
shows that implementation.
ArrayList.removeAll
public boolean removeAll(Collection<?> c) {
    boolean modified = false;
    Iterator<?> e = iterator();
    while (e.hasNext()) {
        if (c.contains(e.next())) {
            e.remove();
            modified = true;
        }
    }
    return modified;
}
This raises red flags because remove operations on array lists are \(O(n)\):
each remove shifts the entire tail of the backing array forward by one slot.
The code above is therefore optimal for a linked list, but definitely not for an
array list. In particular, every iteration of the loop can trigger
a remove, hence the worst-case performance is \(O(n^2)\).
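To make the cost concrete, here is a minimal sketch of a micro-benchmark (the class name and sizes are mine, and the timing is deliberately crude): under a JDK whose ArrayList inherits this implementation, as Sun's Java 6 did, doubling the list size should roughly quadruple the time needed to remove the first half of its elements.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class QuadraticDemo {
    public static void main(String[] args) {
        for (int size = 10000; size <= 80000; size *= 2) {
            List<Integer> list = new ArrayList<Integer>(size);
            Set<Integer> firstHalf = new HashSet<Integer>();
            for (int i = 0; i < size; i++) {
                list.add(i);
                if (i < size / 2)
                    firstHalf.add(i); // the values stored in the first half
            }
            long start = System.nanoTime();
            list.removeAll(firstHalf); // every hit shifts the whole tail
            long elapsed = System.nanoTime() - start;
            System.out.println(size + " elements: " + elapsed + " ns");
        }
    }
}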
Two Solutions
Luckily, correcting the performance of removeAll happens to be rather easy.
Only a single pass through the array is required, and the algorithms are in-place to
boot. The listing below shows a simple re-implementation of removeAll within
the ArrayList class itself.
O(n) removeAll
public boolean removeAll(Collection<?> c) {
    int x = 0, i = 0;
    // compact in place: keep only the elements not contained in c
    for (i = 0; i < size; i++)
        if (!c.contains(elementData[i]))
            elementData[x++] = elementData[i];
    if (x != i) {
        // null out the unused tail so removed elements can be GC'd
        Arrays.fill(elementData, x, i, null);
        size = x;
        return true;
    } else {
        return false;
    }
}
Note that this implementation also avoids unnecessary re-computation of a modified
flag: the standard algorithm re-assigns its flag for every item removed. This
algorithm would be hard to beat were it not for the fact that most platforms provide
extremely fast native operations for moving blocks of memory around, and copying
one element at a time cannot exploit them.
To take advantage of fast memory copies, we need to use System.arraycopy to
move data around. This comes at the expense of a more complicated implementation.
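As a refresher, System.arraycopy(src, srcPos, dest, destPos, length) copies a block of length elements in a single native operation, and it is documented to behave correctly even when the source and destination ranges overlap within the same array, which the implementation below relies on. A tiny demonstration:

import java.util.Arrays;

public class ArraycopyDemo {
    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        // shift the block a[2..4] one slot left; overlapping source and
        // destination ranges behave as if copied via a temporary array
        System.arraycopy(a, 2, a, 1, 3);
        System.out.println(Arrays.toString(a)); // [1, 3, 4, 5, 5]
    }
}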
Tuned O(n) removeAll
public boolean removeAll(Collection<?> c) {
    int oldHi = 0, newHi = 0, top = 0;
    for (int i = 0; i < size; ++i) {
        if (c.contains(elementData[i])) {
            oldHi = newHi;
            newHi = i;

            // at the end of this loop, newHi will be the non-inclusive
            // upper limit of the range to delete
            while (++newHi < size && c.contains(elementData[newHi]));

            // shift the surviving block [oldHi, i) down to position top
            final int length = i - oldHi;
            System.arraycopy(elementData, oldHi, elementData, top, length);
            i = newHi;
            top += length;
        }
    }
    if (newHi > 0) {
        // move the final surviving block, then null out the unused tail
        final int k = size - newHi;
        System.arraycopy(elementData, newHi, elementData, top, k);
        final int n = top + k;
        Arrays.fill(elementData, n, size, null);
        size = n;
        return true;
    } else {
        return false;
    }
}
The algorithm above represents my best effort at speed. It copies blocks
rather than individual elements, and uses native code to move the data.
Like the previous function, it doesn't waste time recomputing a modification
flag.
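Before timing anything, it is worth a quick sanity check that the replacements behave exactly like the original. A randomized comparison along the following lines does the job; PatchedArrayList is a hypothetical stand-in for an ArrayList with one of the implementations above compiled in:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RemoveAllCheck {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int trial = 0; trial < 1000; trial++) {
            // same random contents in a stock list and a patched list
            List<Integer> expected = new ArrayList<Integer>();
            List<Integer> actual = new PatchedArrayList<Integer>(); // hypothetical
            for (int i = 0; i < 100; i++) {
                Integer v = rnd.nextInt(50);
                expected.add(v);
                actual.add(v);
            }
            Set<Integer> doomed = new HashSet<Integer>();
            for (int i = 0; i < 10; i++)
                doomed.add(rnd.nextInt(50));
            // both implementations must agree on the result and the flag
            boolean e = expected.removeAll(doomed);
            boolean a = actual.removeAll(doomed);
            if (e != a || !expected.equals(actual))
                throw new AssertionError("mismatch on trial " + trial);
        }
        System.out.println("all trials passed");
    }
}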
Quantifying Performance
To justify the theoretical arguments above, we conclude by running a series
of benchmarks to quantify the performance gains. You are invited to
download these benchmarks and verify the claims yourself. Should you
obtain significantly different results, or if you can provide results for
a different platform (either OS or JDK level), please contact me so I can
post them.
The benchmark is concerned with testing the speed of the default implementation
and the two alternate implementations presented in this article. Five different
tests constitute the benchmark (a sketch of the test harness follows the list):
Remove from head. Remove N items from the head of a list. This is
a "worst case scenario" for the default implementation.
Remove from tail. Remove N items from the tail of a list. The default
implementation should perform much better in this scenario.
Remove nothing. Remove 0 items from the list. In theory, all three
implementations should show comparable performance on this test since nothing
is being removed, and the entire operation consists of a single scan through
the list's elements.
Remove interleaved from head. Remove N items located at positions 0,
2, 4, ..., 2(N-1) in the list. This test is interesting because we would expect
System.arraycopy to provide minimal, if not negative, benefit since
it never copies more than a single element at a time.
Random. Remove N items at random from the list. The random items
are precomputed and reused for all the implementations. This test is of interest
as a simulation of the "average" real-world use case.
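The harness for these tests has roughly the following shape. This is a simplified sketch, not the actual downloadable benchmark: the class name, constants, and timing discipline are mine, and it exercises only the default implementation; the patched variants would be swapped in the same way.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RemoveAllBenchmark {
    static final int SIZE = 2500;  // list size used in the article
    static final int N = 150;      // items removed per test
    static final int RUNS = 2500;  // repetitions averaged per test

    // Times one scenario against a freshly built list; to compare
    // implementations, swap the ArrayList below for a patched variant.
    static long timeScenario(int[] positions) {
        long total = 0;
        for (int run = 0; run < RUNS; run++) {
            List<Integer> list = new ArrayList<Integer>(SIZE);
            for (int i = 0; i < SIZE; i++)
                list.add(i);
            Set<Integer> doomed = new HashSet<Integer>();
            for (int j = 0; j < positions.length; j++)
                doomed.add(positions[j]);
            long start = System.nanoTime();
            list.removeAll(doomed);
            total += System.nanoTime() - start;
        }
        return total / RUNS; // average over all runs
    }

    public static void main(String[] args) {
        int[] head = new int[N];
        int[] interleaved = new int[N];
        for (int i = 0; i < N; i++) {
            head[i] = i;            // positions 0 .. N-1
            interleaved[i] = i * 2; // positions 0, 2, 4, ..., 2(N-1)
        }
        System.out.println("head:        " + timeScenario(head) + " ns");
        System.out.println("interleaved: " + timeScenario(interleaved) + " ns");
    }
}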
For the test results presented below, N was 150 and the size of the list was 2500.
All tests were run 2500 times apiece, and the reported results are averages over
all runs (a potential source of inaccuracy, since garbage collection may skew the
results). The results are presented below:
Table 1: Results
Arrays Tester v1.0
Java(TM) 2 Runtime Environment, Standard Edition 1.6.0-rc-b63

Solution     Test                           Time (ns)
Default      Remove from head                 630,064
             Remove from tail                 302,617
             Remove nothing                   254,418
             Remove interleaved from head     623,793
             Remove random                    382,617
O(n)         Remove from head                 186,782
             Remove from tail                 187,307
             Remove nothing                   161,992
             Remove interleaved from head     186,808
             Remove random                    186,628
Tuned O(n)   Remove from head                 180,952
             Remove from tail                 185,900
             Remove nothing                   148,388
             Remove interleaved from head     188,290
             Remove random                    187,949
Several points in the results above are noteworthy. First, it would
be inaccurate to claim that the O(n) solutions are "X times faster"
than the default implementation since the difference in runtime
performance varies as a function of the input size.
Secondly, there is very little difference between the O(n) and tuned
O(n) solutions. In fact, for standard use cases (of which random and
interleaved are most representative), there is a slight performance
loss. Essentially, the problem with the tuned O(n) algorithm is that
it is tuned for a corner case: when a large, contiguous block of
values is to be removed. Therefore, the first solution is probably
preferable, especially when considering the added bonus of its
simplicity.
Corrections
Thanks to Tobias Mattsson for catching a flaw in the initial
implementation of the O(n) algorithms. In his own words:
"Don't forget to null out the remaining unused slots
of the underlying array. Neglecting to do so will cause memory
leaks."
Conclusion
Although it is surprising to see such an inefficient ArrayList.removeAll
implementation in a JDK as mature as Sun's Java 6 JDK, it is understandable
on reflection. J2EE business applications, the bread and butter of Java,
rarely work with very large lists, and even more rarely use removeAll;
and even when both conditions hold, performance is virtually guaranteed
to be dominated by remote calls, XML parsing, and other heavyweight
activities.
Comments
Curt writes:
The position appeared to be that optimizing the best case was more
important than optimizing the worst case. Hopefully that position has
changed, since people tend to choose data structures based upon their
worst-case performance.
- Curt
Amin responds:
Hi Curt,
There are a few other tricky things to consider re: the bug you filed,
especially when the collection implementation violates the standard collection
behavior. For example, imagine if I passed an identity hash set into
retainAll. If the retainAll implementation automatically placed the elements
into a HashSet, you would get unexpected behavior (potentially
too few elements removed).
I was focusing on the fact that the entire tail of the array is shifted
forward for each element that is deleted, which is also very inefficient.
retainAll suffers the same problem, as do ArrayDeque.removeAll and
ArrayDeque.retainAll.
amin
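The identity-set hazard Amin describes can be made concrete with a short sketch (the class name is mine): an implementation that silently copied the argument into an equals-based HashSet would change the outcome of retainAll.

import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class IdentitySemanticsDemo {
    public static void main(String[] args) {
        String a = new String("x"); // two distinct but equal instances
        String b = new String("x");

        List<String> list = new ArrayList<String>();
        list.add(a);
        list.add(b);

        // an identity-based set containing only the instance a
        Set<String> keep =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        keep.add(a);

        // retainAll asks keep.contains(...) for each element, so identity
        // semantics are honored: only the instance a survives
        list.retainAll(keep);
        System.out.println(list.size()); // 1

        // Had the implementation first copied 'keep' into an equals-based
        // HashSet, b would have matched as well and wrongly survived.
    }
}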