As most programmers know, the solutions to many problems can be expressed very elegantly using recursion.
However, it is also true that any recursive algorithm can be converted into an iterative one, and
traditional wisdom suggests that iterative algorithms are preferred from the standpoint of performance
and scalability.

To quantify the speed of recursion under Java 6
on Windows XP, I wrote recursive and iterative solutions to a simple tree
problem: summing the values of a specified node and all its descendants.
Although the superior scalability of the iterative solution is a foregone
conclusion, I had expected that the speed of the iterative solution would also
be faster. However, to my surprise, the speed of recursion solution proved to
be significantly faster than the iterative approach.

The source code for five different solutions
are listed below. It is important to note that the recursive solutions
presented below are not tail-recursive, which will preclude the compiler from
optimizing out the recursive method call.

Five Solutions

The first solution is recursive and also uses a Java 5 for-loop to provide code clarity.

Elegant Recursion

/**
* Compute the sum of a tree node and its descendants recursively.
* Use Java 5 syntax for more elegant code but more processing
* overhead.
*
* @param root
* @return
*/
private static int computeSumRecursive_impl2(TreeNode root) {
int result = root.getValue();
for (TreeNode n: root.getChildren())
result += computeSumRecursive_impl2(n);
return result;
}

The second solution does away with the Java 5 syntax sugar in order
to achieve significantly better performance.

Fast Recursion

/**
* Compute the sum of a tree node and its descendants recursively.
* Sacrifice code elegance to maximize efficiency.
*
* @param root
* @return
*/
private static int computeSumRecursive_impl1(TreeNode root) {
int result = root.getValue();
ArrayList children = root.getChildren();
int size = children.size();
for (int j=0; j<size; ++j) {
result += computeSumRecursive_impl1((TreeNode) children.get(j));
}
return result;
}

The third solution is iterative. The elegance of recursion
is clear: the iterative solution cannot be expressed as compactly
and is somewhat harder to follow.

Elegant Iterative

/**
* A stack-based summation routine that focuses on readability.
* @param root
* @return
*/
private static int computeSumStack_impl2(TreeNode root) {
Stack toExamine = new Stack();
toExamine.push(root);
int result=root.getValue();
while (!toExamine.isEmpty())
for (TreeNode n: toExamine.pop().getChildren()) {
result += n.getValue();
toExamine.push(n);
}
return result;
}

The fourth solution is also iterative but eliminates the
Java 5 syntax to improve performance.

Fast Iterative

/**
* A stack-based summation routine that uses loop optimization
* like recursive implementation 1.
* @param root
* @return
*/
private static int computeSumStack_impl1(TreeNode root) {
Stack toExamine = new Stack();
toExamine.push(root);
int result=root.getValue();
while (!toExamine.isEmpty()) {
TreeNode node = toExamine.pop();
ArrayList children = node.getChildren();
int size = children.size();
for (int j=0; j<size; ++j) {
TreeNode child = (TreeNode) children.get(j);
result += child.getValue();
toExamine.push(child);
}
}
return result;
}

The final iterative solution sacrifices maintainability,
readability, and scalability in order to achieve maximum
performance. Solutions like this should be avoided in
real-world situations.

Hardcore Iterative

/**
* A stack-based summation routine that sacrifices readability
* and scalability to achieve maximum performance.
* @param root
* @return
*/
private static int computeSumStack_impl3(TreeNode root) {
final TreeNode[] nodeStack = new TreeNode[100];
int start = 0;
int end = 1;
nodeStack[start] = root;
int result=root.getValue();
while (start != end) {
ArrayList children = nodeStack[start].getChildren();
if (++start == 100)
start = 0;
int size = children.size();
for (int j=0; j<size; ++j) {
TreeNode child = (TreeNode) children.get(j);
result += child.getValue();
if (--start < 0)
start = 99;
nodeStack[start] = child;
}
}
return result;
}

Performance Results and Analysis

All five solutions were tested on a balanced, 100,000 node
tree with a branching factor of five. Each solution was run
against the root of this tree 50 times. Timings were collected
using nanoTime,
and memory usage was captured using the methodology outlined
in
Java Tip 130: Do you know your data size? by Vladimir Roubtsov.
Table 1 below summarizes the results.

Table 1: Results

Solution

Time (ns)

Heap (bytes)

Elegant Recursion

8,422,312

1,108,004

Fast Recursion

3,488,725

0

Elegant Iterative

16,213,494

1,107,893

Fast Iterative

10,181,091

26,707

Hardcore Iterative

4,178,745

26,728

What can we learn from these results? Most surprising is that
recursive performance blows iterative performance out of the water. The
fast recursive solution is roughly 3 times faster than the fast iterative one.
Even the hardcore iterative solution takes roughly 120% the
time of the recursive solution. Clearly Sun has aggressively optimized
method calling and stack access, and, as a result, iterative solutions,
which in theory should run faster than recursive ones, do not, because
the recursive solutions rely on heavily optimized JVM code.

On the other hand, it is also clear that it is easy to inadvertently "give
away" performance by, for example, improving readability using Java 5 for-loop
iteration. Elegant recursion uses over 1MB of heap space and runs almost
as slowly as the fast iterative solution. The elegant iterative solution
uses 160% of the fast solution's time, and uses roughly 40 times more
heap space.

Some readers have questioned "why memory usage of the fast recursive solution
is 0?". The reason is because it uses no heap space, which is what is measured
in the example above. Rather, it relies exclusively on stack space, which
places serious limits on its scalability. Had the tree used in the test not
been well-balanced, the recursive solutions would have failed to run.

Clarifications

I have received a great deal of feedback on this article, and have realized
that some points need clarification.

My intention in including the "elegant" results was not to show that
the Java 5 for-loop syntax is inherently slow. However, many readers came away
with that impression from the article. As many of you know, the for-loop
syntax in Java 5 creates iterators behind the scenes, but, in the example
presented, we can take advantage of the excellent random access capability
of the child ArrayList to achieve better performance. The
situtation is rather similar to autoboxing: if you, as a developer, do not
understand what the compiler is doing for you, you can inadvertantly hurt
application performance.

A number of you have pointed to the fact that Stack is
backed by Vector, a synchronized collection, as the reason
for the slow performance of the iterative implementations. While
it is definitely contributory, it does not account for the entire
performance gap, since even the "hardcore" iterative version, which does not
use any collection classes at all, is slower than the fast recursive
version (at least on Windows XP. Apparently the results
under Java 5 on Ubuntu Linux were different).

Conclusion

As always, comments are welcome. If
you happen to run the program using a different platform configuration,
I would appreciate the results.