Thursday, June 28, 2012

Catch a StackOverFlowError by its tail

One of the more annoying situations you might have to deal with when working with a Java program is a StackOverFlowError, if you have a nice producible test case then there are few options with regard to playing with the stack size, or setting a conditional breakpoint / trace of some kind.

But if you have a test case that might fail once in a 100 times, perhaps a race condition in AWTMulticaster as in my case, then you want to improve the diagnostics. The problem is that by default the VM won't any elements in a stack trace after the first 1024 entries. (At least for JDK 6) So if you run the following trivial example:

package other;

public class Overflow {


   public static final void call(double a, double b, double c, double d) {
       call(a,b,c,d);
   }


   public static void main(String[] args) {
       call(0,0,0,0);
   }

}

The output will stop before you get to the cause of the problem, making it very hard to resolve the issue.

> java other.Overflow 
Exception in thread "main" java.lang.StackOverflowError
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
        [ lots of lines removed ]
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
Process exited with exit code 1.

The good news is that this limit is one of the many official and unofficial things you can tweak when starting a VM.


> java -XX:MaxJavaStackTraceDepth=1000000 other.Overflow 
Exception in thread "main" java.lang.StackOverflowError
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
        [ lots of lines removed ]
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.call(Overflow.java:7)
 at other.Overflow.main(Overflow.java:12)
Process exited with exit code 1.

Notice that end of the stack now contains the root of the problem which will be incredibly useful when trying to diagnose the issue.

You probably don't want to leave this property set long term on a production servers because I am not entirely sure about the impact of it. Looking at the C++ code it appears that this is just a maximum value and won't affect most other stack traces as they are allocated in a linked list in segments of 32 entries.