I had this problem a while ago. For richly related classes, even if you can complete serialization without, serialization is slow. When we solved this problem, we had several classes, so we just created our own serialization format, which packed the data into a set of integer identifiers, with integer field identifiers for each field and described their connections using a series of object identifiers, field identifier, others Mapping object identifiers. This user approach was very fast and extremely easy to remember, but actually only works if you have a small set of classes that you want to serialize.
The general case is much more complicated, and the demand for serializing rich classes is not so strong, so I would have guessed why no one solved it.
Mostly you are up to the problem, but you always need the stack depth equal to the maximum height of the depth tree of the first search, and therefore at any time when your graph is deeper, you will get a stack overflow. This is a fundamentally recursive problem, so you have to either use recursion or fake recursion by moving the stack distributions to the Stack object that you are putting in the heap. I would look at the OpenJDK implementation:
http://hg.openjdk.java.net/jdk6/jdk6-gate/jdk/file/tip/src/share/classes/java/io/ObjectOutputStream.java
You already have DebugTraceInfoStack, I would create a second Stack field for the current object you are writing, and change the writeObject0 method to push the object onto the stack, something like this:
stack.push(obj); while(!stack.empty()) { obj = stack.pop(); ...
Then you just change all the calls to writeObject0 (x); to stack.push (x) ;. The simple standard conversion between recursion and iteration, apart from the class, is nearly 2500 lines, and there are probably tons of gotchas.
If you still build it, I would defend the possibility of sending it as a patch to the next version of java, since it would be useful, for example, IterativeObjectOutputStream for use on plots of deep objects.