I don't know of any good tools, but as a last resort you could add some instrumentation code to your application, as shown below:
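    __thread void* stack_start;          /* address near the top of this thread's stack */
    __thread long stack_max_size = 0L;   /* deepest stack usage seen so far, in bytes */

    void check_stack_size() {
        char nowhere;                    /* local whose address marks the current depth */
        void* stack_end = (void*)&nowhere;
        /* Assumes the stack grows toward lower addresses. */
        long stack_size = (long)stack_start - (long)stack_end;
        if (stack_size > stack_max_size)
            stack_max_size = stack_size;
    }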
The check_stack_size() function should be called from some of the most deeply nested functions.
Then, as the last statement in the thread, you can output stack_max_size somewhere.
stack_start must be initialized at the beginning of the thread's entry function:
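    void thread_proc() {
        char nowhere;
        stack_start = (void*)&nowhere;   /* record where this thread's stack begins */
        /* ... the rest of the thread's work goes here ... */
    }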
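For illustration, here is a minimal sketch that puts the pieces together. It assumes GCC or Clang (for `__thread`) and a stack that grows toward lower addresses. The `recurse` workload, the `struct probe` argument, and the 256-byte per-frame buffer are hypothetical and exist only to give the thread something to measure; `thread_proc` is given the `void *(*)(void *)` signature so that it could be passed to `pthread_create`.

```c
#include <stdint.h>
#include <stdio.h>

__thread char *stack_start;          /* set once at thread entry */
__thread long stack_max_size = 0L;   /* deepest usage seen, in bytes */

/* Record the current depth if it is the deepest seen so far.
   Assumes the stack grows toward lower addresses. */
void check_stack_size(void) {
    char nowhere;
    long stack_size = (long)((intptr_t)stack_start - (intptr_t)&nowhere);
    if (stack_size > stack_max_size)
        stack_max_size = stack_size;
}

/* Hypothetical workload: each frame holds a small buffer. */
int recurse(int depth) {
    volatile char pad[256];
    pad[0] = (char)depth;
    if (depth <= 0) {
        check_stack_size();          /* measure at the deepest point */
        return pad[0];
    }
    int below = recurse(depth - 1);
    return below + pad[0];           /* read after the call keeps the frame alive */
}

/* Hypothetical argument block: input depth, measured result. */
struct probe {
    int depth;
    long max_size;
};

/* Thread entry point, usable with pthread_create. */
void *thread_proc(void *arg) {
    char nowhere;
    struct probe *p = arg;

    stack_start = &nowhere;          /* first statement: mark the stack start */
    stack_max_size = 0L;

    recurse(p->depth);

    /* Last statement in the thread: report the maximum. */
    p->max_size = stack_max_size;
    fprintf(stderr, "max stack used: %ld bytes\n", stack_max_size);
    return NULL;
}
```

To try it, fill in a `struct probe` with a recursion depth and either call `thread_proc(&p)` directly or pass it to `pthread_create`. The reported figure only covers the points where check_stack_size() is actually called, so it is a lower bound on the real peak usage.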