I think Eli answered your question, so I'm going to paste his comment here and take the credit for it :).
Eli Barzilai writes:
(a) heap allocation takes longer because it requires searching for a free block (the heap is not linear, like a stack); (b) practically all CPU architectures are designed to make access to the stack as fast as possible, rather than access to the heap.
To this, I would add a general hand-wave toward cache locality. That is, the stack keeps all of its activity in a very small region of memory, which will almost certainly stay in the cache.
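To make point (a) a bit more concrete, here is a rough sketch in C. It is a toy, not how any real `malloc` works (real allocators use size classes, bins, and other tricks), and every name in it (`BumpArena`, `bump_alloc`, `FreeBlock`, `first_fit`) is made up for illustration. The stack-like allocator only bumps an offset, while the heap-like one has to walk a free list until it finds a block that is large enough.

```c
#include <stddef.h>

/* Stack-like "bump" arena (hypothetical): allocating is one bounds
   check plus one offset update, regardless of what was allocated before. */
typedef struct {
    unsigned char *base;   /* start of the backing buffer */
    size_t capacity;       /* total bytes available */
    size_t top;            /* bytes handed out so far */
} BumpArena;

static void *bump_alloc(BumpArena *a, size_t n) {
    if (n > a->capacity - a->top) {
        return NULL;       /* not enough room left */
    }
    void *p = a->base + a->top;
    a->top += n;           /* the whole "allocation" is this one addition */
    return p;
}

/* Heap-like free list (hypothetical, first-fit): the allocator may have
   to visit many blocks before it finds one of sufficient size. */
typedef struct FreeBlock {
    size_t size;
    struct FreeBlock *next;
} FreeBlock;

/* Returns the first block with size >= n, or NULL if none fits.
   *steps receives the number of blocks inspected. */
static FreeBlock *first_fit(FreeBlock *head, size_t n, size_t *steps) {
    *steps = 0;
    for (FreeBlock *b = head; b != NULL; b = b->next) {
        ++*steps;
        if (b->size >= n) {
            return b;
        }
    }
    return NULL;
}
```

Note that the bump arena also hands out consecutive addresses, which is the same property the cache-locality argument relies on: successive allocations sit next to each other in memory.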