In the past, I have used profiling tools such as NProf, the EQATEC Profiler, and the YourKit profiler to identify and remove or reduce performance bottlenecks in code that mostly runs on a single thread (serialized execution). I am now writing a lot of multithreaded code that can be slowed down by threads blocking on locks. What tools and techniques can be used to determine where lock contention occurs and by how much it slows the code down?