Well, the method I would use is not easy, but it works. You may have already tried this, but bear with me.
I collect a time log recording when each message is sent, when it is received, and how long each one takes to process. If several processes or threads are involved, each one writes its own log, and I then merge them into a common timeline.
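The merge step can be sketched as follows. This is a minimal sketch, not the tooling described above: the `LogEvent` record, its fields, and `merge_logs` are hypothetical names, and it assumes every process stamps events with a shared, synchronized clock and writes its log in time order.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One hypothetical log record: when something happened, where, and to which message."""
    timestamp: float  # seconds; assumes all processes share a synchronized clock
    process: str      # name of the process or thread that wrote the record
    kind: str         # e.g. "send", "recv", or "done" (handling finished)
    msg_id: int       # identifies the same message across processes


def merge_logs(*logs):
    """Merge per-process logs (each already sorted by time) into one common timeline."""
    # heapq.merge lazily interleaves already-sorted inputs without re-sorting everything.
    return list(heapq.merge(*logs, key=lambda e: e.timestamp))


if __name__ == "__main__":
    client = [LogEvent(0.00, "client", "send", 1), LogEvent(0.50, "client", "send", 1)]
    server = [LogEvent(0.52, "server", "recv", 1), LogEvent(0.60, "server", "done", 1)]
    for e in merge_logs(client, server):
        print(f"{e.timestamp:6.2f} {e.process:>7} {e.kind:>4} msg={e.msg_id}")
```

If the clocks are not synchronized, the per-process timestamps would need to be corrected for offset before merging, or the merged order will be misleading.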
Then I graph it. (A tool would be nice, but I did it by hand.) I'm looking for things like 1) messages resent because of timeouts, and 2) delays between when a message was sent and when it was actually received.
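The two things to look for can also be pulled out of a merged timeline programmatically. This is a minimal sketch under stated assumptions, not the author's by-hand method: the `LogEvent` record, `find_resends`, and `send_to_recv_delays` are hypothetical, and it assumes a retransmission shows up as a second `"send"` event with the same `msg_id`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical log record (same shape as a merged-timeline entry)."""
    timestamp: float  # seconds on a shared clock (assumption)
    process: str      # process or thread that logged the event
    kind: str         # "send" or "recv"
    msg_id: int       # message identifier; a resend reuses the same id (assumption)


def find_resends(timeline):
    """Return {msg_id: send_count} for messages sent more than once, e.g. after a timeout."""
    sends = defaultdict(int)
    for e in timeline:
        if e.kind == "send":
            sends[e.msg_id] += 1
    return {m: n for m, n in sends.items() if n > 1}


def send_to_recv_delays(timeline):
    """Return {msg_id: seconds} from each message's first send to its first receive."""
    first_send, delays = {}, {}
    for e in sorted(timeline, key=lambda ev: ev.timestamp):
        if e.kind == "send":
            first_send.setdefault(e.msg_id, e.timestamp)
        elif e.kind == "recv" and e.msg_id in first_send and e.msg_id not in delays:
            delays[e.msg_id] = e.timestamp - first_send[e.msg_id]
    return delays
```

Sorting the results by delay, or plotting delay against send time, gives roughly the picture one would otherwise draw by hand: the outliers are where to start looking.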
This usually turns up problems I can fix in code I control. Fixing them improves things, but then I do the whole exercise again, because chances are very good that I missed something the last time.
The upshot was that the asynchronous messaging system could be fast enough once the avoidable sources of delay were eliminated.
When people post performance questions, there is a tendency to look for specific fixes that will improve things. But the real magic fix is to refine your diagnostic technique so that it tells you what to fix, because your problems will be different from anyone else's.