This example shows how to profile TensorFlow programs. I used this tool to profile my program, a simple LSTM, and got the following results:
/gpu:0/stream:all Compute(pid 5)

/job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)

My questions are:
a) What is the meaning of each line?
b) In particular, what is the difference between /gpu:0/stream:all Compute(pid 5) and /job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)?
c) Why are their running times different, namely 0.072 ms and 0.094 ms?
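To compare the two rows numerically rather than by eye, here is a minimal sketch that sums event durations per process row. It assumes the profiler writes a JSON trace in the Chrome Trace Event Format, where `"ph": "M"` events named `process_name` map each `pid` to its row label and `"ph": "X"` (complete) events carry a `dur` in microseconds. The function names `load_trace` and `durations_by_process` and the `sample` trace are hypothetical illustrations, not output from the profiled LSTM.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Read a Chrome-trace JSON file (hypothetical helper; path is up to you)."""
    with open(path) as f:
        return json.load(f)


def durations_by_process(trace):
    """Sum complete-event durations (microseconds) per process row name.

    Accepts either the object form {"traceEvents": [...]} or a bare event list,
    both of which the Chrome Trace Event Format allows.
    """
    events = trace.get("traceEvents", []) if isinstance(trace, dict) else trace

    # Metadata events map each pid to the label shown on the timeline row.
    names = {}
    for ev in events:
        if ev.get("ph") == "M" and ev.get("name") == "process_name":
            names[ev["pid"]] = ev["args"]["name"]

    # Complete ("X") events have a duration; add them up per row label.
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X":
            label = names.get(ev["pid"], "pid %s" % ev["pid"])
            totals[label] += ev.get("dur", 0)
    return dict(totals)


# Hypothetical trace for illustration only; real numbers come from your own run.
sample = {
    "traceEvents": [
        {"ph": "M", "name": "process_name", "pid": 3,
         "args": {"name": "/job:localhost/replica:0/task:0/gpu:0 Compute"}},
        {"ph": "M", "name": "process_name", "pid": 5,
         "args": {"name": "/gpu:0/stream:all Compute"}},
        {"ph": "X", "name": "MatMul", "pid": 3, "tid": 0, "ts": 100, "dur": 50},
        {"ph": "X", "name": "MatMul", "pid": 5, "tid": 0, "ts": 110, "dur": 30},
    ]
}

if __name__ == "__main__":
    for label, us in durations_by_process(sample).items():
        print("%s: %.3f ms" % (label, us / 1000.0))
```

For a real run, pass `load_trace("timeline.json")` (or whatever file the tool wrote) instead of `sample`. Summing `dur` only gives busy time per row; it does not account for overlapping events, so it is a rough comparison rather than a wall-clock measurement.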