TensorFlow Object Detection API has slow inference time with TensorFlow Serving

I cannot match the inference times reported by Google for the models released in the model zoo. In particular, I am testing their faster_rcnn_resnet101_coco model, for which the reported inference time is 106 ms on a Titan X GPU.

My serving system uses TF 1.4, running in a container built from Google's Dockerfile. My client is modeled after the inception_client; creating the tensor proto takes ~150 ms and the Predict call takes ~180 ms. My saved_model.pb comes directly from the tar file downloaded from the model zoo. Is there something I'm missing? What steps can I take to reduce the inference time?
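For what it's worth, one way to narrow down where those ~150 ms / ~180 ms are going is to time the request construction and the network call separately. A minimal sketch of such a timing harness (pure Python; `build_request` here is a hypothetical stand-in for the real `tf.make_tensor_proto` / `stub.Predict` calls in a TF Serving client):

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0

# Stand-in for the expensive client-side step (building the request
# proto); in a real client you would wrap the tensor-proto construction
# and the Predict RPC separately to see which side is actually slow.
def build_request(image_bytes):
    return {"inputs": image_bytes}

request, build_ms = timed(build_request, b"\x00" * 1024)
```

Timing the two halves independently tells you whether to optimize the client (proto construction, image decoding) or the server (model graph, hardware).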

+5
4 answers

I was able to solve the two problems by:

  • Optimizing the compiler flags. I added `--config=opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma` to the bazel build.

  • Not importing tf.contrib on every inference. In the inception_client example provided by Google, these lines re-import tf.contrib on every forward pass.
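The second fix above amounts to hoisting any heavyweight import out of the per-request path. A minimal sketch of the pattern (using `json` as a harmless stand-in for `tensorflow.contrib`; the handler names are hypothetical):

```python
def slow_handler(payload):
    # Anti-pattern: importing inside the hot path. The import machinery
    # runs on every call, and tf.contrib in particular does extra op
    # registration work when it is loaded.
    import json
    return json.dumps(payload)

import json  # the fix: import once at module scope

def fast_handler(payload):
    # Reuses the module-level import; no per-request import cost.
    return json.dumps(payload)
```

Both handlers return the same result; only the fast one keeps the import cost out of the request loop.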

+3

Non-max suppression can be a bottleneck: https://github.com/tensorflow/models/issues/2710.

Is the image size 600x600?

+2

I ran a similar model with a Titan Xp; however, I used the infer_detections.py script and logged the forward-pass time [basically by taking a datetime before and after `tf_example = detection_inference.infer_detections_and_add_to_example(serialized_example_tensor, detected_boxes_tensor, detected_scores_tensor, detected_labels_tensor, FLAGS.discard_image_pixels)`]. I also reduced the number of proposals generated in the first stage of Faster R-CNN from 300 to 100, and reduced the number of detections in the second stage to 100 as well. I got numbers in the range of 80 to 140 ms, and I think a 600x600 image would take roughly ~106 ms or a bit less with this setup (due to the Titan Xp and the reduced model complexity). Perhaps you can repeat the above process on your hardware; if the numbers in that case are also ~106 ms, we can attribute the difference to the Dockerfile and the client. If the numbers are still high, it might be the hardware.
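For anyone wanting to reproduce the reduction described above: the proposal and detection counts live in the model's `pipeline.config`. A sketch of the relevant Faster R-CNN fields, assuming a standard config shipped with the Object Detection API (all other fields omitted):

```
model {
  faster_rcnn {
    # First stage: cut region proposals from the default 300 to 100.
    first_stage_max_proposals: 100
    second_stage_post_processing {
      batch_non_max_suppression {
        # Second stage: cap the number of detections at 100 as well.
        max_detections_per_class: 100
        max_total_detections: 100
      }
    }
  }
}
```

Fewer proposals means less work in the (often CPU-bound) non-max suppression step, at some cost in recall.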

It would be helpful if someone from the TensorFlow Object Detection team could comment on the setup used to generate the numbers in the model zoo.

0

@Vikram Gupta Have you checked your GPU utilization? Does it reach somewhere around 80-100%? I experience very low GPU utilization when detecting objects in a video stream with the API and a model zoo model.

0

Source: https://habr.com/ru/post/1274207/

