I am trying to run SqueezeDet using the TensorFlow C++ API (CPU only). I froze the TensorFlow graph and loaded it from C++. Detection quality is fine, but performance is much slower than in Python. What could be the reason for this?
Simplified, my code is as follows:
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"

using namespace std;

int main(int argc, const char* argv[])
{
    tensorflow::GraphDef graph_def;
    string graph_file_name = "Model/graph.pb";
    tensorflow::Status graph_loaded_status =
        ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
    if (!graph_loaded_status.ok())
    {
        cout << graph_loaded_status.ToString() << endl;
        return 1;
    }

    unique_ptr<tensorflow::Session> session_sqdet(
        tensorflow::NewSession(tensorflow::SessionOptions()));
    tensorflow::Status session_create_status = session_sqdet->Create(graph_def);
    if (!session_create_status.ok())
    {
        cout << "Session create status: fail." << endl;
        return 1;
    }

    // Detection loop: input_tensor and prob_tensor are prepared per frame
    // (preprocessing omitted here for brevity).
    while (true)
    {
        vector<tensorflow::Tensor> final_output;
        session_sqdet->Run({{"image_input", input_tensor}, {"keep_prob", prob_tensor}},
                           {"probability/score", "bbox/trimming/bbox"},
                           {}, &final_output);
    }
}
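For completeness, here is roughly how the tensors are prepared inside the loop and how the Run status could be checked. This is a minimal sketch: the input shape {1, 384, 1248, 3} and the zero fill are placeholders standing in for my actual preprocessing.

// Illustrative per-frame tensor preparation; real image decoding/resizing omitted.
tensorflow::Tensor input_tensor(tensorflow::DT_FLOAT,
                                tensorflow::TensorShape({1, 384, 1248, 3}));
auto pixels = input_tensor.flat<float>();
for (int i = 0; i < pixels.size(); ++i)
    pixels(i) = 0.0f;  // would be the preprocessed pixel values

// keep_prob is a scalar; 1.0 disables dropout at inference time.
tensorflow::Tensor prob_tensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({}));
prob_tensor.scalar<float>()() = 1.0f;

vector<tensorflow::Tensor> final_output;
tensorflow::Status run_status = session_sqdet->Run(
    {{"image_input", input_tensor}, {"keep_prob", prob_tensor}},
    {"probability/score", "bbox/trimming/bbox"}, {}, &final_output);
if (!run_status.ok())
    cout << run_status.ToString() << endl;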
What I tried:
1) Building with compiler optimization flags: everything relevant is enabled and the build produces no warnings.
2) Using batch processing: performance improved, but the gap between Python and C++ is still significant (a session run takes 1 s in Python versus 2.4 s in C++ with batch_size = 20). The batched input is built as sketched below.
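For point 2, the batched input is built roughly as follows; batch_size = 20 matches the timing above, while the spatial dimensions are again placeholders:

// Stack batch_size preprocessed frames into a single 4-D input tensor,
// so one Run call returns scores/boxes for all frames at once.
const int batch_size = 20;
const int height = 384, width = 1248, channels = 3;  // placeholder dimensions
tensorflow::Tensor batch_tensor(
    tensorflow::DT_FLOAT,
    tensorflow::TensorShape({batch_size, height, width, channels}));
auto batch = batch_tensor.tensor<float, 4>();
for (int b = 0; b < batch_size; ++b)
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            for (int c = 0; c < channels; ++c)
                batch(b, y, x, c) = 0.0f;  // would copy frame b's pixel (y, x, c) here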
Any help would be greatly appreciated.