A quick and dirty way to profile code

What method do you use when you want to get performance data from specific code codes?

+12
c ++ performance profiling code-snippets
Sep 14 '08 at 11:24
source share
7 answers

This method has several limitations, but I still find it very useful. I will list the restrictions (I know) in advance and allow those who want to use it to do this at their own peril and risk.

  • In the original version, I published the correspondence time spent on recursive calls (as indicated in the comments to the answer).
  • It was not thread safe, it was not thread safe before I added code to ignore recursion, and now it is even less thread safe.
  • Despite the fact that it is very effective if it is called many times (millions), it will have a measurable effect on the result, so the measurement areas that you measure take longer than those that you don’t have.



I use this class when the problem does not justify profiling all my code, or I get some data from the profiler that I want to check. Basically, it sums up the time spent in a certain block, and at the end of the program displays it in the debug stream (visible with DbgView ), including how many times the code was executed (and the average time spent, of course)).

#pragma once #include <tchar.h> #include <windows.h> #include <sstream> #include <boost/noncopyable.hpp> namespace scope_timer { class time_collector : boost::noncopyable { __int64 total; LARGE_INTEGER start; size_t times; const TCHAR* name; double cpu_frequency() { // cache the CPU frequency, which doesn't change. static double ret = 0; // store as double so devision later on is floating point and not truncating if (ret == 0) { LARGE_INTEGER freq; QueryPerformanceFrequency(&freq); ret = static_cast<double>(freq.QuadPart); } return ret; } bool in_use; public: time_collector(const TCHAR* n) : times(0) , name(n) , total(0) , start(LARGE_INTEGER()) , in_use(false) { } ~time_collector() { std::basic_ostringstream<TCHAR> msg; msg << _T("scope_timer> ") << name << _T(" called: "); double seconds = total / cpu_frequency(); double average = seconds / times; msg << times << _T(" times total time: ") << seconds << _T(" seconds ") << _T(" (avg ") << average <<_T(")\n"); OutputDebugString(msg.str().c_str()); } void add_time(__int64 ticks) { total += ticks; ++times; in_use = false; } bool aquire() { if (in_use) return false; in_use = true; return true; } }; class one_time : boost::noncopyable { LARGE_INTEGER start; time_collector* collector; public: one_time(time_collector& tc) { if (tc.aquire()) { collector = &tc; QueryPerformanceCounter(&start); } else collector = 0; } ~one_time() { if (collector) { LARGE_INTEGER end; QueryPerformanceCounter(&end); collector->add_time(end.QuadPart - start.QuadPart); } } }; } // Usage TIME_THIS_SCOPE(XX); where XX is a C variable name (can begin with a number) #define TIME_THIS_SCOPE(name) \ static scope_timer::time_collector st_time_collector_##name(_T(#name)); \ scope_timer::one_time st_one_time_##name(st_time_collector_##name) 
+11
Sep 14 '08 at 11:24
source share

Well, I have two pieces of code. In pseudocode they look like (simplified version, I use QueryPerformanceFrequency in fact):

The first fragment:

 Timer timer = new Timer timer.Start 

Second snippet:

 timer.Stop show elapsed time 

Some Kung Fu hotkeys, and I can tell how long this piece of code stole from my processor.

+2
Sep 14 '08 at 11:26
source share

I make my profiles by creating two classes: cProfile and cProfileManager .

cProfileManager will contain all the data received as a result of cProfile .

cProfile have the following requirements:

  • cProfile has a constructor that initializes the current time.
  • cProfile has a deconstructor that sends the total class lifetime to cProfileManager

To use these profile classes, I first make an instance of cProfileManager . Then I put the block of code that I want to profile inside the curly braces. Inside the curly braces, I create an instance of cProfile . When the code block ends, cProfile will send the time required to complete the code to the cProfileManager .

Code Example Here is a code sample (simplified):

 class cProfile { cProfile() { TimeStart = GetTime(); }; ~cProfile() { ProfileManager->AddProfile (GetTime() - TimeStart); } float TimeStart; } 

To use cProfile , I would do something like this:

 int main() { printf("Start test"); { cProfile Profile; Calculate(); } ProfileManager->OutputData(); } 

or that:

 void foobar() { cProfile ProfileFoobar; foo(); { cProfile ProfileBarCheck; while (bar()) { cProfile ProfileSpam; spam(); } } } 

Technical note

This code is actually an abuse of how the scope, constructors and deconstructors work in C ++ . cProfile exists only inside the block area (the block of code we want to check). When the program leaves the block area, cProfile writes the result.

Additional improvements

  • You can add a string parameter to the constructor so you can do something like this: cProfile Profile ("Profile for complex calculations");

  • You can use a macro to make the code cleaner (be careful not to abuse it. Unlike other abuses in the language, macros can be dangerous to use).

    Example:

    #define START_PROFILE cProfile Profile (); {#define END_PROFILE}

  • cProfileManager can check how many times a code block is called. But you need a code block id. The first improvement may help identify the block. This can be useful in cases where the code you want to profile is inside the loop (for example, the second aboe example). You can also add the average, fastest, and longest code execution time.

  • Remember to add validation to skip profiling if you are in debug mode.

+2
Sep 14 '08 at 12:37
source share

Please note that everything is written specifically for Windows.

I also have a timer class that I wrote for quick and dirty profiling, which uses QueryPerformanceCounter () to get high-precision timings, but with slight differences. My timer class does not reset the elapsed time when the Timer object falls out of scope. Instead, he accumulates past tense in the collection. I added a static member function, Dump (), which creates a table of elapsed times sorted by time category (specified in the timer constructor as a string), as well as some statistical analysis such as average elapsed time, standard deviation, max and min. I also added the static Clear () function, which clears the collection and allows you to start over.

How to use the Timer class (psudocode):

 int CInsertBuffer::Read(char* pBuf) { // TIMER NOTES: Avg Execution Time = ~1 ms Timer timer("BufferRead"); : : return -1; } 

Output Example:

 Timer Precision = 418.0095 ps === Item Trials Ttl Time Avg Time Mean Time StdDev === AddTrade 500 7 ms 14 us 12 us 24 us BufferRead 511 1:19.25 0.16 s 621 ns 2.48 s BufferWrite 516 511 us 991 ns 482 ns 11 us ImportPos Loop 1002 18.62 s 19 ms 77 us 0.51 s ImportPosition 2 18.75 s 9.38 s 16.17 s 13.59 s Insert 515 4.26 s 8 ms 5 ms 27 ms recv 101 18.54 s 0.18 s 2603 ns 1.63 s 

Timer.inl file:

 #include <map> #include "x:\utils\stlext\stringext.h" #include <iterator> #include <set> #include <vector> #include <numeric> #include "x:\utils\stlext\algorithmext.h" #include <math.h> class Timer { public: Timer(const char* name) { label = std::safe_string(name); QueryPerformanceCounter(&startTime); } virtual ~Timer() { QueryPerformanceCounter(&stopTime); __int64 clocks = stopTime.QuadPart-startTime.QuadPart; double elapsed = (double)clocks/(double)TimerFreq(); TimeMap().insert(std::make_pair(label,elapsed)); }; static std::string Dump(bool ClipboardAlso=true) { static const std::string loc = "Timer::Dump"; if( TimeMap().empty() ) { return "No trials\r\n"; } std::string ret = std::formatstr("\r\n\r\nTimer Precision = %s\r\n\r\n", format_elapsed(1.0/(double)TimerFreq()).c_str()); // get a list of keys typedef std::set<std::string> keyset; keyset keys; std::transform(TimeMap().begin(), TimeMap().end(), std::inserter(keys, keys.begin()), extract_key()); size_t maxrows = 0; typedef std::vector<std::string> strings; strings lines; static const size_t tabWidth = 9; std::string head = std::formatstr("=== %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s ===", tabWidth*2, tabWidth*2, "Item", tabWidth, tabWidth, "Trials", tabWidth, tabWidth, "Ttl Time", tabWidth, tabWidth, "Avg Time", tabWidth, tabWidth, "Mean Time", tabWidth, tabWidth, "StdDev"); ret += std::formatstr("\r\n%s\r\n", head.c_str()); if( ClipboardAlso ) lines.push_back("Item\tTrials\tTtl Time\tAvg Time\tMean Time\tStdDev\r\n"); // dump the values for each key {for( keyset::iterator key = keys.begin(); keys.end() != key; ++key ) { time_type ttl = 0; ttl = std::accumulate(TimeMap().begin(), TimeMap().end(), ttl, accum_key(*key)); size_t num = std::count_if( TimeMap().begin(), TimeMap().end(), match_key(*key)); if( num > maxrows ) maxrows = num; time_type avg = ttl / num; // compute mean std::vector<time_type> sortedTimes; std::transform_if(TimeMap().begin(), TimeMap().end(), std::inserter(sortedTimes, sortedTimes.begin()), extract_val(), match_key(*key)); std::sort(sortedTimes.begin(), sortedTimes.end()); size_t mid = (size_t)floor((double)num/2.0); double mean = ( num > 1 && (num % 2) != 0 ) ? (sortedTimes[mid]+sortedTimes[mid+1])/2.0 : sortedTimes[mid]; // compute variance double sum = 0.0; if( num > 1 ) { for( std::vector<time_type>::iterator timeIt = sortedTimes.begin(); sortedTimes.end() != timeIt; ++timeIt ) sum += pow(*timeIt-mean,2.0); } // compute std dev double stddev = num > 1 ? sqrt(sum/((double)num-1.0)) : 0.0; ret += std::formatstr(" %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s\r\n", tabWidth*2, tabWidth*2, key->c_str(), tabWidth, tabWidth, std::formatstr("%d",num).c_str(), tabWidth, tabWidth, format_elapsed(ttl).c_str(), tabWidth, tabWidth, format_elapsed(avg).c_str(), tabWidth, tabWidth, format_elapsed(mean).c_str(), tabWidth, tabWidth, format_elapsed(stddev).c_str()); if( ClipboardAlso ) lines.push_back(std::formatstr("%s\t%s\t%s\t%s\t%s\t%s\r\n", key->c_str(), std::formatstr("%d",num).c_str(), format_elapsed(ttl).c_str(), format_elapsed(avg).c_str(), format_elapsed(mean).c_str(), format_elapsed(stddev).c_str())); } } ret += std::formatstr("%s\r\n", std::string(head.length(),'=').c_str()); if( ClipboardAlso ) { // dump header row of data block lines.push_back(""); { std::string s; for( keyset::iterator key = keys.begin(); key != keys.end(); ++key ) { if( key != keys.begin() ) s.append("\t"); s.append(*key); } s.append("\r\n"); lines.push_back(s); } // blow out the flat map of time values to a seperate vector of times for each key typedef std::map<std::string, std::vector<time_type> > nodematrix; nodematrix nodes; for( Times::iterator time = TimeMap().begin(); time != TimeMap().end(); ++time ) nodes[time->first].push_back(time->second); // dump each data point for( size_t row = 0; row < maxrows; ++row ) { std::string rowDump; for( keyset::iterator key = keys.begin(); key != keys.end(); ++key ) { if( key != keys.begin() ) rowDump.append("\t"); if( nodes[*key].size() > row ) rowDump.append(std::formatstr("%f", nodes[*key][row])); } rowDump.append("\r\n"); lines.push_back(rowDump); } // dump to the clipboard std::string dump; for( strings::iterator s = lines.begin(); s != lines.end(); ++s ) { dump.append(*s); } OpenClipboard(0); EmptyClipboard(); HGLOBAL hg = GlobalAlloc(GMEM_MOVEABLE, dump.length()+1); if( hg != 0 ) { char* buf = (char*)GlobalLock(hg); if( buf != 0 ) { std::copy(dump.begin(), dump.end(), buf); buf[dump.length()] = 0; GlobalUnlock(hg); SetClipboardData(CF_TEXT, hg); } } CloseClipboard(); } return ret; } static void Reset() { TimeMap().clear(); } static std::string format_elapsed(double d) { if( d < 0.00000001 ) { // show in ps with 4 digits return std::formatstr("%0.4f ps", d * 1000000000000.0); } if( d < 0.00001 ) { // show in ns return std::formatstr("%0.0f ns", d * 1000000000.0); } if( d < 0.001 ) { // show in us return std::formatstr("%0.0f us", d * 1000000.0); } if( d < 0.1 ) { // show in ms return std::formatstr("%0.0f ms", d * 1000.0); } if( d <= 60.0 ) { // show in seconds return std::formatstr("%0.2fs", d); } if( d < 3600.0 ) { // show in min:sec return std::formatstr("%01.0f:%02.2f", floor(d/60.0), fmod(d,60.0)); } // show in h:min:sec return std::formatstr("%01.0f:%02.0f:%02.2f", floor(d/3600.0), floor(fmod(d,3600.0)/60.0), fmod(d,60.0)); } private: static __int64 TimerFreq() { static __int64 freq = 0; static bool init = false; if( !init ) { LARGE_INTEGER li; QueryPerformanceFrequency(&li); freq = li.QuadPart; init = true; } return freq; } LARGE_INTEGER startTime, stopTime; std::string label; typedef std::string key_type; typedef double time_type; typedef std::multimap<key_type, time_type> Times; // static Times times; static Times& TimeMap() { static Times times_; return times_; } struct extract_key : public std::unary_function<Times::value_type, key_type> { std::string operator()(Times::value_type const & r) const { return r.first; } }; struct extract_val : public std::unary_function<Times::value_type, time_type> { time_type operator()(Times::value_type const & r) const { return r.second; } }; struct match_key : public std::unary_function<Times::value_type, bool> { match_key(key_type const & key_) : key(key_) {}; bool operator()(Times::value_type const & rhs) const { return key == rhs.first; } private: match_key& operator=(match_key&) { return * this; } const key_type key; }; struct accum_key : public std::binary_function<time_type, Times::value_type, time_type> { accum_key(key_type const & key_) : key(key_), n(0) {}; time_type operator()(time_type const & v, Times::value_type const & rhs) const { if( key == rhs.first ) { ++n; return rhs.second + v; } return v; } private: accum_key& operator=(accum_key&) { return * this; } const Times::key_type key; mutable size_t n; }; }; 

file stringext.h (provides the formatstr () function):

 namespace std { /* --- Formatted Print template<class C> int strprintf(basic_string<C>* pString, const C* pFmt, ...); template<class C> int vstrprintf(basic_string<C>* pString, const C* pFmt, va_list args); Returns : # characters printed to output Effects : Writes formatted data to a string. strprintf() works exactly the same as sprintf(); see your documentation for sprintf() for details of peration. vstrprintf() also works the same as sprintf(), but instead of accepting a variable paramater list it accepts a va_list argument. Requires : pString is a pointer to a basic_string<> --- */ template<class char_type> int vprintf_generic(char_type* buffer, size_t bufferSize, const char_type* format, va_list argptr); template<> inline int vprintf_generic<char>(char* buffer, size_t bufferSize, const char* format, va_list argptr) { # ifdef SECURE_VSPRINTF return _vsnprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr); # else return _vsnprintf(buffer, bufferSize-1, format, argptr); # endif } template<> inline int vprintf_generic<wchar_t>(wchar_t* buffer, size_t bufferSize, const wchar_t* format, va_list argptr) { # ifdef SECURE_VSPRINTF return _vsnwprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr); # else return _vsnwprintf(buffer, bufferSize-1, format, argptr); # endif } template<class Type, class Traits> inline int vstringprintf(basic_string<Type,Traits> & outStr, const Type* format, va_list args) { // prologue static const size_t ChunkSize = 1024; size_t curBufSize = 0; outStr.erase(); if( !format ) { return 0; } // keep trying to write the string to an ever-increasing buffer until // either we get the string written or we run out of memory while( bool cont = true ) { // allocate a local buffer curBufSize += ChunkSize; std::ref_ptr<Type> localBuffer = new Type[curBufSize]; if( localBuffer.get() == 0 ) { // we ran out of memory -- nice goin'! return -1; } // format output to local buffer int i = vprintf_generic(localBuffer.get(), curBufSize * sizeof(Type), format, args); if( -1 == i ) { // the buffer wasn't big enough -- try again continue; } else if( i < 0 ) { // something wierd happened -- bail return i; } // if we get to this point the string was written completely -- stop looping outStr.assign(localBuffer.get(),i); return i; } // unreachable code return -1; }; // provided for backward-compatibility template<class Type, class Traits> inline int vstrprintf(basic_string<Type,Traits> * outStr, const Type* format, va_list args) { return vstringprintf(*outStr, format, args); } template<class Char, class Traits> inline int stringprintf(std::basic_string<Char, Traits> & outString, const Char* format, ...) { va_list args; va_start(args, format); int retval = vstringprintf(outString, format, args); va_end(args); return retval; } // old function provided for backward-compatibility template<class Char, class Traits> inline int strprintf(std::basic_string<Char, Traits> * outString, const Char* format, ...) { va_list args; va_start(args, format); int retval = vstringprintf(*outString, format, args); va_end(args); return retval; } /* --- Inline Formatted Print string strprintf(const char* Format, ...); Returns : Formatted string Effects : Writes formatted data to a string. formatstr() works the same as sprintf(); see your documentation for sprintf() for details of operation. --- */ template<class Char> inline std::basic_string<Char> formatstr(const Char * format, ...) { std::string outString; va_list args; va_start(args, format); vstringprintf(outString, format, args); va_end(args); return outString; } }; 

The file algorithmext.h (provides the transform_if () function):

 /* --- Transform 25.2.3 template<class InputIterator, class OutputIterator, class UnaryOperation, class Predicate> OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation op, Predicate pred) template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate> OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, BinaryOperation binary_op, Predicate pred) Requires: T is of type EqualityComparable (20.1.1) op and binary_op have no side effects Effects : Assigns through every iterator i in the range [result, result + (last1-first1)) a new corresponding value equal to one of: 1: op( *(first1 + (i - result)) 2: binary_op( *(first1 + (i - result), *(first2 + (i - result)) Returns : result + (last1 - first1) Complexity : At most last1 - first1 applications of op or binary_op --- */ template<class InputIterator, class OutputIterator, class UnaryFunction, class Predicate> OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, UnaryFunction f, Predicate pred) { for (; first != last; ++first) { if( pred(*first) ) *result++ = f(*first); } return result; } template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate> OutputIterator transform_if(InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, OutputIterator result, BinaryOperation binary_op, Predicate pred) { for (; first1 != last1 ; ++first1, ++first2) { if( pred(*first1) ) *result++ = binary_op(*first1,*first2); } return result; } 
+2
Oct 23 '08 at 21:34
source share

The article code profiler and optimizer contains a lot of information about C ++ code profiling, and also has a free program / class download link that will show you a graphical presentation for different code paths / methods.

0
Sep 14 '08 at 11:27
source share

I have a quick and dirty profiling class that can be used for profiling even in the most dense inner loops. The emphasis is on extremely light weight and simple code. The class allocates a two-dimensional array of a fixed size. Then I add the checkpoint calls throughout. When control point N is reached immediately after control point M, I add the elapsed time (in microseconds) to the array element [M, N]. Since this is intended for profiling tight loops, I also have an “iteration start” that resets the “last breakpoint” variable. At the end of the test, calling dumpResults() creates a list of all the pairs of control points that followed each other, together with the total accounting time and unaccounted for.

0
Oct 23 '08 at 21:27
source share

For this reason, I wrote a simple cross-platform class nanotimer . The goal was to be as light as possible so as not to interfere with the actual performance of the code, adding too many instructions and thereby affecting the command cache. It is capable of obtaining precision in microseconds between windows, mac and linux (and probably some unix variants).

Main use:

 plf::timer t; timer.start(); // stuff double elapsed = t.get_elapsed_ns(); // Get nanoseconds 

start () also restarts the timer when necessary. The “pause" of the timer can be achieved by saving the elapsed time, then restarting the timer when "pausing" and adding to the saved result the next time the elapsed time is checked.

0
Jun 12 '17 at 0:29
source share



All Articles