Changing a vector to an array makes my program slower

I profiled my program and found that the hottest spot was levenshtein_distance, called recursively. I decided to try and optimize it.

    lvh_distance levenshtein_distance( const std::string & s1, const std::string & s2 )
    {
        const size_t len1 = s1.size(), len2 = s2.size();
        std::vector<unsigned int> col( len2+1 ), prevCol( len2+1 );
        const size_t prevColSize = prevCol.size();

        for( unsigned int i = 0; i < prevColSize; i++ )
            prevCol[i] = i;

        for( unsigned int i = 0, j; i < len1; ++i )
        {
            col[0] = i+1;
            const char s1i = s1[i];
            for( j = 0; j < len2; ++j )
            {
                const auto minPrev = 1 + std::min( col[j], prevCol[1 + j] );
                col[j+1] = std::min( minPrev, prevCol[j] + ( static_cast<unsigned int>( s1i != s2[j] ) ) );
            }
            col.swap( prevCol );
        }
        return prevCol[len2];
    }
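For reference, a quick sanity check that can be compiled together with the function above; it assumes lvh_distance is simply an alias for an unsigned integer type (the alias itself is not shown in the question):

    #include <iostream>
    #include <string>

    int main()
    {
        // Classic example: "kitten" -> "sitting" needs 3 edits
        // (substitute k->s, substitute e->i, insert g).
        std::cout << levenshtein_distance( "kitten", "sitting" ) << '\n'; // expected: 3
        std::cout << levenshtein_distance( "", "abc" ) << '\n';           // expected: 3
    }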

TL;DR: I changed std::vector → std::array

War story: after running VTune on it, I found that the line updating col[j+1] was the one slowing everything down (90% of the time was spent on it). I thought: “Well, maybe this is an aliasing problem; maybe the compiler cannot determine that the character arrays inside the string objects are not aliased, because they are hidden behind the string interface, and it spends 90% of its time checking that no other part of the program has changed them.”

So I changed my vectors to static arrays, since with those there is no dynamic memory involved, and the next step would have been to use restrict to help the compiler. But in the meantime, I decided to check whether I had gained any performance by doing this.

    lvh_distance levenshtein_distance( const std::string & s1, const std::string & s2 )
    {
        const size_t len1 = s1.size(), len2 = s2.size();
        static constexpr unsigned MAX_STRING_SIZE = 512;
        assert( len1 < MAX_STRING_SIZE && len2 < MAX_STRING_SIZE );
        static std::array<unsigned int, MAX_STRING_SIZE> col, prevCol;

        for( unsigned int i = 0; i < len2+1; ++i )
            prevCol[i] = i;

        // the rest is unchanged
    }

TL;DR: now it is slow.

It turns out I lost performance. A lot of it. Instead of running in ~6 seconds, my sample program now takes 44 seconds. Profiling again with VTune shows that the function being called over and over is std::swap (which for gcc lives in bits/move.h), which in turn calls std::swap_ranges (bits/stl_algobase.h).

I assume that std::min is implemented using quicksort, which would explain why swapping occurs, but I don't understand why the swap takes so long in this case.

EDIT: compiler options: I am using gcc with "-O2 -g -DNDEBUG" plus a set of warning flags.

2 answers

As an experiment, I ran a version of your original code, largely unmodified, on a couple of short strings and got timings of ~36 s for the array version and ~8 s for the vector version.

Your version seems to depend heavily on the choice of MAX_STRING_SIZE. When I used 50 instead of 512 (which was just enough for my strings), the time for the array version dropped to about 16 s.

Then I applied the manual transformation of the main loop shown below to get rid of the explicit swap. This reduced the array version's time further, to 11 seconds, and, more interestingly, made the array version's time independent of the choice of MAX_STRING_SIZE: with it back at 512, the array version still takes 11 seconds.

This is good evidence that the explicit swap of the arrays is the major performance problem in your version.

There is still a significant difference between the array and vector versions, with the array version taking 40% longer. I haven't had a chance to work out exactly why that is.

    for( unsigned int i = 0, j; i < len1; ++i )
    {
        {
            col[0] = i+1;
            const char s1i = s1[i];
            for( j = 0; j < len2; ++j )
            {
                const auto minPrev = 1 + std::min( col[j], prevCol[1 + j] );
                col[j+1] = std::min( minPrev, prevCol[j] + ( static_cast<unsigned int>( s1i != s2[j] ) ) );
            }
        }
        if (!(++i < len1))
            return col[len2];
        {
            prevCol[0] = i+1;
            const char s1i = s1[i];
            for( j = 0; j < len2; ++j )
            {
                const auto minPrev = 1 + std::min( prevCol[j], col[1 + j] );
                prevCol[j+1] = std::min( minPrev, col[j] + ( static_cast<unsigned int>( s1i != s2[j] ) ) );
            }
        }
    }
    return prevCol[len2];

Firstly: @DanielFischer has in all likelihood already pointed out what caused your performance degradation: swapping std::arrays is a linear-time operation, whereas swapping std::vectors is a constant-time operation. Although some compilers might be able to optimize this away, your gcc apparently does not.
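To make the difference concrete, here is a simplified sketch of what the two swaps boil down to (illustrative only, not the actual libstdc++ code):

    #include <algorithm>
    #include <array>
    #include <cstddef>
    #include <vector>

    // Roughly what swapping two std::arrays has to do: every element is
    // exchanged, so the cost grows with the array size (O(N)).
    template <typename T, std::size_t N>
    void array_like_swap( std::array<T, N> & a, std::array<T, N> & b )
    {
        std::swap_ranges( a.begin(), a.end(), b.begin() );
    }

    // Roughly what swapping two std::vectors does: only the internal
    // bookkeeping (data pointer, size, capacity) changes hands, so the
    // cost is O(1) no matter how many elements the vectors hold.
    template <typename T>
    void vector_like_swap( std::vector<T> & a, std::vector<T> & b )
    {
        a.swap( b );
    }

Since every col.swap(prevCol) in the array version exchanges all MAX_STRING_SIZE elements, this also fits the observation above that the array version's running time tracked MAX_STRING_SIZE until the swap was removed.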

Also important: using a static array the way you did here makes your code essentially non-thread-safe (and non-reentrant). That is usually not a good idea.
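To see why, here is a hedged toy example; toy_distance and its buffer are hypothetical stand-ins for the static col/prevCol pattern in the question, not the real function:

    #include <array>
    #include <future>
    #include <string>

    // Toy stand-in for the pattern in the question: the working buffer is
    // `static`, so all calls from all threads share the same storage.
    unsigned int toy_distance( const std::string & a, const std::string & b )
    {
        static std::array<unsigned int, 512> buf;   // shared by every call
        buf[0] = static_cast<unsigned int>( a.size() );
        buf[1] = static_cast<unsigned int>( b.size() );
        return buf[0] + buf[1];   // another thread may have overwritten buf by now
    }

    int main()
    {
        // Two concurrent calls race on the same static buffer: a data race,
        // and the results may be silently wrong.
        auto r1 = std::async( std::launch::async, toy_distance,
                              std::string( "kitten" ), std::string( "sitting" ) );
        auto r2 = std::async( std::launch::async, toy_distance,
                              std::string( "flaw" ), std::string( "lawn" ) );
        return static_cast<int>( r1.get() + r2.get() );
    }

A vector (or any other per-call) buffer does not have this problem, because each invocation owns its own storage.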

Getting rid of one of the arrays (or vectors), and of the swap that goes with it, and using a single dynamically allocated C array instead is actually quite simple and gives excellent performance (at least on my setup). A few more transformations (for example, using size_t consistently) lead to the following function:

    unsigned int levenshtein_distance3( const std::string & s1, const std::string & s2 )
    {
        const size_t len1 = s1.size(), len2 = s2.size();
        ::std::unique_ptr<size_t[]> col( new size_t[len2 + 1] );

        for( size_t i = 0; i < len2+1; ++i )
            col[i] = i;

        for( size_t i = 0; i < len1; ++i )
        {
            size_t lastc = col[0];
            col[0] = i+1;
            const char s1i = s1[i];
            for( size_t j = 0; j < len2; ++j )
            {
                const auto minPrev = 1 + (::std::min)( col[j], col[j + 1] );
                const auto newc = (::std::min)( minPrev, lastc + ( s1i != s2[j] ? 1 : 0 ) );
                lastc = col[j+1];
                col[j + 1] = newc;
            }
        }
        return col[len2];
    }
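A minimal sketch of how one might drive this version; the includes and the main function are additions for illustration, while the function itself is exactly the one above:

    #include <algorithm>   // std::min
    #include <iostream>
    #include <memory>      // std::unique_ptr
    #include <string>

    // ... levenshtein_distance3 as defined above ...

    int main()
    {
        // "kitten" -> "sitting" needs 3 edits, a standard sanity check.
        std::cout << levenshtein_distance3( "kitten", "sitting" ) << '\n'; // expected: 3
        std::cout << levenshtein_distance3( "flaw", "lawn" ) << '\n';      // expected: 2
    }

Keeping a single column plus the lastc scalar is what removes the need for any swap at all.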
