Surprising benchmark result

After watching Titus Winters' talk "Live at Head", where he mentions that StrCat() is one of users' favorite features, I decided to try implementing something similar to see whether I could beat std::string::append (or operator+, which, as I understand it, uses append internally) in terms of runtime performance. My reasoning was that a strcat() function implemented as a variadic template would be able to determine the combined size of all its string-like arguments and make a single allocation to store the final result, instead of reallocating repeatedly as operator+ does, since operator+ has no knowledge of the overall context in which it is called.
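
The single-allocation idea can be checked in isolation. The following is a minimal sketch, not part of the benchmark; `joined_without_regrowth` is a hypothetical helper name. It reserves the combined size once and then verifies that the capacity of the result does not change while the pieces are appended.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: joins three strings after a single up-front reserve().
// Returns true if the capacity stayed the same during all appends,
// i.e. no further reallocation was needed.
bool joined_without_regrowth(const std::string& a, const std::string& b,
                             const std::string& c, std::string& out) {
  const std::size_t total = a.size() + b.size() + c.size();
  out.clear();
  out.reserve(total);              // at most one allocation happens here
  assert(out.capacity() >= total); // postcondition of reserve()
  const std::size_t cap = out.capacity();
  out += a;
  out += b;
  out += c;
  return out.capacity() == cap;
}
```

Chained operator+ cannot do this, because each intermediate `+` only sees its own two operands.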

However, when I compared my custom implementation with operator+ on quick-bench, I found that my strcat() implementation was about 4 times slower than operator+ on recent versions of both clang and gcc, compiled with -std=c++17 -O3. I have included the code below for reference.

Does anyone know what could be causing the slowdown here?

#include <cstring>
#include <iostream>
#include <string>

// Get the size of string-like args
int getsize(const std::string& s) { return s.size(); }
int getsize(const char* s) { return strlen(s); }
template <typename S>
int strcat_size(const S& s) {
  return getsize(s);
}
template <typename S, typename... Strings>
int strcat_size(const S& first, Strings... rest) {
  if (sizeof...(Strings) == 0) {
    return 0;
  } else {
    return getsize(first) + strcat_size(rest...);
  }
}

// Populate a pre-allocated string with content from another string-like object
template <typename S>
void strcat_fill(std::string& res, const S& first) {
  res += first;
}
template <typename S, typename... Strings>
void strcat_fill(std::string& res, const S& first, Strings... rest) {
  res += first;
  strcat_fill(res, rest...);
}

template <typename S, typename... Strings>
std::string strcat(const S& first, Strings... rest) {
  int totalsize = strcat_size(first, rest...);

  std::string res;
  res.reserve(totalsize);

  strcat_fill(res, first, rest...);

  return res;
}

const char* s1 = "Hello World! ";
std::string s2 = "Here is a string to concatenate. ";
std::string s3 = "Here is a longer string to concatenate that avoids small string optimization";
const char* s4 = "How about some more strings? ";
std::string s5 = "And more strings? ";
std::string s6 = "And even more strings to use!";

static void strcat_bench(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    std::string s = strcat(s1, s2, s3, s4, s5, s6);

    benchmark::DoNotOptimize(s);
  }
}
BENCHMARK(strcat_bench);

static void append_bench(benchmark::State& state) {
  for (auto _ : state) {
    std::string s = s1 + s2 + s3 + s4 + s5 + s6;

    benchmark::DoNotOptimize(s);
  }
}
BENCHMARK(append_bench);
1 answer

This is because you are passing the arguments by value.

I changed the code to use fold expressions instead (which looks much cleaner)
and got rid of the unnecessary copies (Strings... rest should have been a reference).

int getsize(const std::string& s) { return s.size(); }
int getsize(const char* s) { return strlen(s); }

template <typename ...P>
std::string strcat(const P &... params)
{
  std::string res;
  res.reserve((getsize(params) + ...));
  (res += ... += params);
  return res;
}

This version beats append by approximately 30%.
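
To make the cost of the by-value parameter pack visible, here is a small sketch using a hypothetical instrumented type `Tracked` that counts its copy constructions. The helper names `total_by_value` and `total_by_ref` are also made up for illustration; they mirror the recursive shape of the question's strcat_size, not its exact code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical instrumented type: counts how often it is copy-constructed.
struct Tracked {
  std::string text;
  static int copies;
  explicit Tracked(std::string t) : text(std::move(t)) {}
  Tracked(const Tracked& other) : text(other.text) { ++copies; }
};
int Tracked::copies = 0;

std::size_t getsize(const Tracked& t) { return t.text.size(); }

// By-value pack (as in the question): every recursion step copies
// all of the remaining arguments again.
std::size_t total_by_value() { return 0; }
template <typename S, typename... Rest>
std::size_t total_by_value(const S& first, Rest... rest) {
  return getsize(first) + total_by_value(rest...);
}

// By-reference pack: the arguments are never copied.
std::size_t total_by_ref() { return 0; }
template <typename S, typename... Rest>
std::size_t total_by_ref(const S& first, const Rest&... rest) {
  return getsize(first) + total_by_ref(rest...);
}

// Small self-check with three arguments.
void demo_copy_counts() {
  Tracked a("ab"), b("cde"), c("f");
  Tracked::copies = 0;
  total_by_ref(a, b, c);
  assert(Tracked::copies == 0);
  total_by_value(a, b, c);
  assert(Tracked::copies == 3);  // 2 copies in the first call, 1 in the second
}
```

With this recursive shape, n arguments cost n(n-1)/2 copies per pass, and for std::string arguments too long for the small string optimization each copy can mean a heap allocation.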


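Also, taking the parameters by const reference is sufficient here: std::string's operator += has no overload that takes advantage of rvalues.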


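If fold expressions are not available (they require C++17), you can use the "dummy array" trick instead (which should have the same semantics).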

template <typename ...P>
std::string strcat(const P &... params)
{
  using dummy_array = int[]; // This is necessary because `int[]{blah}` doesn't compile.
  std::string res;
  std::size_t size = 0;
  dummy_array{(void(size += getsize(params)), 0)..., 0};
  res.reserve(size);
  dummy_array{(void(res += params), 0)..., 0};
  return res;
}
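
For experimenting outside quick-bench, the following is a self-contained sketch of the same dummy-array approach with a small usage check. The namespace `demo`, the name `str_cat`, and the function `check_str_cat` are assumptions made for this example; the namespace avoids any clash with the C function strcat from <cstring>, and getsize returns std::size_t here rather than int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

namespace demo {  // hypothetical namespace, avoids clashing with ::strcat

inline std::size_t getsize(const std::string& s) { return s.size(); }
inline std::size_t getsize(const char* s) { return std::strlen(s); }

template <typename... P>
std::string str_cat(const P&... params) {
  using dummy_array = int[];
  std::size_t size = 0;
  // Each array element runs one side effect, left to right; the trailing 0
  // keeps the array non-empty when the pack is empty.
  (void)dummy_array{(void(size += getsize(params)), 0)..., 0};
  std::string res;
  res.reserve(size);
  (void)dummy_array{(void(res += params), 0)..., 0};
  return res;
}

}  // namespace demo

// Checks that the result matches plain operator+ concatenation.
void check_str_cat() {
  const char* a = "Hello ";
  std::string b = "World";
  assert(demo::str_cat(a, b, "!") == a + b + "!");
  assert(demo::str_cat().empty());
}
```

Mixing const char*, string literals, and std::string works because getsize and operator += each have an overload for const char*.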

Source: https://habr.com/ru/post/1692318/

