The most optimized string concatenation method

Question

The most optimized string concatenation method

We have always faced many situations on a daily basis in which we need to perform tedious and a lot of chain operations in our code. We all know that string manipulations are expensive operations. I would like to know which one is the cheapest among the available versions.

The most common operations are concatenation (this is something we can control to some extent). What is the best way to combine std :: strings in C ++ and various workarounds to speed up concatenation?

I mean,

std::string l_czTempStr; 1).l_czTempStr = "Test data1" + "Test data2" + "Test data3"; 2). l_czTempStr = "Test data1"; l_czTempStr += "Test data2"; l_czTempStr += "Test data3"; 3). using << operator 4). using append()

Also, do we get the advantage of using CString over std :: string?

+63

c ++ string concatenation

Bhupesh Pant Sep 19 '13 at 10:31 on

source share

5 answers

In addition to the other answers ...

I did extensive tests on this problem a while ago and came to the conclusion that the most effective solution (GCC 4.7 and 4.8 on Linux x86 / x64 / ARM) in all cases of use is primarily reserve() , a result string with enough storage space all concatenated strings, and then only append() them (or use operator +=() , which does not matter).

Unfortunately, it seems to me that I deleted this test, so you only have my word (but you can easily adapt the Mats Petersson test to see for yourself if my word is not enough).

In a nutshell:

 const string space = " "; string result; result.reserve(5 + space.size() + 5); result += "hello"; result += space; result += "world";

Depending on the specific use case (number, types and sizes of concatenated strings), sometimes this method is the most effective, and in other cases it is comparable to other methods, but it is never worse.

The problem is that it is actually very difficult to calculate the total required size in advance, especially when mixing string literals and std::string (as the example above shows). The maintainability of such code is absolutely terrible as soon as you modify one of the literals or add another line to concatenate.

One approach would be to use sizeof to calculate the size of literals, but IMHO creates as much mess as it solves, maintainability is still terrible:

 #define STR_HELLO "hello" #define STR_WORLD "world" const string space = " "; string result; result.reserve(sizeof(STR_HELLO)-1 + space.size() + sizeof(STR_WORLD)-1); result += STR_HELLO; result += space; result += STR_WORLD;

Useful solution (C ++ 11, variable templates)

Finally, I decided to create a set of variable templates that efficiently deal with calculating the size of strings (for example, the size of string literals is determined at compile time), reserve() as needed, and then combines everything.

Here, hopefully this is useful:

 namespace detail { template<typename> struct string_size_impl; template<size_t N> struct string_size_impl<const char[N]> { static constexpr size_t size(const char (&) [N]) { return N - 1; } }; template<size_t N> struct string_size_impl<char[N]> { static size_t size(char (&s) [N]) { return N ? strlen(s) : 0; } }; template<> struct string_size_impl<const char*> { static size_t size(const char* s) { return s ? strlen(s) : 0; } }; template<> struct string_size_impl<char*> { static size_t size(char* s) { return s ? strlen(s) : 0; } }; template<> struct string_size_impl<std::string> { static size_t size(const std::string& s) { return s.size(); } }; template<typename String> size_t string_size(String&& s) { using noref_t = typename std::remove_reference<String>::type; using string_t = typename std::conditional<std::is_array<noref_t>::value, noref_t, typename std::remove_cv<noref_t>::type >::type; return string_size_impl<string_t>::size(s); } template<typename...> struct concatenate_impl; template<typename String> struct concatenate_impl<String> { static size_t size(String&& s) { return string_size(s); } static void concatenate(std::string& result, String&& s) { result += s; } }; template<typename String, typename... Rest> struct concatenate_impl<String, Rest...> { static size_t size(String&& s, Rest&&... rest) { return string_size(s) + concatenate_impl<Rest...>::size(std::forward<Rest>(rest)...); } static void concatenate(std::string& result, String&& s, Rest&&... rest) { result += s; concatenate_impl<Rest...>::concatenate(result, std::forward<Rest>(rest)...); } }; } // namespace detail template<typename... Strings> std::string concatenate(Strings&&... strings) { std::string result; result.reserve(detail::concatenate_impl<Strings...>::size(std::forward<Strings>(strings)...)); detail::concatenate_impl<Strings...>::concatenate(result, std::forward<Strings>(strings)...); return result; }

The only interesting part about the public interface is the last template<typename... Strings> std::string concatenate(Strings&&... strings) template. The use is simple:

 int main() { const string space = " "; std::string result = concatenate("hello", space, "world"); std::cout << result << std::endl; }

When enabling optimizations, any decent compiler should be able to deploy the concatenate call to the same code as my first example, where I manually wrote everything. As for GCC 4.7 and 4.8, the generated code is pretty much identical, as well as performance.

+32

syam Sep 19 '13 at 15:34

source share

A possible WORST script uses the plain old strcat (or sprintf ), since strcat takes a C string and needs to be "counted" to find the end. For long lines, this is a real person. C ++ style strings are much better, and performance issues are more likely to be related to memory allocation rather than length counting. But then again, the row grows geometrically (doubles every time it should grow), so it’s not so scary.

I would very much suspect that all of the above methods end with the same, or at least very similar characteristics. Anyway, I would expect stringstream be slower due to the overhead of formatting support, but I also suspect it is marginal.

Be that as it may, “fun,” I will return with the benchmark ...

Edit:

Please note that these results apply to my machine running x86-64 Linux compiled with g ++ 4.6.3. Other versions of the OS, compilers, and C ++ runtime library implementations may vary. If performance is important to your application, then compare the system (s) that are critical to you using the compilers you use.

Here is the code I wrote to test this. This may not be an ideal idea of a real scenario, but I consider this a typical scenario:

 #include <iostream> #include <iomanip> #include <string> #include <sstream> #include <cstring> using namespace std; static __inline__ unsigned long long rdtsc(void) { unsigned hi, lo; __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi)); return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 ); } string build_string_1(const string &a, const string &b, const string &c) { string out = a + b + c; return out; } string build_string_1a(const string &a, const string &b, const string &c) { string out; out.resize(a.length()*3); out = a + b + c; return out; } string build_string_2(const string &a, const string &b, const string &c) { string out = a; out += b; out += c; return out; } string build_string_3(const string &a, const string &b, const string &c) { string out; out = a; out.append(b); out.append(c); return out; } string build_string_4(const string &a, const string &b, const string &c) { stringstream ss; ss << a << b << c; return ss.str(); } char *build_string_5(const char *a, const char *b, const char *c) { char* out = new char[strlen(a) * 3+1]; strcpy(out, a); strcat(out, b); strcat(out, c); return out; } template<typename T> size_t len(T s) { return s.length(); } template<> size_t len(char *s) { return strlen(s); } template<> size_t len(const char *s) { return strlen(s); } void result(const char *name, unsigned long long t, const string& out) { cout << left << setw(22) << name << " time:" << right << setw(10) << t; cout << " (per character: " << fixed << right << setw(8) << setprecision(2) << (double)t / len(out) << ")" << endl; } template<typename T> void benchmark(const char name[], T (Func)(const T& a, const T& b, const T& c), const char *strings[]) { unsigned long long t; const T s1 = strings[0]; const T s2 = strings[1]; const T s3 = strings[2]; t = rdtsc(); T out = Func(s1, s2, s3); t = rdtsc() - t; if (len(out) != len(s1) + len(s2) + len(s3)) { cout << "Error: out is different length from inputs" << endl; cout << "Got `" << out << "` from `" << s1 << "` + `" << s2 << "` + `" << s3 << "`"; } result(name, t, out); } void benchmark(const char name[], char* (Func)(const char* a, const char* b, const char* c), const char *strings[]) { unsigned long long t; const char* s1 = strings[0]; const char* s2 = strings[1]; const char* s3 = strings[2]; t = rdtsc(); char *out = Func(s1, s2, s3); t = rdtsc() - t; if (len(out) != len(s1) + len(s2) + len(s3)) { cout << "Error: out is different length from inputs" << endl; cout << "Got `" << out << "` from `" << s1 << "` + `" << s2 << "` + `" << s3 << "`"; } result(name, t, out); delete [] out; } #define BM(func, size) benchmark(#func " " #size, func, strings ## _ ## size) #define BM_LOT(size) BM(build_string_1, size); \ BM(build_string_1a, size); \ BM(build_string_2, size); \ BM(build_string_3, size); \ BM(build_string_4, size); \ BM(build_string_5, size); int main() { const char *strings_small[] = { "Abc", "Def", "Ghi" }; const char *strings_medium[] = { "abcdefghijklmnopqrstuvwxyz", "defghijklmnopqrstuvwxyzabc", "ghijklmnopqrstuvwxyzabcdef" }; const char *strings_large[] = { "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz" "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz", "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc", "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef" }; for(int i = 0; i < 5; i++) { BM_LOT(small); BM_LOT(medium); BM_LOT(large); cout << "---------------------------------------------" << endl; } }

Here are some representative results:

 build_string_1 small time: 4075 (per character: 452.78) build_string_1a small time: 5384 (per character: 598.22) build_string_2 small time: 2669 (per character: 296.56) build_string_3 small time: 2427 (per character: 269.67) build_string_4 small time: 19380 (per character: 2153.33) build_string_5 small time: 6299 (per character: 699.89) build_string_1 medium time: 3983 (per character: 51.06) build_string_1a medium time: 6970 (per character: 89.36) build_string_2 medium time: 4072 (per character: 52.21) build_string_3 medium time: 4000 (per character: 51.28) build_string_4 medium time: 19614 (per character: 251.46) build_string_5 medium time: 6304 (per character: 80.82) build_string_1 large time: 8491 (per character: 3.63) build_string_1a large time: 9563 (per character: 4.09) build_string_2 large time: 6154 (per character: 2.63) build_string_3 large time: 5992 (per character: 2.56) build_string_4 large time: 32450 (per character: 13.87) build_string_5 large time: 15768 (per character: 6.74)

The same code executed as 32-bit:

 build_string_1 small time: 4289 (per character: 476.56) build_string_1a small time: 5967 (per character: 663.00) build_string_2 small time: 3329 (per character: 369.89) build_string_3 small time: 3047 (per character: 338.56) build_string_4 small time: 22018 (per character: 2446.44) build_string_5 small time: 3026 (per character: 336.22) build_string_1 medium time: 4089 (per character: 52.42) build_string_1a medium time: 8075 (per character: 103.53) build_string_2 medium time: 4569 (per character: 58.58) build_string_3 medium time: 4326 (per character: 55.46) build_string_4 medium time: 22751 (per character: 291.68) build_string_5 medium time: 2252 (per character: 28.87) build_string_1 large time: 8695 (per character: 3.72) build_string_1a large time: 12818 (per character: 5.48) build_string_2 large time: 8202 (per character: 3.51) build_string_3 large time: 8351 (per character: 3.57) build_string_4 large time: 38250 (per character: 16.35) build_string_5 large time: 8143 (per character: 3.48)

From this we can conclude:

The best option is to add bits at a time ( out.append() or out += ), while the “tied” approach is pretty close.
Pre-highlighting a line is not useful.
Using stringstream is a pretty bad idea (between 2-4x slower).
char * uses new char[] . Using a local variable in the calling function makes it the fastest, but slightly unfair compares it.
The short string concatenation has a fair bit of overhead - a simple copy of the data should be no more than one cycle for each byte (if the data does not fit in the cache).

edit2

Added according to comments:

 string build_string_1b(const string &a, const string &b, const string &c) { return a + b + c; }

and

 string build_string_2a(const string &a, const string &b, const string &c) { string out; out.reserve(a.length() * 3); out += a; out += b; out += c; return out; }

What gives these results:

 build_string_1 small time: 3845 (per character: 427.22) build_string_1b small time: 3165 (per character: 351.67) build_string_2 small time: 3176 (per character: 352.89) build_string_2a small time: 1904 (per character: 211.56) build_string_1 large time: 9056 (per character: 3.87) build_string_1b large time: 6414 (per character: 2.74) build_string_2 large time: 6417 (per character: 2.74) build_string_2a large time: 4179 (per character: 1.79)

(32-bit startup, but 64-bit shows very similar results).

+18

Mats Petersson Sep 19 '13 at 10:44 on

source share

As in the case of most microoptimizations, you will need to measure the effect of each parameter by first setting it by measuring that this is really an optimization of the bottle neck. There is no final answer.

append and += should do the same.

+ conceptually less effective as you create and destroy temporary ones. Your compiler may or may not optimize it as fast as adding.

Calling reserve with a common size can reduce the number of memory allocations needed - they are likely to be the biggest bottleneck.

<< (presumably on stringstream ) may or may not be faster; you will need to measure it. This is useful if you need to format non-string types, but it probably won't be especially better or worse when dealing with strings.

CString has the disadvantage that it is not portable, and that a Unix hacker like me cannot tell you what its advantages may or may not be.

+8

Mike Seymour Sep 19 '13 at 10:45

source share

There are several important parameters that can influence the decision of the “most optimized path”. Some of them: line / content size, number of operations, compiler optimization, etc.

In most cases, string::operator+= works best. However, it is sometimes observed on some compilers that ostringstream::operator<< works best [for example, MingW g ++ 3.2.3, 1.8 GHz, one Dell PC processor]. When a compiler context arises, it is mainly an optimization in the compiler that will affect. In addition, stringstreams are complex objects compared to simple strings and, therefore, are added to the overhead.

For more information - discussion , article .

0

parasrish May 19 '17 at 3:41 a.m.

source share

Jesse Good · Accepted Answer · 2013-09-19 10:34

Here is a small set of tests:

 #include <iostream> #include <string> #include <chrono> #include <sstream> int main () { typedef std::chrono::high_resolution_clock clock; typedef std::chrono::duration<float, std::milli> mil; std::string l_czTempStr; std::string s1="Test data1"; auto t0 = clock::now(); #if VER==1 for (int i = 0; i < 100000; ++i) { l_czTempStr = s1 + "Test data2" + "Test data3"; } #elif VER==2 for (int i = 0; i < 100000; ++i) { l_czTempStr = "Test data1"; l_czTempStr += "Test data2"; l_czTempStr += "Test data3"; } #elif VER==3 for (int i = 0; i < 100000; ++i) { l_czTempStr = "Test data1"; l_czTempStr.append("Test data2"); l_czTempStr.append("Test data3"); } #elif VER==4 for (int i = 0; i < 100000; ++i) { std::ostringstream oss; oss << "Test data1"; oss << "Test data2"; oss << "Test data3"; l_czTempStr = oss.str(); } #endif auto t1 = clock::now(); std::cout << l_czTempStr << '\n'; std::cout << mil(t1-t0).count() << "ms\n"; }

On coliru :

Compile the following:

clang ++ -std = C ++ 11 -O3 -DVER = 1 -Wall -pedantic -pthread main.cpp

21.6463ms

-DVER = 2

6.61773ms

-DVER = 3

6.7855ms

-DVER = 4

102.015ms

It seems that 2) , += is the winner.

(Also compiling with and without -pthread affects timings)

The most optimized string concatenation method

Useful solution (C ++ 11, variable templates)

More articles: