std::string and multiple concatenations

Question


Let's look at this snippet and suppose a, b, c and d are non-empty strings.

 std::string a, b, c, d;
 d = a + b + c;

When computing the sum of these three std::string instances, standard library implementations create a first temporary std::string object, copy the contents of a and b into its internal buffer, and then repeat the same operation between that temporary string and c.

A fellow programmer pointed out that, instead of this behaviour, operator+(std::string, std::string) could be defined to return a std::string_helper.

The sole role of this object would be to defer the actual concatenation until the moment it is converted to a std::string. Naturally, operator+(std::string_helper, std::string) would be defined to return the same helper, which would "remember" that it has one more concatenation to carry out.

This behaviour would save the CPU cost of creating n-1 temporary objects, allocating their buffers, copying them, and so on. So my question is: why doesn't it already work like that? I cannot think of any drawback or limitation.
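The idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the string_helper class and the lazy() entry point are hypothetical names, lazy() stands in for the proposed operator+(std::string, std::string) so that the real operator is not replaced, and the operands are tracked in a std::vector (which itself allocates) purely to keep the sketch short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: remembers the operands and concatenates only when it
// is converted to std::string.
class string_helper {
public:
    explicit string_helper(const std::string& first) { parts_.push_back(&first); }

    // Record one more operand; no character data is copied here.
    friend string_helper operator+(string_helper lhs, const std::string& rhs) {
        lhs.parts_.push_back(&rhs);
        return lhs;
    }

    // The deferred concatenation: a single reserve, then one copy per operand.
    operator std::string() const {
        std::size_t total = 0;
        for (const std::string* part : parts_) total += part->size();
        std::string result;
        result.reserve(total);
        for (const std::string* part : parts_) result += *part;
        return result;
    }

private:
    // Simplification: pointers to the operands, which must outlive the helper.
    std::vector<const std::string*> parts_;
};

// Hypothetical entry point standing in for operator+(std::string, std::string).
inline string_helper lazy(const std::string& first) { return string_helper(first); }
```

With this sketch, std::string d = lazy(a) + b + c; builds d with one reservation of the final size, at the price of an extra implicit conversion from the helper to std::string.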

+4

c++ optimization string std

qdii Mar 08 '12 at 14:59


6 answers

The obvious answer: because the standard doesn't allow it. It would affect code by introducing an additional user-defined conversion in some cases: if C is a type with a user-defined constructor taking a std::string, then it would make:

 C obj = stringA + stringB;

illegal.

+6

James Kanze Mar 08 '12 at 15:31


It depends.

In C++03, there is likely to be a slight inefficiency here (comparable to Java and C#, which, by the way, use string interning). It can be alleviated by using:

 d = ((std::string("") += a) += b) += c;

which is not really... idiomatic. (The parentheses matter: += groups right to left, so without them a and b themselves would be modified.)

In C++11, operator+ is overloaded for rvalue references. This means that:

 d = a + b + c;

is translated to:

 d.assign(std::move(operator+(a, b).append(c)));

which is (almost) as efficient as you can get.

The only inefficiency left in the C++11 version is that the memory is not reserved once and for all at the beginning, so there may be up to two reallocations and copies (one for each appended string). Still, because appending is amortized O(1), unless c is much longer than b, at worst a single reallocation plus copy should take place. And of course, we are talking about POD copies here (so a memcpy call).
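For cases where even that remaining reallocation matters, the memory can be reserved up front by hand. A minimal sketch, assuming a hypothetical helper named concat3 (it is not part of the standard library):

```cpp
#include <string>

// Hypothetical helper: concatenates three strings after reserving the final
// size once, so the appends below do not need to reallocate.
inline std::string concat3(const std::string& a,
                           const std::string& b,
                           const std::string& c) {
    std::string result;
    result.reserve(a.size() + b.size() + c.size());  // single reservation
    result += a;  // each += only copies characters into the reserved buffer
    result += b;
    result += c;
    return result;
}
```

Usage would be d = concat3(a, b, c); whether this is measurably faster than a + b + c in C++11 depends on the string lengths and the implementation.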

+4

Matthieu M. Mar 08 '12 at 15:21


It seems to me that something like this already exists: std::stringstream .

Except that you write << instead of +. Just because std::string::operator+ exists doesn't make it the most efficient option.
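As an illustration of the stream-based approach, here is a minimal sketch; the function name concat_with_stream is made up for this example. Note that streams carry their own formatting overhead, so this is not guaranteed to beat operator+ and is worth measuring.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds the result through a stream instead of
// chaining operator+.
inline std::string concat_with_stream(const std::string& a,
                                      const std::string& b,
                                      const std::string& c) {
    std::ostringstream out;
    out << a << b << c;  // each << appends to the stream's internal buffer
    return out.str();    // str() returns a copy of the accumulated text
}
```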

+2

Luchian Grigore Mar 08 '12 at 15:02


I think that if you use +=, it will be a little faster:

 d += a; d += b; d += c;

It should be faster since it does not create temporary objects. Or simply,

 d.append(a).append(b).append(c); //same as above: ie using '+=' 3 times.

0

Nawaz Mar 08 '12 at 15:04


The main reason for not writing a chain of individual + concatenations, and especially not doing so in a loop, is that it has O(n²) complexity.

A reasonable alternative with O(n) complexity is to use a simple string builder, like

 #include <cassert>      // assert
 #include <sstream>      // std::basic_ostringstream
 #include <string>       // std::basic_string
 #include <type_traits>  // std::is_same

 // Assumption: the author's CPP_STATIC_ASSERT macro is not shown in the post;
 // this C++11 definition stands in for it.
 #define CPP_STATIC_ASSERT( e ) static_assert( e, #e )

 template< class Char >
 class ConversionToString
 {
 public:
     // Visual C++ 10.0 has some DLL linking problem with other types:
     CPP_STATIC_ASSERT((
         std::is_same< Char, char >::value || std::is_same< Char, wchar_t >::value
         ));

     typedef std::basic_string< Char >           String;
     typedef std::basic_ostringstream< Char >    OutStringStream;

     // Just a default implementation, not particularly efficient.
     template< class Type >
     static String from( Type const& v )
     {
         OutStringStream stream;
         stream << v;
         return stream.str();
     }

     static String const& from( String const& s )
     {
         return s;
     }
 };

 template< class Char, class RawChar = Char >
 class StringBuilder;

 template< class Char, class RawChar >
 class StringBuilder
 {
 private:
     typedef std::basic_string< Char >       String;
     typedef std::basic_string< RawChar >    RawString;
     RawString   s_;

     template< class Type >
     static RawString fastStringFrom( Type const& v )
     {
         return ConversionToString< RawChar >::from( v );
     }

     static RawChar const* fastStringFrom( RawChar const* s )
     {
         assert( s != 0 );
         return s;
     }

     static RawChar const* fastStringFrom( Char const* s )
     {
         assert( s != 0 );
         CPP_STATIC_ASSERT( sizeof( RawChar ) == sizeof( Char ) );
         return reinterpret_cast< RawChar const* >( s );
     }

 public:
     enum ToString { toString };
     enum ToPointer { toPointer };

     String const& str() const { return reinterpret_cast< String const& >( s_ ); }
     operator String const& () const { return str(); }
     String const& operator<<( ToString ) { return str(); }

     RawChar const* ptr() const { return s_.c_str(); }
     operator RawChar const* () const { return ptr(); }
     RawChar const* operator<<( ToPointer ) { return ptr(); }

     template< class Type >
     StringBuilder& operator<<( Type const& v )
     {
         s_ += fastStringFrom( v );
         return *this;
     }
 };

 template< class Char >
 class StringBuilder< Char, Char >
 {
 private:
     typedef std::basic_string< Char >   String;
     String  s_;

     template< class Type >
     static String fastStringFrom( Type const& v )
     {
         return ConversionToString< Char >::from( v );
     }

     static Char const* fastStringFrom( Char const* s )
     {
         assert( s != 0 );
         return s;
     }

 public:
     enum ToString { toString };
     enum ToPointer { toPointer };

     String const& str() const { return s_; }
     operator String const& () const { return str(); }
     String const& operator<<( ToString ) { return str(); }

     Char const* ptr() const { return s_.c_str(); }
     operator Char const* () const { return ptr(); }
     Char const* operator<<( ToPointer ) { return ptr(); }

     template< class Type >
     StringBuilder& operator<<( Type const& v )
     {
         s_ += fastStringFrom( v );
         return *this;
     }
 };

 namespace narrow {
     typedef StringBuilder<char>     S;
 }  // namespace narrow

 namespace wide {
     typedef StringBuilder<wchar_t>  S;
 }  // namespace wide

Then you can write efficient and clear things like...

 using narrow::S;

 std::string a = S() << "The answer is " << 6*7;
 foo( S() << "Hi, " << username << "!" );

0

Cheers and hth. - Alf Mar 08 '12 at 15:13


Mike Seymour · Accepted Answer · 2012-03-08T15:26:06+0000

why doesn't it work like that?

I can only speculate about why it was originally designed like that. Perhaps the designers of the string library simply didn't think of it; perhaps they thought that the extra type conversion (see below) might make the behaviour too surprising in some situations. It is one of the oldest C++ libraries, and a lot of wisdom that we now take for granted simply didn't exist in past decades.

As for why it hasn't been changed to work like that: it could break existing code by adding an extra user-defined type conversion. Implicit conversions can involve at most one user-defined conversion. This is specified by C++11, 13.3.3.1.2/1:

A user-defined conversion sequence consists of an initial standard conversion sequence followed by a user-defined conversion followed by a second standard conversion sequence.

Consider the following:

 struct thingy {
     thingy(std::string);
 };
 void f(thingy);

 f(some_string + another_string);

This code is fine if the type of some_string + another_string is std::string. That can be implicitly converted to a thingy via the converting constructor. However, if we were to change the definition of operator+ to return another type, then it would need two conversions (string_helper to string to thingy), and so would fail to compile.
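The one-conversion limit can be checked at compile time. The following is a small sketch, where string_helper is a hypothetical stand-in for the proposed return type of operator+ (no such type exists in the standard library):

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the proposed return type of operator+.
struct string_helper {
    // User-defined conversion number one: helper to std::string.
    operator std::string() const { return std::string(); }
};

struct thingy {
    // User-defined conversion number two: std::string to thingy.
    thingy(std::string) {}
};

// std::string to thingy needs a single user-defined conversion: allowed.
static_assert(std::is_convertible<std::string, thingy>::value,
              "one user-defined conversion is fine");

// string_helper to std::string to thingy would need two: not done implicitly.
static_assert(!std::is_convertible<string_helper, thingy>::value,
              "two user-defined conversions are never chained implicitly");
```

An explicit conversion such as static_cast<std::string>(helper) would still work, but existing calls like f(some_string + another_string) contain no such cast.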

So, if the speed of string building is important, you'll need to use alternative methods such as concatenation with +=. Or, as Matthieu's answer says, don't worry about it, because C++11 fixes the inefficiency in a different way.
