How efficient is std :: string compared to null characters?

Question

How efficient is std :: string compared to null characters?

I found that std::string very slow compared to old lines with zero completion, so slow that they significantly slow down my overall program by 2 times.

I expected the STL to be slower, I did not understand that it would be much slower.

I am using Visual Studio 2008 release mode. It shows the string assignment 100-1000 times slower than the char* assignment (it is very difficult to check the execution time of the char* job). I know that this is not a fair comparison, pointer assignment and string copy, but my program has many string assignments, and I'm not sure I can use the "const reference" trick in all places. With a reference counting implementation, my program would be great, but these implementations seem to no longer exist.

My real question is: why do people no longer use reference counting implementations, and does this mean that we all have to be much more careful to avoid common std :: string performance errors?

My complete code is below.

 #include <string> #include <iostream> #include <time.h> using std::cout; void stop() { } int main(int argc, char* argv[]) { #define LIMIT 100000000 clock_t start; std::string foo1 = "Hello there buddy"; std::string foo2 = "Hello there buddy, yeah you too"; std::string f; start = clock(); for (int i=0; i < LIMIT; i++) { stop(); f = foo1; foo1 = foo2; foo2 = f; } double stl = double(clock() - start) / CLOCKS\_PER\_SEC; start = clock(); for (int i=0; i < LIMIT; i++) { stop(); } double emptyLoop = double(clock() - start) / CLOCKS_PER_SEC; char* goo1 = "Hello there buddy"; char* goo2 = "Hello there buddy, yeah you too"; char *g; start = clock(); for (int i=0; i < LIMIT; i++) { stop(); g = goo1; goo1 = goo2; goo2 = g; } double charLoop = double(clock() - start) / CLOCKS_PER_SEC; cout << "Empty loop = " << emptyLoop << "\n"; cout << "char* loop = " << charLoop << "\n"; cout << "std::string = " << stl << "\n"; cout << "slowdown = " << (stl - emptyLoop) / (charLoop - emptyLoop) << "\n"; std::string wait; std::cin >> wait; return 0; }

+13

c ++ performance stl

Tim Cooper Mar 12 '09 at 8:09

source share

14 answers

It looks like you are using char * incorrectly in the code you inserted. if you have

 std::string a = "this is a"; std::string b = "this is b" a = b;

copy line operation in progress. If you do the same with char *, you are doing a copy pointer operation.

The assignment operation std :: string allocates enough memory to store the contents of b in a, and then copies each character one at a time. In the case of char *, it does not allocate memory or copy individual characters one by one, but simply says: "Now it points to the same memory that b points to."

I assume that is why std :: string is slower because it actually copies the string, which is apparently what you want. To perform a copy operation on char *, you will need to use the strcpy () function to copy to a buffer that already has an appropriate size. Then you will get an exact comparison. But for the purposes of your program, you almost certainly use std :: string.

+11

Dan Olson Mar 12 '09 at 9:51

source share

When writing C ++ code using any utility class (be it STL or your own) instead, for example. good old lines with a null character C, you need to remember a few things.

If you are guided without compiler optimization (for example, inlining functions), classes will lose. They are not inline, even stl. They are implemented in terms of method calls.
Do not create unnecessary objects.
Do not copy objects if possible.
Pass objects as links, not copies, if possible,

Use more specialized methods and functions and higher level algorithms. For example:.

 std::string a = "String a" std::string b = "String b" // Use a.swap(b); // Instead of std::string tmp = a; a = b; b = tmp;

And one last point. When your C-like C ++ code begins to become more complex, you need to implement more complex data structures, such as automatically expanding arrays, dictionaries, efficient priority queues. And suddenly you realize that his great work and your classes are not much faster than ordinary ones. Just more mistakes.

+7

Tomek Szpakowicz Mar 12 '09 at 8:53

source share

You are certainly doing something wrong, or at least not comparing “fairly” between the STL and your own code. Of course, it is difficult to be more specific without code to look.

You might be structuring your code with STL in such a way as to run more constructors or not use reassigned objects so that they match what you do when you perform the operations themselves, and so on.

+5

unwind Mar 12 '09 at 8:13

source share

This test tests two fundamentally different things: a shallow copy and a deep copy. It is important to understand the difference and how to avoid deep copies in C ++, since the C ++ object by default provides the semantics of values for its instances (as is the case with the usual old data types), which means that assigning one to another usually happens for copying.

I “fixed” your test and received the following:

 char* loop = 19.921 string = 0.375 slowdown = 0.0188244

Apparently, we should stop using C-style strings as they are much slower! Actually, I deliberately made my test as erroneous as yours by checking shallow copying on the side of the vs vs strcpy line on:

 #include <string> #include <iostream> #include <ctime> using namespace std; #define LIMIT 100000000 char* make_string(const char* src) { return strcpy((char*)malloc(strlen(src)+1), src); } int main(int argc, char* argv[]) { clock_t start; string foo1 = "Hello there buddy"; string foo2 = "Hello there buddy, yeah you too"; start = clock(); for (int i=0; i < LIMIT; i++) foo1.swap(foo2); double stl = double(clock() - start) / CLOCKS_PER_SEC; char* goo1 = make_string("Hello there buddy"); char* goo2 = make_string("Hello there buddy, yeah you too"); char *g; start = clock(); for (int i=0; i < LIMIT; i++) { g = make_string(goo1); free(goo1); goo1 = make_string(goo2); free(goo2); goo2 = g; } double charLoop = double(clock() - start) / CLOCKS_PER_SEC; cout << "char* loop = " << charLoop << "\n"; cout << "string = " << stl << "\n"; cout << "slowdown = " << stl / charLoop << "\n"; string wait; cin >> wait; }

The main thing, and it actually comes to the bottom of your final question, you need to know what you are doing with the code. If you use a C ++ object, you should know that assigning one to another will make a copy of this object (if the assignment is not disabled, in this case you will receive an error message). You should also know when to use a link, pointer, or smart pointer to an object, and with C ++ 11, you also need to understand the difference between the semantics of movement and copying.

My real question is: why don't people use reference counting and that means we all have to be a lot more careful to avoid common std :: string performance errors?

People really use reference counting implementations. Here is an example of one of them:

 shared_ptr<string> ref_counted = make_shared<string>("test"); shared_ptr<string> shallow_copy = ref_counted; // no deep copies, just // increase ref count

The difference is that the string does not make it internally, since it would be inefficient for those who do not need it. Things like copy-on-write usually don't work for strings for similar reasons (plus the fact that this usually makes the problem with threads). Nevertheless, we have all the building blocks right here to do copy-on-write, if we want to do this: we have the ability to swap strings without any deep copying, we can make pointers, links or smart pointers for them.

To use C ++ effectively, you must get used to this way of thinking using value semantics. If you don’t do this, you can enjoy additional security and convenience, but do it at a high cost for the efficiency of your code (unnecessary copies are certainly an important part of what makes poorly written C ++ code slower than C). After all, your original test still deals with pointers to strings, not char[] arrays. If you used arrays of characters, not pointers to them, you also need strcpy to replace them. With strings, you even have a built-in swap method to do exactly what you do in your test, so my advice is to spend a little more time learning C ++.

+5

stinky472 Feb 29 '12 at 16:52

source share

If you have a pointer to the possible size of your vector, you can prevent excessive resizing by calling reserve () before filling it.

+2

Daniel Sloof Mar 12 '09 at 8:22

source share

Basic optimization rules:

Rule 1: Do not do this.
Rule 2: (For experts only) Do not do this yet.

Are you sure you have proven that STL is actually slow, not your algorithm ?

+2

Tobias Hertkorn Mar 12 '09 at 9:08

source share

Good performance is not always easy in STL, but it is usually designed to give you strength. I found Scott Meyers' Effective STL to understand how to work effectively with STL. Read it!

As others have said, you are likely to come across frequent deep copies of a string and compare them to implementing pointer / link counting.

Typically, any class designed to suit your specific needs will beat a general class designed for the general case. But learn to make good use of the clan class and learn to ride the rules of 80:20, and you will be much more effective than someone turning everything around on their own.

One specific drawback of std::string is that it does not guarantee performance, which makes sense. As Tim Cooper mentioned, the STL does not say whether the string job creates a deep copy. This is useful for a universal class because link counting can be a real killer in highly competitive applications, although this is usually the best way for a single-threaded application.

+2

peterchen Mar 12 '09 at 9:09

source share

When used properly, std :: string is effective as char *, but with extra protection.

If you run into performance issues with STL, it is likely that you are doing something wrong.

In addition, STL implementations are not standard for compilers. I know that SGI STL and STLPort work generally well.

However, and I'm absolutely serious, you can be a C ++ gene and have developed code that is much more complex than STL. It's unlikely, but who knows, you can be LeBron James from C ++.

+1

Alan Mar 12 '09 at 8:19

source share

They were not mistaken. An STL implementation is usually better than yours.

I'm sure you can write something better for a special occasion, but factor 2 is too much ... you really have to do something wrong.

0

Ionut Anghelcovici Mar 12 '09 at 8:15

source share

I would say that an STL implementation is better than traditional implementations. You also tried to use a list instead of a vector, because the vector is effective for some purpose, and the list is effective for some other

0

Ram Mar 12 '09 at 8:28

source share

                         string const string & char * Java string
 -------------------------------------------------- -------------------------------------------------
 Efficient no ** yes yes yes
 assignment                          

 Thread-safe yes yes yes yes

 memory management yes no no yes
 done for you

** There are 2 implementations of std :: string: reference counting or deep copy. The reference counter presents performance problems in multi-threaded programs, EVEN is read-only, and a deep copy is obviously slower, as shown above. See: Why are VC ++ strings not counted by reference?

As shown in this table, a string is better than a char * in some respects and worse in others, and a const string & is similar in its char * properties. Personally, I will continue to use 'char *' in many places. The sheer volume of std :: string copying, which happens silently, with implicit copy constructors and temporary entries, makes me somewhat ambivalent about std :: string.

0

Tim Cooper Apr 13 '13 at 1:46

source share

std::string will always be slower than C-lines. C-lines are just a linear array of memory. You cannot be more efficient than simply being a data structure. The algorithms you use (e.g. strcat() or strcpy() ) are usually equivalent to STL counterparts. Calling an instance of a class and a method will be relatively much slower than operations with C-strings (even worse if the implementation uses virtual machines). The only way to get equivalent performance is to optimize the compiler.

-one

Legion Jul 09 '12 at 18:30

source share

Most of the reason may be that reference counting is no longer used in modern STL implementations.

Here is the story (someone correct me if I am mistaken): at the beginning of the implementation of STL, they used reference counting and were fast, but not thread safe - developers expected application programmers to insert their own locking mechanisms at higher levels to make them thread safe, because if the lock was performed at 2 levels, this will slow down the work by half.

However, the programmers of the world were too ignorant or lazy to push locks everywhere. For example, if a worker thread in a multi-threaded program had to read the command line parameter std :: string, then a lock would be required even to read the line, otherwise it might fail. (2 threads increase the reference counter at the same time on different CPUs (+1), but decrease it separately (-2), therefore the reference counter decreases to zero, and the memory is freed.)

Thus, the developers discarded the reference count and instead, each std :: string always had its own copy of the string. More programs worked, but they were all slower.

So, even the modest assignment of one std :: string to another (or, equivalently, passing std :: string as a function parameter), takes about 400 machine code commands instead of 2, which is required to assign char *, slowing down 200 times.

I checked the inefficiency std :: string for one major program that had a total slowdown of about 100% compared to null characters. I also tested the raw std :: string assignment using the following code, which said that the assignment of std :: string was 100-900 times slower. (I had a problem with measuring char * speed). I was also debugging in the std :: string operator = () function - I ended up on my knee deep in the stack, about 7 layers before I pressed "memcpy ()".

I am not sure there is a solution. Perhaps if you want your program to be fast, use plain old C ++, and if you are more concerned about your own performance, you should use Java.

 #define LIMIT 800000000 clock_t start; std::string foo1 = "Hello there buddy"; std::string foo2 = "Hello there buddy, yeah you too"; std::string f; start = clock(); for (int i=0; i < LIMIT; i++) { stop(); f = foo1; foo1 = foo2; foo2 = f; } double stl = double(clock() - start) / CLOCKS_PER_SEC; start = clock(); for (int i=0; i < LIMIT; i++) { stop(); } double emptyLoop = double(clock() - start) / CLOCKS_PER_SEC; char* goo1 = "Hello there buddy"; char* goo2 = "Hello there buddy, yeah you too"; char *g; start = clock(); for (int i=0; i < LIMIT; i++) { stop(); g = goo1; goo1 = goo2; goo2 = g; } double charLoop = double(clock() - start) / CLOCKS_PER_SEC; TfcMessage("done", 'i', "Empty loop = %1.3fs\n" "char* loop = %1.3fs\n" "std::string loop = %1.3fs\n\n" "slowdown = %f", emptyLoop, charLoop, stl, (stl - emptyLoop) / (charLoop - emptyLoop));

-5

Tim Cooper Mar 12 '09 at 8:27

source share

Johannes Schaub - litb · Accepted Answer · 2009-03-12 08:29

The problems associated with the performance of strings and other containers are well known. Most of them are related to temporary and unnecessary copies.

It is not too difficult to use it correctly, but it is also quite easy to do wrong. For example, if you see that your code takes strings by value, where you do not need a mutable parameter, you are doing it wrong:

 // you do it wrong void setMember(string a) { this->a = a; // better: swap(this->a, a); }

You'd better take this via the const link or do a swap operation inside, rather than another copy. In this case, an increase in performance for a vector or list. However, you know for sure that there are known issues. For example:

 // let add a Foo into the vector v.push_back(Foo(a, b));

We create one temporary Foo to add a new Foo to our vector. In a manual solution that Foo can create directly in a vector. And if the vector reaches the limit of capacity, it must redistribute a larger memory buffer for its elements. What does it do? It copies each item separately to its new location using its copy constructor. A manual solution can behave more intelligently if it knows the type of elements in the hand.

Another common problem is temporary. Look at this

 string a = b + c + e;

There are many timeframes created that can be avoided in a custom solution that you actually optimize for performance. Then the std::string interface was designed for write-friendly copying. However, when streams become more popular, a transparent copy on the recording lines has problems maintaining the consistency of their states. Recent implementations tend to avoid copying recording lines and instead use other tricks where necessary.

Most of these problems are resolved, however, for the next version of the Standard. For example, instead of push_back you can use emplace_back to directly create Foo in your vector

 v.emplace_back(a, b);

Instead of making copies in the concatenation above, std::string recognizes when it combines temporary ones and optimizes for these cases. Redistribution will also avoid copying, but will move elements where necessary to their new places.

For a great read, consider Move Constructors by Andrei Alexandrescu.

Sometimes, however, comparisons also tend to be unfair. Standard containers must support the functions that they must support. For example, if your container does not maintain links to a map element when adding / removing elements from your map, comparing your “quick” map with a standard map may be unfair, since a standard map should ensure that the elements remain valid. Of course, this was just an example, and there are many such cases that you should remember, saying that "my container is faster than standard!".

How efficient is std :: string compared to null characters?

More articles: