std::string or std::vector<char> for storing raw data
I hope this question is suitable for Stack Overflow. What is the difference between storing raw data bytes (8 bits) in std::string and storing them in std::vector<char>? I am reading binary data from a file and storing the raw bytes in a std::string. It works well and I have had no problems with it; my program behaves as expected. However, other programmers prefer the std::vector<char> approach and suggest that I stop using std::string, since it is unsafe for raw bytes. So I wonder: why is it unsafe to use std::string to store raw data bytes? I know that std::string is most often used to store ASCII text, but a byte is a byte, so I don't understand the preference for std::vector<char>.
Thanks for any advice!
The problem is not whether it works or not. The problem is that it is completely confusing for the next person reading your code. std::string is meant for text, and anyone reading your code will expect that. You declare your intent better with std::vector<char>.
Using std::string for raw bytes increases your WTF/min in code reviews.
In C++03, using std::string to store an array of byte data was not a good idea: the standard did not require std::string to store its data contiguously. C++11 fixed that by requiring the data to be contiguous.
So in C++03 this was not guaranteed to work, unless you had personally checked your standard library's implementation of std::string to make sure it was contiguous.
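To make the contiguity point concrete, here is a minimal sketch. It assumes a hypothetical C-style function, `c_style_fill`, that writes into a caller-supplied buffer; it stands in for something like a library read call and is not from the original post. Handing `&buf[0]` of a std::string to such a function is only guaranteed to be valid because C++11 requires contiguous storage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a C library call that writes `n` bytes into `out`.
void c_style_fill(char* out, std::size_t n) {
    assert(out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<char>(i);  // byte 0 is a null byte on purpose
    }
}

// std::vector<char> has always guaranteed contiguous storage.
std::vector<char> fill_vector(std::size_t n) {
    std::vector<char> buf(n);
    if (!buf.empty()) c_style_fill(buf.data(), buf.size());
    return buf;
}

// std::string only guarantees contiguous storage since C++11.
std::string fill_string(std::size_t n) {
    std::string buf(n, '\0');
    if (!buf.empty()) c_style_fill(&buf[0], buf.size());
    return buf;
}
```

Both helpers return a buffer of exactly `n` bytes; the difference is only in which standard backs the pointer you pass to the C function.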
In any case, I would suggest vector<char>. Usually, when you see a string, you expect it to be ... a string. You know, a sequence of characters in some form of encoding. A vector<char> makes it obvious that this is not a string but an array of bytes.
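For the use case in the question (reading binary data from a file), a rough sketch of the vector<char> version might look like this. The helper name `read_file_bytes` is made up for illustration, and returning an empty vector on failure is just one possible error-handling choice.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file at `path` as raw bytes.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_file_bytes(const std::string& path) {
    assert(!path.empty() && "path must not be empty");
    std::ifstream in(path, std::ios::binary);  // binary mode: no newline translation
    if (!in) return {};
    // istreambuf_iterator reads unformatted bytes, so null bytes and
    // whitespace are kept as-is.
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

The return type alone tells the reader that the result is a byte buffer, not text.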
In addition to the contiguous-storage issue and code clarity, I ran into some pretty insidious bugs when trying to use std::string to store raw bytes.
Most of them centered on converting a char array of bytes to a std::string when interacting with C libraries. For example:
std::string password = "pass\0word"; std::cout << password.length() << std::endl; // prints 4, not 9
You might think you can fix this by specifying the length:
std::string password("pass\0word", 0, 9); std::cout << password.length() << std::endl; // nope! still 4!
This is probably because the constructor treats its first argument as a C-string, which ends at the first null byte, rather than as an array of bytes. There may be a better way, but I ended up with this:
std::string password("pass0word", 0, 9); password[4] = '\0'; std::cout << password.length() << std::endl; // hurray! 9!
A bit awkward. Fortunately, I caught this in unit testing, but I would have missed it if my test vectors had not contained null bytes. What makes this insidious is that the second approach above works fine until the array contains a null byte.
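For comparison, here is a small sketch of the three std::string constructor forms side by side. The function names are invented for illustration. The `(pointer, count)` form copies exactly `count` chars, null bytes included, whereas the `(…, 0, 9)` form first reads the literal as a C-string (length 4) and only then takes a substring of it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// (const char*) form: the literal is read as a C-string, so it stops at '\0'.
std::size_t length_from_c_string() {
    std::string s = "pass\0word";
    return s.length();
}

// (source, pos, count) form: the literal is still read as a C-string first,
// then the substring [0, 9) is clamped to the 4 chars that exist.
std::size_t length_from_pos_and_count() {
    std::string s("pass\0word", 0, 9);
    return s.length();
}

// (pointer, count) form: copies exactly 9 chars, including the null byte.
std::size_t length_from_pointer_and_count() {
    std::string s("pass\0word", 9);
    return s.length();
}

// Self-check of the three results described above.
void check_constructor_lengths() {
    assert(length_from_c_string() == 4);
    assert(length_from_pos_and_count() == 4);
    assert(length_from_pointer_and_count() == 9);
}
```

The one-character difference between `("…", 0, 9)` and `("…", 9)` is easy to miss in review, which is part of why this class of bug is so insidious.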
So far, std::vector<uint8_t> looks pretty good (thanks to JN and Hurkyl):
char p[] = "pass\0word"; std::vector<uint8_t> password(p, p + 9); // :)
Note: I have not tried the iterator constructor with std::string, but this mistake is easy enough to make that it may be worth avoiding even the possibility.
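As a sketch of what the iterator-range constructor does with each container (helper names are hypothetical), both calls below copy every char in `[first, last)` and never look for a terminating null byte:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Iterator-range constructor of std::vector: copies [first, last) verbatim.
std::vector<std::uint8_t> bytes_from_range(const char* first, const char* last) {
    assert(first <= last);
    return std::vector<std::uint8_t>(first, last);
}

// The same constructor form exists for std::string and also keeps null bytes.
std::string string_from_range(const char* first, const char* last) {
    assert(first <= last);
    return std::string(first, last);
}
```

With `char p[] = "pass\0word";`, calling either helper as `(p, p + sizeof(p) - 1)` yields 9 elements. Using `sizeof(p) - 1` instead of a hard-coded 9 is a design choice that avoids miscounting; it only works while `p` is a true array, not a pointer.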
Lessons learned:
- Test byte-handling code with test vectors that contain null bytes.
- Be careful when using (and, I would say, avoid using) std::string to store raw bytes.