The “Right” Way to Store Binary Data Using C++ / STL

In general, what is the best way to store binary data in C++? The options, as far as I can tell, pretty much boil down to using strings or vector<char>s. (I'll omit the possibility of char*s and malloc(), since I'm referring specifically to C++.)

I usually just use a string; however, I'm not sure whether there is overhead I'm missing, or conversions that the STL does internally, that could compromise the integrity of the binary data. Does anyone have any pointers (har) on this? Suggestions or preferences one way or the other?

+44
c++ binary-data stl
Jan 13 '09 at 22:58
4 answers

A vector of char is nice because the memory is contiguous. Therefore, you can use it with a lot of C APIs, such as Berkeley sockets or file APIs. You can do the following, for example:

std::vector<char> vect;
...
send(sock, &vect[0], vect.size(), 0);  // vect must not be empty

and it will work fine.

You can essentially treat it the same as any other dynamically allocated char buffer. You can scan up and down it looking for magic numbers or patterns. You can parse it partially in place. When receiving from a socket, you can easily resize it to append more data.
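As a minimal sketch of the resize-then-fill pattern described above: `fake_recv` is a hypothetical stand-in for a C API such as recv(), and `append_from_api` is a name made up for this illustration. The vector is grown, the C function writes into the new tail, and the vector is then trimmed to what was actually written.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a C API such as recv(): writes up to `cap`
// bytes into `dst` and returns how many bytes it wrote.
std::size_t fake_recv(char* dst, std::size_t cap) {
    static const char payload[] = {'\x7f', 'E', 'L', 'F', '\0', '\x01'};
    const std::size_t n = cap < sizeof(payload) ? cap : sizeof(payload);
    std::memcpy(dst, payload, n);
    return n;
}

// Grow the vector, let the C API fill the new tail, then trim the vector
// to the number of bytes that were actually written.
std::size_t append_from_api(std::vector<char>& buf, std::size_t max_chunk) {
    if (max_chunk == 0) return 0;  // avoids indexing past the end below
    const std::size_t old_size = buf.size();
    buf.resize(old_size + max_chunk);
    const std::size_t got = fake_recv(&buf[old_size], max_chunk);
    buf.resize(old_size + got);
    return got;
}
```

Note that the embedded zero byte in the payload is stored like any other byte; the vector's size(), not a terminator, defines where the data ends.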

The downside is that resizing is not very efficient (so resize or preallocate prudently), and deleting from the front of the array is also very inefficient. If you need to, say, pop just one or two chars at a time off the front of the data structure very frequently, copying the data to a deque before that processing may be an option. This costs you a copy, and deque memory is not contiguous, so you cannot just pass a pointer to a C API.
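The two mitigations mentioned here might look roughly like this. The helper names (`make_buffer`, `to_queue`, `pop_front_byte`) are invented for the sketch and are not part of any library.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Preallocate once so that later push_back calls up to expected_size
// do not trigger a reallocation.
std::vector<char> make_buffer(std::size_t expected_size) {
    std::vector<char> buf;
    buf.reserve(expected_size);
    return buf;
}

// Copy the bytes into a deque when they will be popped off the front
// frequently. This costs one copy of the data.
std::deque<char> to_queue(const std::vector<char>& buf) {
    return std::deque<char>(buf.begin(), buf.end());
}

// Pop and return the first byte; the caller must ensure q is not empty.
char pop_front_byte(std::deque<char>& q) {
    const char c = q.front();
    q.pop_front();
    return c;
}
```

Once the data lives in the deque, it can no longer be handed to a C API as a single pointer, so this only pays off when the front-popping dominates.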

Bottom line: learn about the data structures and their trade-offs before diving in. That said, a vector of char is what I typically see used in general practice.

+38
Jan 13 '09 at 23:02

The biggest problem with std::string is that the current standard does not guarantee that its underlying storage is contiguous. However, there are no known STL implementations in which string is not contiguous, so in practice it probably will not fail. In fact, the new C++0x standard is going to fix this problem by mandating that std::string use a contiguous buffer, as std::vector does.

Another argument against the string is that its name indicates that it contains a string of characters, not a binary buffer, which can cause confusion for those who read the code.

That said, I also recommend vector.

+8
Jan 14 '09 at 2:31

I use std::string for this too, and I never had a problem with it.

One “pointer” that I got a rude reminder of in a piece of code yesterday: when creating a string from a block of binary data, use the std::string(startIter, endIter) constructor form, not the std::string(ptr, offset, length) form. The latter assumes that the pointer points to a C-style string and ignores anything after the first zero character (it copies "up to" the specified length, not exactly length characters).
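A small sketch of the difference, using two hypothetical wrapper functions (`from_range` and `from_offset_length` are names chosen for this example). With the (ptr, offset, length) form, the pointer is first interpreted as a C-style string, so that interpretation ends at the first zero byte before the offset and length are applied.

```cpp
#include <cstddef>
#include <string>

// Iterator-range form: copies exactly `len` bytes, embedded zero bytes
// included.
std::string from_range(const char* data, std::size_t len) {
    return std::string(data, data + len);
}

// (ptr, offset, length) form: `data` is treated as a C-style string
// first, so the copy stops at the first zero byte. `data` must therefore
// contain a zero byte, or the read runs past the end of the block.
std::string from_offset_length(const char* data, std::size_t len) {
    return std::string(data, 0, len);
}
```

For the five bytes 'a', 'b', 0, 'c', 'd', the first function yields a string of size 5 and the second a string of size 2.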

+6
Jan 14 '09 at 2:10

Of course, you should use a char container, but the container you want to use depends on your application.

chars have several properties that make them useful for holding binary data: the standard prohibits "padding" in the char data type, which is important because it means you won't get garbage in your binary layout. Each char is also guaranteed to be exactly one byte (sizeof(char) is 1 by definition), which makes char, along with its signed and unsigned variants, the only plain old data (POD) type with a fixed size (all the others are specified in terms of upper and/or lower bounds).
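The size guarantee can be checked at compile time. This is a minimal sketch assuming a C++11 or later compiler for static_assert; `byte_count` is a hypothetical helper. Note that "one byte" means CHAR_BIT bits, which is at least 8 but not required to be exactly 8.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// sizeof(char) is 1 by definition, so these can never fire on a
// conforming compiler; they only document the guarantee.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Number of bytes held by a char vector: exactly one byte per element.
std::size_t byte_count(const std::vector<char>& buf) {
    return buf.size() * sizeof(char);
}
```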

The appropriate STL container for holding the chars is discussed well by Doug above. Which one you need depends entirely on your use case. If you are just holding a block of data that you iterate through, with no special lookup, append/remove, or splicing needs, I would prefer vector, which makes your intentions clearer than std::string, which many libraries and functions will assume holds a NUL-terminated C-style string.
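To illustrate the last point, here is a sketch of how a function that expects a C-style string sees binary data held in a std::string, compared with the size the container itself reports. Both helper names are invented for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Length as seen by a C-style function: counting stops at the first
// zero byte, wherever it occurs in the data.
std::size_t c_style_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Length of the buffer itself: every byte counts, zero bytes included.
std::size_t buffer_length(const std::vector<char>& v) {
    return v.size();
}
```

The string still stores all of its bytes; the truncation only appears once the data passes through an interface that assumes zero termination, which is the confusion a vector avoids.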

+3
Jan 14 '09 at 2:18