Is it safe to serialize POD data by directly converting to a char array?

Suppose T is a POD type that does not contain any pointers, and I want to serialize T (along with some other data). To do this, I wrote the following functions:

    template<class T>
    void serialize(const T& source, char*& dest)
    {
        *(T*)dest = source;
        dest += sizeof(T);
    }

    template<class T>
    void deserialize(T& dest, char*& source)
    {
        dest = *(T*)source;
        source += sizeof(T);
    }

Will this cause any problems, or are there compilers on which it will not work? In other words, will the code:

    template<class T>
    bool check_sanity(const T& obj)
    {
        std::unique_ptr<char[]> buffer { new char[sizeof(T)] };
        char* out = buffer.get();   // write cursor, advanced by serialize
        serialize(obj, out);
        T new_obj;
        char* in = buffer.get();    // read cursor, starts again at the beginning
        deserialize(new_obj, in);
        return new_obj == obj;
    }

ever return false? (Suppose T is a POD type and nobody has overloaded the == operator.)

I am writing these serialization functions for use with MPI, where they will be used at the start of the program to distribute some data needed for bookkeeping, so the same program will always both serialize and deserialize the data.

+5
3 answers

I see a couple of problems. A minor one first:

 *(T*)dest = source; 

IIRC, this is UB because it violates the strict aliasing rules (a char* may alias a pointer of any other type, which means you may access any object through a char* pointer, but not the other way around, as your example does).
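To illustrate the permitted direction, here is a minimal sketch that inspects the bytes of an object through an unsigned char pointer. The helper names byte_at, sum_of_bytes and aliasing_demo are hypothetical and do not come from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the i-th byte of obj's object representation.
// Reading an object through unsigned char* is allowed by the aliasing rules;
// the question's code goes the other way (char* reinterpreted as T*).
template <class T>
unsigned char byte_at(const T& obj, std::size_t i) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&obj);
    return bytes[i];
}

// Sums all bytes of a value; the result does not depend on endianness.
template <class T>
unsigned sum_of_bytes(const T& obj) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        sum += byte_at(obj, i);
    }
    return sum;
}

// Example use: the bytes of 0x01020304 are 1, 2, 3 and 4 in some order.
inline void aliasing_demo() {
    std::uint32_t v = 0x01020304u;
    assert(sum_of_bytes(v) == 1u + 2u + 3u + 4u);
}
```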

In other words, will the code: ... ever return false?

Probably not, but you mentioned serializing more than just one object.

So the main problem is alignment:

    std::unique_ptr<char[]> buffer { new char[sizeof(int) + 1] };
    char* p = buffer.get();
    char x = 0;
    int y = 0;
    serialize(x, p);
    serialize(y, p); // may crash or write into wrong location

The offending line is the same one (and the same applies to deserialize):

 *(T*)dest = source; // source is int, dest is not aligned 

The compiler will assume that dest is correctly aligned and may use CPU instructions that require an aligned address (on ARM architectures this can cause real problems).

The solution is to use memcpy instead:

 memcpy(dest, &source, sizeof(T)); 

No need to worry about performance. Modern compilers are very good at optimizing memcpy of objects with known sizes.
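Put together, a memcpy-based version of the two functions could look like the sketch below. It assumes T is trivially copyable; taking the read cursor as const char*& and the helper names round_trip_char_then_int and round_trip_demo are my own choices, not part of the question.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Copy the bytes of source into the buffer and advance the write cursor.
template <class T>
void serialize(const T& source, char*& dest) {
    std::memcpy(dest, &source, sizeof(T));
    dest += sizeof(T);
}

// Copy sizeof(T) bytes from the buffer into dest and advance the read cursor.
template <class T>
void deserialize(T& dest, const char*& source) {
    std::memcpy(&dest, source, sizeof(T));
    source += sizeof(T);
}

// Hypothetical round trip: a char followed by an int, so the int is stored
// at offset 1 in the buffer, which is generally not suitably aligned for int.
inline bool round_trip_char_then_int(char x, int y) {
    std::unique_ptr<char[]> buffer{new char[sizeof(char) + sizeof(int)]};

    char* out = buffer.get();
    serialize(x, out);
    serialize(y, out);

    char x2 = 0;
    int y2 = 0;
    const char* in = buffer.get();
    deserialize(x2, in);
    deserialize(y2, in);

    return x2 == x && y2 == y;
}

// Example use.
inline void round_trip_demo() {
    assert(round_trip_char_then_int('a', 42));
}
```

Because memcpy makes no assumption about the alignment of either pointer, the int at offset 1 is handled correctly here.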

+2

*(T*)dest = source; is a strict aliasing violation.

Instead, you should write:

 memcpy(dest, &source, sizeof source); 

It is fine to copy POD objects using memcpy.

Your check_sanity function does not compile, because operator== is not defined for T (there are no implicitly generated comparison operators).
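As a rough sketch of how the check could be made to compile, the comparison can be written by hand for the type in question. Particle is a hypothetical POD type invented for this example, and check_sanity here uses memcpy directly rather than the question's serialize and deserialize.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical POD type used only for illustration.
struct Particle {
    int id;
    double mass;
};

// Member-wise comparison, written by hand because the compiler does not
// generate one implicitly.
inline bool operator==(const Particle& a, const Particle& b) {
    return a.id == b.id && a.mass == b.mass;
}

// Round-trips obj through a char buffer with memcpy and compares the result.
template <class T>
bool check_sanity(const T& obj) {
    char buffer[sizeof(T)];
    std::memcpy(buffer, &obj, sizeof(T));

    T new_obj;
    std::memcpy(&new_obj, buffer, sizeof(T));
    return new_obj == obj;
}

// Example use.
inline void check_sanity_demo() {
    assert(check_sanity(Particle{7, 1.5}));
}
```

Built-in types such as int already have ==, so check_sanity(42) works without any extra code.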

+1

Yes, you can do this as long as the buffer is an array of char, unsigned char, or std::byte. From the C++ standard, [basic.types]:

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (4.4) making up the object can be copied into an array of char, unsigned char, or std::byte (21.2.1). If the content of that array is copied back into the object, the object shall subsequently hold its original value. [Example:

    #define N sizeof(T)
    char buf[N];
    T obj;                      // obj initialized to its original value
    std::memcpy(buf, &obj, N);  // between these two calls to std::memcpy, obj might be modified
    std::memcpy(&obj, buf, N);  // at this point, each subobject of obj of scalar type holds its original value

- end of example]

Note: the buffer does not need to be aligned for T.
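A minimal sketch of that guarantee, with a compile-time check that T really is trivially copyable. The names copy_out, copy_back and restores_original are hypothetical, and the buffer is deliberately offset by one byte so that it is generally not aligned for double.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Copy the object representation of obj into buf.
template <class T>
void copy_out(const T& obj, unsigned char* buf) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copying is only guaranteed for trivially copyable types");
    std::memcpy(buf, &obj, sizeof(T));
}

// Copy the bytes in buf back into obj.
template <class T>
void copy_back(T& obj, const unsigned char* buf) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copying is only guaranteed for trivially copyable types");
    std::memcpy(&obj, buf, sizeof(T));
}

// Saves a double at an odd offset, overwrites it, then restores it.
inline bool restores_original(double original) {
    unsigned char storage[sizeof(double) + 1];
    unsigned char* buf = storage + 1;  // odd offset on purpose

    double obj = original;
    copy_out(obj, buf);
    obj = 0.0;                         // modify obj between the two copies
    copy_back(obj, buf);
    return obj == original;
}

// Example use.
inline void restore_demo() {
    assert(restores_original(3.25));
}
```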

0

Source: https://habr.com/ru/post/1272425/