Double deserialization without object definition

Question

I am trying to read a binary-serialized object, but I have no object definition or source for it. I took a peek at the file and saw the property names, so I manually recreated the object (call it SomeDataFormat).

I ended up with this:

    public class SomeDataFormat // 16 fields
    {
        public string Name { get; set; }
        public int Country { get; set; }
        public string UserEmail { get; set; }
        public bool IsCaptchaDisplayed { get; set; }
        public bool IsForgotPasswordCaptchaDisplayed { get; set; }
        public bool IsSaveChecked { get; set; }
        public string SessionId { get; set; }
        public int SelectedLanguage { get; set; }
        public int SelectedUiCulture { get; set; }
        public int SecurityImageRefId { get; set; }
        public int LogOnId { get; set; }
        public bool BetaLogOn { get; set; }
        public int Amount { get; set; }
        public int CurrencyTo { get; set; }
        public int Delivery { get; set; }
        public bool displaySSN { get; set; }
    }

Now I can deserialize it as follows:

    BinaryFormatter formatter = new BinaryFormatter();
    formatter.AssemblyFormat = FormatterAssemblyStyle.Full; // original uses this
    formatter.TypeFormat = FormatterTypeStyle.TypesWhenNeeded; // this reduces size
    FileStream readStream = new FileStream("data.dat", FileMode.Open);
    SomeDataFormat data = (SomeDataFormat) formatter.Deserialize(readStream);

The first suspicious thing is that in the deserialized data object, only 2 properties have a value (SessionId and UserEmail). The other properties are null or 0. This may be intended, but I still suspect that something went wrong during deserialization.

The second suspicious thing: if I reserialize this object, I get a different file size. The original is 695 bytes; the reserialized object is 698 bytes, a difference of 3 bytes. I should get the same file size as the original.

Here are the original and the new (reserialized) file:

Original serialized file: (hex dump image not preserved)
Reserialized file: (hex dump image not preserved)

As you can see, after the header section the data appears in a different order. For example, the email address and the SessionId are not in the same place.

UPDATE: Will pointed out that the byte following "PublicKeyToken=null" also differs (03 ↔ 05).

Q1: Why are the values in a different order in the two files?
Q2: Why are there 3 extra bytes when comparing the two serialized objects?
Q3: What am I missing? How can I do this?

Any help is appreciated.

+6

c# .net serialization deserialization binary-serialization

Dominik antal Aug 1 '13 at 14:21

5 answers

Because it may be interesting for someone, I decided to write this post about what the binary format of serialized .NET objects looks like and how we can interpret it correctly.

I based all my research on the .NET Remoting: Binary Format Data Structure specification.

Example class:

To have a working example, I created a simple class called A that contains 2 properties, one string and one integer value, called SomeString and SomeValue.

Class A as follows:

    [Serializable()]
    public class A
    {
        public string SomeString { get; set; }
        public int SomeValue { get; set; }
    }

For serialization, I used BinaryFormatter , of course:

    BinaryFormatter bf = new BinaryFormatter();
    StreamWriter sw = new StreamWriter("test.txt");
    bf.Serialize(sw.BaseStream, new A() { SomeString = "abc", SomeValue = 123 });
    sw.Close();

As you can see, I passed a new instance of class A containing abc and 123 as values.

Example result data:

If we look at the serialized result in a hex editor, we get something like this:

(hex dump image not preserved)

Let's interpret the example result data:

According to the above specification (here is a direct link to the PDF: [MS-NRBF].pdf), each record within the stream is identified by a RecordTypeEnumeration. Section 2.1.2.1 RecordTypeEnumeration states:

This enumeration identifies the type of the record. Each record (except for MemberPrimitiveUnTyped) starts with a record type enumeration. The size of the enumeration is one BYTE.
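For orientation, the record types that this walkthrough runs into can be collected into a small enum. This is only a sketch: the name `NrbfRecordType` is made up for illustration, and it lists just the values that appear in this answer, not the full enumeration from the specification.

```csharp
// Hypothetical helper type: only the record types that occur in this walkthrough.
// The numeric values are the ones the answer reads from the hex dump.
public enum NrbfRecordType : byte
{
    SerializationHeaderRecord = 0,   // 00
    ClassWithMembersAndTypes = 5,    // 05
    BinaryObjectString = 6,          // 06
    MessageEnd = 11,                 // 0B
    BinaryLibrary = 12               // 0C
}
```

Casting the first byte of a record to this enum makes a hand-written parser easier to read than comparing against bare hex literals.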

SerializationHeaderRecord:

So, if we look back at the data received, we can begin to interpret the first byte:

As indicated in 2.1.2.1 RecordTypeEnumeration , a value of 0 identifies the SerializationHeaderRecord specified in 2.6.1 SerializationHeaderRecord :

The SerializationHeaderRecord record MUST be the first record in a binary serialization. This record has the major and minor version of the format and the IDs of the top object and the headers.

It consists of:

RecordTypeEnum (1 byte)
RootId (4 bytes)
HeaderId (4 bytes)
MajorVersion (4 bytes)
MinorVersion (4 bytes)

With this knowledge, we can interpret the record, which contains 17 bytes:

00 represents the RecordTypeEnumeration, which is SerializationHeaderRecord in our case.

01 00 00 00 represents the RootId

If neither the BinaryMethodCall nor the BinaryMethodReturn record is present in the serialization stream, the value of this field MUST contain the ObjectId of a Class, Array, or BinaryObjectString record contained in the serialization stream.

So, in our case, this should be the ObjectId with the value 1 (because the data is serialized using little-endian), which we will hopefully see again ;-)

FF FF FF FF represents the HeaderId

01 00 00 00 represents the MajorVersion

00 00 00 00 represents the MinorVersion
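The field-by-field reading above can be sketched with a `BinaryReader`, which reads integers in little-endian order. The class and method names here (`NrbfHeaderSketch`, `ReadHeader`) are hypothetical and exist only for this illustration; the byte layout is the one listed above.

```csharp
using System;
using System.IO;

public static class NrbfHeaderSketch
{
    // Reads the 17-byte SerializationHeaderRecord:
    // 1 byte record type followed by four little-endian Int32 fields.
    public static (byte RecordType, int RootId, int HeaderId, int Major, int Minor) ReadHeader(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));
        byte recordType = reader.ReadByte();   // 00 = SerializationHeaderRecord
        int rootId = reader.ReadInt32();       // 01 00 00 00
        int headerId = reader.ReadInt32();     // FF FF FF FF
        int major = reader.ReadInt32();        // 01 00 00 00
        int minor = reader.ReadInt32();        // 00 00 00 00
        return (recordType, rootId, headerId, major, minor);
    }
}
```

Fed the 17 bytes from the example, this yields a RootId of 1, a HeaderId of -1 (FF FF FF FF read as a signed Int32), MajorVersion 1 and MinorVersion 0.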

BinaryLibrary:

As indicated, each record must begin with a RecordTypeEnumeration. As the last record is complete, we must assume that a new one begins.

Let's interpret the following byte:

As we can see, in our example the SerializationHeaderRecord is followed by the BinaryLibrary record:

The BinaryLibrary record associates an INT32 ID (as specified in [MS-DTYP] section 2.2.22) with a library name. This allows other records to reference the library name by using the ID. This approach reduces the wire size when there are multiple records that refer to the same library name.

It consists of:

RecordTypeEnum (1 byte)
LibraryId (4 bytes)
LibraryName (variable number of bytes ( LengthPrefixedString ))

As stated in 2.1.1.6 LengthPrefixedString ...

The LengthPrefixedString represents a string value. The string is prefixed by the length of the UTF-8 encoded string in bytes. The length is encoded in a variable-length field with a minimum of 1 byte and a maximum of 5 bytes. To minimize the wire size, the length is encoded as a variable-length field.

In our simple example, the length is always encoded using 1 byte . With this knowledge, we can continue to interpret the bytes in the stream:

0C represents a RecordTypeEnumeration that identifies a BinaryLibrary record.

02 00 00 00 represents LibraryId , which is 2 in our case.

Now LengthPrefixedString follows:

42 represents the length information of the LengthPrefixedString that contains the LibraryName.

In our case, the length information 0x42 (decimal 66) tells us that we need to read the next 66 bytes and interpret them as the LibraryName.

As already mentioned, the string is UTF-8 encoded, so the result of the bytes above will be something like this: _WorkSpace_, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
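Reading a LengthPrefixedString can be sketched as follows. This assumes the variable-length prefix stores 7 bits of the length per byte, with the high bit marking that another byte follows, which is consistent with the 1 to 5 byte range quoted above. The names `LengthPrefixedStringSketch` and `Read` are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixedStringSketch
{
    // Decodes the variable-length prefix (7 bits per byte, at most 5 bytes),
    // then reads that many bytes and decodes them as UTF-8.
    public static string Read(Stream stream)
    {
        int length = 0;
        int shift = 0;
        while (true)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            length |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;      // high bit clear: last length byte
            shift += 7;
            if (shift > 28) throw new InvalidDataException("Length prefix longer than 5 bytes.");
        }

        var buffer = new byte[length];
        int read = 0;
        while (read < length)
        {
            int n = stream.Read(buffer, read, length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return Encoding.UTF8.GetString(buffer);
    }
}
```

For strings shorter than 128 bytes, as in this example, the prefix is a single byte, so 42 is read directly as 66. `BinaryReader.ReadString` uses the same kind of length prefix, so it can stand in for this helper in a quick experiment.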

ClassWithMembersAndTypes:

The record is complete again, so we interpret the RecordTypeEnumeration of the next one:

05 identifies a ClassWithMembersAndTypes record. Section 2.3.2.1 ClassWithMembersAndTypes states:

The ClassWithMembersAndTypes record is the most verbose of the Class records. It contains metadata about members, including the names and Remoting Types of the members. It also contains a library ID that references the library name of the class.

It consists of:

RecordTypeEnum (1 byte)
ClassInfo (variable number of bytes)
MemberTypeInfo (variable number of bytes)
LibraryId (4 bytes)

ClassInfo:

As stated in 2.3.1.1 ClassInfo , an entry consists of:

ObjectId (4 bytes)
Name (variable number of bytes (again, LengthPrefixedString ))
MemberCount (4 bytes)
MemberNames (which is a sequence of LengthPrefixedString , where the number of elements MUST be equal to the value specified in the MemberCount field.)

Back to the original data, step by step:

01 00 00 00 represents the ObjectId. We have already seen this value: it was specified as the RootId in the SerializationHeaderRecord.

0F 53 74 61 63 6B 4F 76 65 72 46 6C 6F 77 2E 41 represents the Name of the class, which is represented using a LengthPrefixedString. As already mentioned, in our example the length of the string is encoded in 1 byte, so the first byte 0F indicates that 15 bytes must be read and decoded using UTF-8. The result looks something like this: StackOverFlow.A, so evidently I used StackOverFlow as the namespace name.

02 00 00 00 represents MemberCount , it tells us that 2 members will follow, both of which are represented by LengthPrefixedString .

First Member Name:

1B 3C 53 6F 6D 65 53 74 72 69 6E 67 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64 represents the first MemberName. 1B is again the length of the string, 27 bytes, which results in something like this: <SomeString>k__BackingField.

Second Member Name:

1A 3C 53 6F 6D 65 56 61 6C 75 65 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64 represents the second MemberName. 1A indicates that the string is 26 bytes long, which results in something like this: <SomeValue>k__BackingField.
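The ClassInfo layout walked through above can be sketched in a few lines. `ClassInfoSketch` is a hypothetical name, and the sketch leans on `BinaryReader.ReadString`, whose length prefix has the same shape as the LengthPrefixedString described earlier; the reader is assumed to be positioned right after the 05 record type byte.

```csharp
using System;
using System.IO;

public static class ClassInfoSketch
{
    // Reads ObjectId, Name, MemberCount and then MemberCount member names.
    public static (int ObjectId, string Name, string[] MemberNames) Read(BinaryReader reader)
    {
        int objectId = reader.ReadInt32();      // 01 00 00 00
        string name = reader.ReadString();      // 0F + "StackOverFlow.A"
        int memberCount = reader.ReadInt32();   // 02 00 00 00
        var memberNames = new string[memberCount];
        for (int i = 0; i < memberCount; i++)
        {
            memberNames[i] = reader.ReadString();
        }
        return (objectId, name, memberNames);
    }
}
```

Note that the member names are the compiler-generated backing fields of the auto-properties, not the property names themselves.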

MemberTypeInfo:

After the ClassInfo, the MemberTypeInfo follows.

Section 2.3.1.2 MemberTypeInfo states that the structure contains:

BinaryTypeEnums (variable in length)

A sequence of BinaryTypeEnumeration values that represents the member types being transferred. The array MUST:
Have the same number of items as the MemberNames field of the ClassInfo structure.
Be ordered such that each BinaryTypeEnumeration corresponds to the member name in the MemberNames field of the ClassInfo structure.

AdditionalInfos (variable in length): depending on the BinaryTypeEnum, additional information may or may not be present.

| BinaryTypeEnum | AdditionalInfos |
|----------------+--------------------------|
| Primitive | PrimitiveTypeEnumeration |
| String | None |

Taking this into account, we are almost there. We expect 2 BinaryTypeEnumeration values (because MemberNames has 2 members).

Again, back to the raw data of the full MemberTypeInfo record:

01 represents the BinaryTypeEnumeration of the first member. According to 2.1.2.2 BinaryTypeEnumeration, we can expect a String, and it is represented using a LengthPrefixedString.

00 represents the BinaryTypeEnumeration of the second member and, again according to the specification, it is a Primitive. As stated above, a Primitive is followed by additional information, in this case a PrimitiveTypeEnumeration. Therefore we need to read the next byte, which is 08, match it against the table in 2.1.2.3 PrimitiveTypeEnumeration, and find that we can expect an Int32, which is represented by 4 bytes, as stated in another document about basic data types.
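A minimal sketch of reading MemberTypeInfo, covering only the two cases from the table above (Primitive = 0 and String = 1). The name `MemberTypeInfoSketch` is hypothetical, and any other BinaryTypeEnumeration value is rejected rather than guessed at. The point it illustrates is the ordering: all BinaryTypeEnumeration bytes come first, and the additional infos follow afterwards.

```csharp
using System;
using System.IO;

public static class MemberTypeInfoSketch
{
    // Returns one entry per member: the BinaryTypeEnumeration byte and, for
    // Primitive members only, the PrimitiveTypeEnumeration byte (null otherwise).
    public static (byte BinaryType, byte? PrimitiveType)[] Read(BinaryReader reader, int memberCount)
    {
        // First all BinaryTypeEnumeration values, one byte per member.
        byte[] binaryTypes = reader.ReadBytes(memberCount);
        if (binaryTypes.Length != memberCount) throw new EndOfStreamException();

        // Then the additional infos, in member order, only where required.
        var result = new (byte BinaryType, byte? PrimitiveType)[memberCount];
        for (int i = 0; i < memberCount; i++)
        {
            switch (binaryTypes[i])
            {
                case 0: // Primitive: followed by a PrimitiveTypeEnumeration byte
                    result[i] = ((byte)0, (byte?)reader.ReadByte());
                    break;
                case 1: // String: no additional info
                    result[i] = ((byte)1, (byte?)null);
                    break;
                default:
                    throw new NotSupportedException("BinaryTypeEnumeration " + binaryTypes[i] + " is not handled in this sketch.");
            }
        }
        return result;
    }
}
```

For the three bytes 01 00 08 and a member count of 2, the first member comes back as String with no additional info and the second as Primitive with primitive type 08 (Int32).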

LibraryId:

After the MemberTypeInfo the LibraryId follows; it is represented by 4 bytes:

02 00 00 00 represents LibraryId , which is 2.

Values:

As stated in 2.3 Class Records :

The values of class members MUST be serialized as records that follow this record, as described in section 2.7. The order of entries MUST match the order of MemberNames, as specified in the ClassInfo structure (section 2.3.1.1).

That is why we can now expect member values.

Let's look at the last few bytes:

06 identifies a BinaryObjectString . It represents the value of our SomeString property ( <SomeString>k__BackingField , to be precise).

According to 2.5.7 BinaryObjectString it contains:

RecordTypeEnum (1 byte)
ObjectId (4 bytes)
Value (variable length represented as LengthPrefixedString )

Therefore, knowing this, we can clearly determine that

03 00 00 00 represents the ObjectId .

03 61 62 63 represents Value , where 03 is the length of the string itself, and 61 62 63 are the bytes of the content, which translate to abc .

I hope you remember that there was a second member, an Int32. Knowing that an Int32 is represented using 4 bytes, we can conclude that

7B 00 00 00 must be the Value of our second member. 7B is the hexadecimal equivalent of decimal 123, which matches our example code.

So here is the complete ClassWithMembersAndTypes entry:

MessageEnd:

Finally, the last byte 0B represents the MessageEnd record.
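The tail of the stream (the BinaryObjectString record, the raw Int32 value and the MessageEnd byte) can be read with a short sketch like the one below. `MemberValuesSketch` is a hypothetical name, and the code is hard-wired to this example's two members rather than being a general parser.

```csharp
using System;
using System.IO;

public static class MemberValuesSketch
{
    // Reads: 06 <ObjectId> <LengthPrefixedString> <Int32> 0B
    public static (int ObjectId, string SomeString, int SomeValue) Read(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));

        if (reader.ReadByte() != 0x06)
            throw new InvalidDataException("Expected BinaryObjectString (06).");
        int objectId = reader.ReadInt32();        // 03 00 00 00
        string someString = reader.ReadString();  // 03 61 62 63 -> "abc"

        // The Int32 member has no record header of its own; its type was
        // already announced in MemberTypeInfo, so only the 4 raw bytes follow.
        int someValue = reader.ReadInt32();       // 7B 00 00 00 -> 123

        if (reader.ReadByte() != 0x0B)
            throw new InvalidDataException("Expected MessageEnd (0B).");
        return (objectId, someString, someValue);
    }
}
```

Passing the bytes 06 03 00 00 00 03 61 62 63 7B 00 00 00 0B gives ObjectId 3, the string abc and the value 123.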

+4

Markus safar Nov 04 '16 at 10:08

If I'm not mistaken, the binary serializer writes out some information about the object's type name and namespace. If these values differ between the original class type and your new "SomeDataFormat", this may explain the difference in size.

Have you tried comparing two files with a hex editor?
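If a hex editor is not at hand, the first differing position can also be found in code. This is a small sketch; `ByteDiffSketch` and `FirstDifference` are hypothetical names, and the file name for the reserialized copy in the usage note is an assumption.

```csharp
using System;

public static class ByteDiffSketch
{
    // Returns the index of the first differing byte, the length of the shorter
    // array if one is a prefix of the other, or -1 if both are identical.
    public static int FirstDifference(byte[] a, byte[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i]) return i;
        }
        return a.Length == b.Length ? -1 : common;
    }
}
```

It could be called as `ByteDiffSketch.FirstDifference(File.ReadAllBytes("data.dat"), File.ReadAllBytes("reserialized.dat"))`, where `reserialized.dat` stands for wherever the reserialized object was written.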

+3

pdriegen Aug 1 '13 at 14:41

When you deserialize, some type differences will be coerced. For instance

    public class SomeClass { public short SomeProperty { get; set; } }

will deserialize into

    public class SomeClass { public long SomeProperty { get; set; } }

But if you serialize the second SomeClass (the one with the long), the result will have a different size than serializing the SomeClass with the short. In this particular case, the difference is 6 bytes.
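The 6 bytes correspond to the difference between the raw widths of the two primitive types. The sketch below only illustrates those widths with a plain `BinaryWriter`; it does not run `BinaryFormatter`, and it assumes the primitive value is stored as its raw bytes. `PrimitiveSizeSketch` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;

public static class PrimitiveSizeSketch
{
    // Writes the same number once as a short and once as a long and
    // returns how many more bytes the long takes.
    public static int Difference()
    {
        using var shortStream = new MemoryStream();
        using var longStream = new MemoryStream();

        using (var writer = new BinaryWriter(shortStream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write((short)123);   // 2 bytes
        }
        using (var writer = new BinaryWriter(longStream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write((long)123);    // 8 bytes
        }

        return (int)(longStream.Length - shortStream.Length);
    }
}
```

`PrimitiveSizeSketch.Difference()` returns 6, so a single mistyped member of this kind changes the file size by exactly that amount.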

Update:

Deserialize into a generic object, and then use reflection to get the types. You may have to use recursion and special handling for complex objects.

    using (var fileStream = new FileStream("TestFormatter.dat", FileMode.Open))
    {
        var binaryFormatter = new BinaryFormatter();
        var myObject = binaryFormatter.Deserialize(fileStream);
        var objectProperties = myObject.GetType().GetProperties();
        foreach (var property in objectProperties)
        {
            // This will tell you the property type name, e.g. String or Int64 (long).
            var propertyTypeName = property.PropertyType.Name;
        }
    }

+2

cgotberg Aug 1 '13 at 15:01

Other inconsistencies may be due to missing attributes on your class. Try the following:

    [StructLayout(LayoutKind.Sequential, Pack=1)]
    public class SomeDataFormat // 16 fields
    {
        ...

+1

gordy Aug 12 '13 at 6:41

Will · Accepted Answer · 2013-08-09T14:43:10+0000

Why are the values in a different order in the two files?

This is because the order of the members is not based on declaration order. http://msdn.microsoft.com/en-us/library/424c79hc.aspx

The GetMembers method does not return members in a particular order, such as alphabetical or declaration order. Your code must not depend on the order in which members are returned, because that order varies.

Why are there 3 extra bytes when comparing the two serialized objects?

First, the TypeFormat 'TypesWhenNeeded' should actually be 'TypesAlways'. That is why there are so many differences. For example, this is why the 05 after "=null" becomes 03.

Second, you do not have the correct types. Looking at BinaryFormatter in ILSpy and at the hex dump shows that the members you marked as "int" are actually "string".

    public class SomeDataFormat // 16 fields
    {
        public string Name { get; set; }
        public string Country { get; set; }
        public string UserEmail { get; set; }
        public bool IsCaptchaDisplayed { get; set; }
        public bool IsForgotPasswordCaptchaDisplayed { get; set; }
        public bool IsSaveChecked { get; set; }
        public string SessionId { get; set; }
        public string SelectedLanguage { get; set; }
        public string SelectedUiCulture { get; set; }
        public string SecurityImageRefId { get; set; }
        public string LogOnId { get; set; }
        public bool BetaLogOn { get; set; }
        public string Amount { get; set; }
        public string CurrencyTo { get; set; }
        public string Delivery { get; set; }
        public bool displaySSN { get; set; }
    }

What am I missing? How can I do this?

I see no way to do this with the BinaryFormatter as given. You could decompile it and change the way BinaryFormatter works.
