In .NET, I want to decode raw data written by a C++ application. The C++ application is 32-bit, and the C# application is 64-bit.
The C++ application supports Russian and Spanish characters, but it does not support Unicode. My C# binary reader fails on Russian and Spanish characters and only works for English ASCII characters.
CArchive does not specify any encoding, and I'm not sure how to read its strings from C#.
I tested this with a couple of simple strings. This is what the C++ CArchive produces:
For "ABC": "03 41 42 43"
For "БелАЗ 7555В": "0B C1 E5 EB C0 C7 20 37 35 35 35 C2"
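To make the layout of these samples concrete, here is a minimal sketch that treats the first byte as the length and decodes the remaining bytes under one candidate code page. Windows-1251 is only a guess on my part, based on the Russian text; `decodeCp1251Subset` and `decodeShortRecord` are hypothetical helpers, and only part of the code page is mapped.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append one code point below U+0800 to a UTF-8 string (1 or 2 bytes).
static void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: decode bytes as Windows-1251 (an ASSUMED code page).
// Mapped: ASCII, 0xC0..0xFF (U+0410..U+044F), 0xA8 (U+0401), 0xB8 (U+0451).
// Every other byte is replaced with '?'.
std::string decodeCp1251Subset(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    for (std::uint8_t b : bytes) {
        if (b < 0x80)        appendUtf8(out, b);
        else if (b >= 0xC0)  appendUtf8(out, 0x0410u + (b - 0xC0u));
        else if (b == 0xA8)  appendUtf8(out, 0x0401);
        else if (b == 0xB8)  appendUtf8(out, 0x0451);
        else                 out.push_back('?');
    }
    return out;
}

// Hypothetical helper: handle only the short form, where the first byte
// is the payload length (values below 0xFF). Returns an empty string for
// longer forms or truncated input.
std::string decodeShortRecord(const std::vector<std::uint8_t>& record) {
    if (record.empty() || record[0] == 0xFF) return {};
    const std::size_t len = record[0];
    if (record.size() < 1 + len) return {};
    const std::vector<std::uint8_t> payload(record.begin() + 1,
                                            record.begin() + 1 + len);
    return decodeCp1251Subset(payload);
}
```

Calling `decodeShortRecord` on the first sample gives back "ABC". On the second sample it gives UTF-8 Cyrillic letters rather than mojibake, which suggests (but does not prove) that the writer used a Cyrillic ANSI code page.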
The following shows how the C++ application writes the binary file.
    void CColumnDefArray::SerializeData(CArchive& Archive)
    {
        int iIndex;
        int iSize;
        int iTemp;
        CString sTemp;
        if (Archive.IsStoring())
        {
            Archive << m_iBaseDataCol;
            Archive << m_iNPValueCol;
            iSize = GetSize();
            Archive << iSize;
            for (iIndex = 0; iIndex < iSize; iIndex++)
            {
                CColumnDef& ColumnDef = ElementAt(iIndex);
                Archive << (int)ColumnDef.GetColumnType();
                Archive << ColumnDef.GetColumnId();
                sTemp = ColumnDef.GetName();
                Archive << sTemp;
            }
        }
    }
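As far as I can tell from the sample bytes and from the reader I wrote, `Archive << sTemp` emits a length prefix followed by the raw string bytes. The sketch below is my reconstruction of that layout for non-Unicode strings, not code taken from MFC. `writeAnsiStringRecord` is a hypothetical name, and little-endian byte order is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: build the record my reader expects for a
// single-byte (non-Unicode) string: an escalating length prefix followed
// by the raw payload bytes. Multi-byte length fields are written
// little-endian (an assumption).
std::vector<std::uint8_t> writeAnsiStringRecord(const std::string& payload) {
    std::vector<std::uint8_t> out;
    const std::size_t n = payload.size();

    if (n < 0xFF) {
        // Short form: one length byte.
        out.push_back(static_cast<std::uint8_t>(n));
    } else if (n < 0xFFFE) {
        // 0xFF marker, then a 16-bit length (0xFFFE is reserved as the
        // Unicode marker, 0xFFFF as the escape to the 32-bit form).
        out.push_back(0xFF);
        out.push_back(static_cast<std::uint8_t>(n & 0xFF));
        out.push_back(static_cast<std::uint8_t>((n >> 8) & 0xFF));
    } else if (n < 0xFFFFFFFFull) {
        // 0xFF marker, 0xFFFF marker, then a 32-bit length.
        out.insert(out.end(), {0xFF, 0xFF, 0xFF});
        for (int shift = 0; shift < 32; shift += 8) {
            out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFF));
        }
    } else {
        throw std::length_error("8-byte lengths are not covered by this sketch");
    }

    // The payload bytes are copied unchanged; no encoding is recorded.
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

For "ABC" this produces 03 41 42 43, which matches the first sample. Nothing in the record says which code page the payload bytes are in, and that missing piece is my actual problem.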
And this is how I try to read it in C#.
The following can decode "ABC", but not Russian characters. I tested this.Encoding with all the available options (ASCII, UTF7, etc.). Russian characters work only with Encoding.Default. But that is apparently not a reliable option, since encoding and decoding usually happen on different PCs.
    public override string ReadString()
    {
        // Short form: a single length byte below 0xff.
        byte blen = ReadByte();
        if (blen < 0xff)
        {
            return this.Encoding.GetString(ReadBytes(blen));
        }

        // 0xff escapes to a 16-bit length; 0xfffe marks a Unicode string.
        ushort slen = ReadUInt16();
        if (slen == 0xfffe)
        {
            throw new NotSupportedException(ServerMessages.UnicodeStringsAreNotSupported());
        }
        if (slen < 0xffff)
        {
            return this.Encoding.GetString(ReadBytes(slen));
        }

        // 0xffff escapes to a 32-bit length.
        uint ulen = ReadUInt32();
        if (ulen < 0xffffffff)
        {
            // Read byte by byte because ulen may exceed int.MaxValue.
            var bytes = new byte[ulen];
            for (uint i = 0; i < ulen; i++)
            {
                bytes[i] = ReadByte();
            }
            return this.Encoding.GetString(bytes);
        }

        // 0xffffffff escapes to a 64-bit length, which is not handled here.
        throw new NotSupportedException(ServerMessages.EightByteLengthStringsAreNotSupported());
    }
What is the correct approach to decoding this? Is choosing the right code page the way to solve this problem? If so, how can I find out which code page was used for encoding?
I would appreciate it if someone could point me in the right direction.
Edit
[...] "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" [...]
[...] C++ CArchive?