Difference between unsigned char and char pointers

Question


I am a bit confused by the differences between unsigned char (which is also BYTE in WinAPI) and char pointers.

I am currently working with some legacy ATL-based code, and I see many expressions, such as:

```cpp
CAtlArray<BYTE> rawContent;
CALL_THE_FUNCTION_WHICH_FILLS_RAW_CONTENT(rawContent);
return ArrayToUnicodeString(rawContent);
// or return ArrayToAnsiString(rawContent);
```

The ArrayToXXString implementations look like this:

```cpp
CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
{
    CAtlArray<BYTE> copiedArray;
    copiedArray.Copy(array);
    copiedArray.Add('\0');

    // Casting from BYTE* -> LPCSTR (const char*).
    return CStringA((LPCSTR)copiedArray.GetData());
}

CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
{
    CAtlArray<BYTE> copiedArray;
    copiedArray.Copy(array);
    copiedArray.Add('\0');
    copiedArray.Add('\0');

    // Same here.
    return CStringW((LPCWSTR)copiedArray.GetData());
}
```

So the questions are:

1. Is the C-style cast from BYTE* to LPCSTR (const char*) safe in all possible cases?
2. Do I need to add double zero termination when converting array data to a wide-character string?
3. The conversion CStringW((LPCWSTR)copiedArray.GetData()) seems invalid to me; is that true?
4. Is there any way to make all this code easier to understand and maintain?


c++ char byte atl

Yippie-ki-yay Feb 10 '12 at 13:44


4 answers

Yes, it is always safe, because both point to arrays of single-byte memory locations.
- LPCSTR: long pointer to a constant (single-byte) string
- LPCWSTR: long pointer to a constant wide (two-byte wchar_t) string
- LPCTSTR: long pointer to a context-dependent constant string (single-byte or wide, depending on whether UNICODE is defined)
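A minimal sketch in standard C++ of what the cast actually does, assuming std::vector<unsigned char> as a stand-in for CAtlArray<BYTE> (ATL is Windows-only). The cast yields the same address under a different pointer type; nothing is copied or converted, so the result is only a usable C string if the bytes already end with a zero. The helper name viewAsCString is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical helper: view a BYTE-style buffer as a narrow C string.
// std::vector<unsigned char> stands in for CAtlArray<BYTE> (assumption).
const char* viewAsCString(const std::vector<unsigned char>& bytes) {
    // Precondition: the buffer must already be zero-terminated.
    assert(!bytes.empty() && bytes.back() == 0);
    // Same address, different pointer type: no bytes are copied or converted.
    return reinterpret_cast<const char*>(bytes.data());
}
```

For example, with `std::vector<unsigned char> raw{'h', 'i', '\0'};`, `std::strlen(viewAsCString(raw))` is 2 and the returned pointer equals `raw.data()`.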
In wide-character strings, each character (more precisely, each UTF-16 code unit) occupies 2 bytes of memory, so the size of the memory block holding the string must be a multiple of 2. Therefore, if you want to add a wide '\0' to the end of the string, you must add two zero bytes.
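To make the two-byte terminator concrete, here is a small sketch (not ATL code; the function name utf16Units is hypothetical) that scans a byte buffer two bytes at a time, the way wide-string functions effectively do. Only a pair of zero bytes at an even offset ends the string; a single appended zero byte is never seen as a terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: count the UTF-16 code units stored in a byte buffer
// before the wide terminator. The buffer is scanned two bytes at a time,
// and only a pair of zero bytes (a two-byte L'\0') ends the string.
std::optional<std::size_t> utf16Units(const std::vector<unsigned char>& bytes) {
    std::size_t units = 0;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2, ++units) {
        if (bytes[i] == 0 && bytes[i + 1] == 0) {
            return units;  // found the two-byte terminator
        }
    }
    return std::nullopt;  // no complete two-byte terminator in the buffer
}
```

In the question's ArrayToUnicodeString, the two `Add('\0')` calls play the role of that two-byte terminator.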
As for the third question, I'm sorry, I don't know ATL well enough to help. As for the fourth, I don't see much complexity here and I think the code is easy to maintain. Which part, exactly, do you want to make easier to understand and maintain?


Mohammad dehghan Feb 10 '12 at 14:05


If the BYTE* behaves like a valid string (i.e. the last BYTE is 0), then yes, you can cast BYTE* to LPCSTR. Functions that work with LPCSTR assume zero-terminated strings.
I think multiple zeros are only needed when working with wide (several bytes per character) encodings such as UTF-16. The most common 8-bit encodings (for example, the regular Windows Western code page, as well as UTF-8) do not require them.
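As an illustration of why one terminator is enough for UTF-8, here is a sketch using a made-up sample string: every byte of a multi-byte UTF-8 sequence is non-zero, so a plain byte-wise strlen finds the end correctly. The name narrowByteLength is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// "café" encoded as UTF-8 (sample data for illustration): é (U+00E9)
// becomes the two bytes 0xC3 0xA9. No byte of a multi-byte UTF-8 sequence
// is zero, so a single '\0' is enough to terminate the string.
const unsigned char kCafeUtf8[] = {'c', 'a', 'f', 0xC3, 0xA9, '\0'};

// Hypothetical helper: byte length of a zero-terminated narrow string
// held in a BYTE-style buffer.
std::size_t narrowByteLength(const unsigned char* bytes) {
    assert(bytes != nullptr);
    return std::strlen(reinterpret_cast<const char*>(bytes));
}
```

`narrowByteLength(kCafeUtf8)` returns 5: four characters, five bytes, one terminator.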
CString is Microsoft's best attempt at user-friendly strings. For example, its constructor accepts both char and wchar_t input, regardless of whether the CString itself is wide or not, so you don't have to worry much about conversion.

Edit: wait, now I see that they abuse the BYTE array to store wide characters. I cannot recommend this.
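One less error-prone alternative, sketched under the assumption that the bytes really are UTF-16LE (the layout Windows uses for wchar_t strings): decode them into a typed std::u16string using the buffer's length. No terminator has to be appended, and the byte buffer is never reinterpreted as wide characters. std::vector and std::u16string stand in for the ATL types; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a byte buffer holding UTF-16LE data into a
// typed std::u16string. Using the length means no terminator is needed,
// and assembling each unit from two bytes avoids alignment/aliasing issues.
std::u16string utf16leToU16string(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() % 2 == 0 && "UTF-16 data needs an even number of bytes");
    std::u16string out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Low byte first, then high byte (little-endian, as on Windows).
        out.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return out;
}
```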


Mr lister Feb 10 '12 at 14:07


LPCWSTR points to a string with 2 bytes per character, while char is one byte per character. This means you cannot simply convert between them with a C-style cast: the conversion has to change the memory (adding a zero byte next to each standard ASCII character), not just read the same memory differently, which is all a C cast does. So the casts are not that safe, I would say.
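The difference between widening and reinterpreting can be shown with a short sketch (hypothetical function names, plain ASCII input assumed). Widening produces one 16-bit unit per character; reading the same two narrow bytes as a wide character, which is what the cast amounts to on a little-endian machine, produces a single unrelated unit. The second function computes that value explicitly so the result does not depend on the host's byte order.

```cpp
#include <cassert>
#include <string>

// Widening: each ASCII char becomes its own 16-bit code unit, so the
// memory actually changes (a zero high byte is added per character).
std::u16string widenAscii(const std::string& narrow) {
    std::u16string wide;
    wide.reserve(narrow.size());
    for (unsigned char c : narrow) {
        assert(c < 0x80 && "this sketch only handles plain ASCII");
        wide.push_back(static_cast<char16_t>(c));
    }
    return wide;
}

// What a cast amounts to on a little-endian machine: the first two narrow
// bytes are read together as ONE 16-bit unit (low byte first).
char16_t firstUnitAsIfReinterpreted(const std::string& narrow) {
    assert(narrow.size() >= 2);
    const unsigned lo = static_cast<unsigned char>(narrow[0]);
    const unsigned hi = static_cast<unsigned char>(narrow[1]);
    return static_cast<char16_t>(lo | (hi << 8));
}
```

For "Hi", widening gives the two units 'H' and 'i', while the reinterpreted view gives the single unit 0x6948.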

Double null termination: each character is always 2 bytes, so your end-of-string character must also be 2 bytes long.

To make this code easier to understand, see lexical_cast in Boost (http://www.boost.org/doc/libs/1_48_0/doc/html/boost_lexical_cast.html)

Another option is to use std::string (std::basic_string), which lets you perform ordinary string operations.
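A minimal sketch of that suggestion, assuming the raw bytes are held in a std::vector<unsigned char> instead of a CAtlArray<BYTE>: a std::string can be built directly from the byte range, so neither the extra '\0' nor the BYTE* to const char* cast is needed. The function name is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical std::string counterpart of the question's ArrayToAnsiString.
// Constructing from the iterator range copies exactly bytes.size() bytes,
// so there is no need to append a '\0' or to cast BYTE* to const char*.
std::string bytesToString(const std::vector<unsigned char>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}
```

Unlike the CStringA((LPCSTR)...) version, an embedded zero byte does not cut the string short.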


Egorecords Feb 10 '12 at 14:07


Swiss · Accepted Answer · 2012-02-10T14:09:47+0000

Standard C is a bit strange when it comes to defining a byte. However, you do get a couple of guarantees:

- A byte is always the size of a char: sizeof(char) always returns 1.
- A byte is at least 8 bits in size.

This definition does not sit well with older platforms where bytes were 6 or 7 bits long, but it does mean that BYTE* and char* are guaranteed to be equivalent in the sense that both point to units of exactly the same size.
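These guarantees can be checked at compile time. The alias Byte below is an assumption that mirrors WinAPI's `typedef unsigned char BYTE`; the last check shows that, despite the equal size, unsigned char and char remain distinct types, which is why a cast is needed at all.

```cpp
#include <climits>
#include <type_traits>

// Assumption: Byte mirrors WinAPI's `typedef unsigned char BYTE`.
using Byte = unsigned char;

// Guarantees from the C and C++ standards, checked at compile time.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(Byte) == sizeof(char), "unsigned char has the same size as char");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Same size, but still distinct types: that is why BYTE* -> const char*
// needs a cast in the first place.
static_assert(!std::is_same<Byte, char>::value, "unsigned char and char are distinct types");
```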

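Multiple zero bytes are required at the end of a Unicode (UTF-16) string, because there are valid characters whose encoding starts with a zero (null) byte; for example, U+0100 (Ā) is stored as the bytes 0x00 0x01 in little-endian UTF-16.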
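A short sketch of what goes wrong if a wide buffer is treated with single-byte rules (sample data and the helper name are made up for illustration): a byte-wise strlen stops at the very first byte, because that zero byte is the first half of a valid character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// U+0100 (Ā) followed by 'A', stored as UTF-16LE with a two-byte terminator:
// U+0100 -> 0x00 0x01, 'A' -> 0x41 0x00, terminator -> 0x00 0x00.
const unsigned char kWideBytes[] = {0x00, 0x01, 0x41, 0x00, 0x00, 0x00};

// Hypothetical helper: scan for a single zero byte (narrow C-string rules).
// On kWideBytes this stops at offset 0, inside a perfectly valid character.
std::size_t bytewiseLength(const unsigned char* bytes) {
    assert(bytes != nullptr);
    return std::strlen(reinterpret_cast<const char*>(bytes));
}
```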

As for making the code easier to read, that is entirely a matter of style. This code seems to be written in the style of old C Windows code, which I definitely don't like. There are probably many ways to make it more understandable to you, but there is no single clear answer.

