Difference between unsigned char and char pointers

I am a bit confused by the differences between unsigned char (which is also BYTE in WinAPI) and char pointers.

I am currently working with some legacy ATL-based code, and I see many expressions, such as:

    CAtlArray<BYTE> rawContent;
    CALL_THE_FUNCTION_WHICH_FILLS_RAW_CONTENT(rawContent);
    return ArrayToUnicodeString(rawContent);
    // or return ArrayToAnsiString(rawContent);

Now ArrayToXXString implementations look like this:

    CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
    {
        CAtlArray<BYTE> copiedArray;
        copiedArray.Copy(array);
        copiedArray.Add('\0');

        // Casting from BYTE* -> LPCSTR (const char*).
        return CStringA((LPCSTR)copiedArray.GetData());
    }

    CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
    {
        CAtlArray<BYTE> copiedArray;
        copiedArray.Copy(array);
        copiedArray.Add('\0');
        copiedArray.Add('\0');

        // Same here.
        return CStringW((LPCWSTR)copiedArray.GetData());
    }

So the questions are:

  • Is the C-style cast from BYTE* to LPCSTR (const char*) safe in all possible cases?

  • Do I need to add a double zero terminator when converting array data to a wide-character string?

  • The conversion procedure CStringW((LPCWSTR)copiedArray.GetData()) seems invalid to me, is that true?

  • Is there any way to make all this code easier to understand and maintain?

+4
4 answers

The C standard is a bit odd when it comes to defining a byte. However, you do have a couple of guarantees.

  • A byte is always the size of a char
    • sizeof(char) is always 1
  • A byte is at least 8 bits in size.

This definition does not fit well with older platforms where bytes were 6 or 7 bits long, but it does mean that BYTE* and char* are guaranteed to be equivalent: both point to objects of size 1, so casting between them is safe.
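The guarantees above can be checked at compile time. The following is a minimal sketch in standard C++, not the original ATL code: `Byte` is a stand-in for the WinAPI `BYTE` typedef, and `AsCString` is a hypothetical helper name introduced only for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstring>

// Stand-in for the WinAPI typedef (assumption: BYTE is unsigned char).
typedef unsigned char Byte;

static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(Byte) == 1, "unsigned char has the same size as char");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: view a zero-terminated byte buffer as a C string.
// The cast changes only the pointer type; no bytes are copied or modified.
inline const char* AsCString(const Byte* data) {
    return reinterpret_cast<const char*>(data);
}
```

The cast is only as safe as the data behind it: the buffer still has to end with a zero byte before it is handed to anything that expects a C string.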

Multiple zeros are required at the end of a Unicode string because there are valid Unicode characters whose encoding starts with a zero (null) byte.

As for making the code easier to read, that is entirely a matter of style. This code seems to be written in the style of old C Windows code, which I definitely do not care for. There are probably many ways to make it clearer to you, but there is no single right answer as to how.

+3
  • Yes, it is always safe, because both point to an array of single-byte memory locations.
    LPCSTR : long pointer to a constant (single-byte) string
    LPCWSTR : long pointer to a constant wide (two bytes per character) string
    LPCTSTR : long pointer to a context-dependent constant (single-byte or wide) string

  • In wide-character strings, each character occupies 2 bytes of memory, and the size of the memory block containing the string must be a multiple of 2. Therefore, if you want to add a wide '\0' to the end of the string, you must add two bytes.

  • Sorry, I don't know ATL, so I can't help with this part. That said, I don't see any real complexity here, and I think the code is easy to maintain. Which code do you actually want to make easier to understand and maintain?
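The second point above, about the two-byte terminator, can be illustrated with a small sketch in standard C++. The assumptions here are that `char16_t` stands in for the 2-byte `wchar_t` used on Windows, that `std::vector<unsigned char>` stands in for `CAtlArray<BYTE>`, and that `AppendWideTerminator` and `WideLength` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// char16_t stands in for the 2-byte wchar_t on Windows (assumes 8-bit bytes).
static_assert(sizeof(char16_t) == 2, "sketch assumes 16-bit code units");

// Hypothetical helper: append a wide terminator to a raw byte buffer.
// The terminator is one 16-bit code unit, so sizeof(char16_t) zero bytes are added.
inline void AppendWideTerminator(std::vector<unsigned char>& bytes) {
    bytes.insert(bytes.end(), sizeof(char16_t), static_cast<unsigned char>(0));
}

// Hypothetical helper: count 16-bit code units before the first zero unit.
// memcpy is used instead of a pointer cast to avoid alignment problems.
inline std::size_t WideLength(const std::vector<unsigned char>& bytes) {
    std::size_t units = 0;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2, ++units) {
        char16_t unit;
        std::memcpy(&unit, &bytes[i], sizeof unit);
        if (unit == 0) break;  // only a full zero code unit ends the string
    }
    return units;
}
```

A buffer holding the bytes `00 01 41 00` shows why a single zero byte is not enough: a byte-wise scan such as `strlen` stops immediately, while the unit-wise scan sees two non-zero code units.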

+2
  • If the BYTE* behaves like a valid string (i.e. the last BYTE is 0), you can cast a BYTE* to LPCSTR, yes. Functions that work with LPCSTR assume null-terminated strings.
  • I think multiple zeros are only needed when working with certain multibyte character sets. The most common 8-bit encodings (for example, the regular Windows Western code page, as well as UTF-8) do not require them.
  • CString is Microsoft's best attempt at user-friendly strings. For example, its constructor can handle both char and wchar_t input, regardless of whether the CString itself is wide or not, so you don't have to worry much about conversion.

Edit: wait, now I see that they are abusing a BYTE array to store wide characters. I cannot recommend this.
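One way to avoid both the extra copy and the pointer cast is to build the string from the bytes and an explicit length. The sketch below is a portable analogue in standard C++, not a drop-in ATL replacement: `std::vector<unsigned char>` stands in for `CAtlArray<BYTE>`, `std::u16string` stands in for `CStringW`, and both function names are hypothetical. It also assumes the wide data is UTF-16 in the machine's native byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical analogue of ArrayToAnsiString: the byte count is known,
// so no copy-and-terminate step is needed.
inline std::string BytesToNarrowString(const std::vector<unsigned char>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

// Hypothetical analogue of ArrayToUnicodeString.
// Bytes are copied into properly typed storage instead of being cast in place;
// a trailing odd byte, if any, is ignored.
inline std::u16string BytesToWideString(const std::vector<unsigned char>& bytes) {
    std::u16string text(bytes.size() / sizeof(char16_t), u'\0');
    if (!text.empty()) {
        std::memcpy(&text[0], bytes.data(), text.size() * sizeof(char16_t));
    }
    return text;
}
```

ATL's CStringT is documented to offer constructors that take a pointer together with a length, which might serve the same purpose in the original code; that has not been verified here.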

+1

LPCWSTR is a string with 2 bytes per character, while a "char" string has one byte per character. This means you cannot simply C-style cast from one to the other: a real conversion has to change the memory (padding each ASCII character with a zero byte), not just read the same memory differently (which is all a C-style cast does). So I would say the casts are not that safe.
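The difference between reinterpreting and converting can be shown with a short sketch in standard C++. Here `char16_t` stands in for the 2-byte Windows `wchar_t`, and `ReinterpretAsWide` and `WidenAscii` are hypothetical helper names; the widening helper is only meant for 7-bit ASCII input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: read narrow bytes as 16-bit units, which is effectively
// what a C-style cast to LPCWSTR does. memcpy keeps the sketch well-defined.
inline std::u16string ReinterpretAsWide(const std::string& narrow) {
    std::u16string out(narrow.size() / sizeof(char16_t), u'\0');
    if (!out.empty()) {
        std::memcpy(&out[0], narrow.data(), out.size() * sizeof(char16_t));
    }
    return out;
}

// Hypothetical helper: widen each ASCII character into its own 16-bit unit.
// Valid for 7-bit ASCII only; other encodings need a real conversion routine.
inline std::u16string WidenAscii(const std::string& narrow) {
    std::u16string out;
    out.reserve(narrow.size());
    for (unsigned char c : narrow) {
        out.push_back(static_cast<char16_t>(c));
    }
    return out;
}
```

Reinterpreting the two bytes of "Hi" yields a single unrelated code unit, while widening yields the two characters expected. On Windows, non-ASCII input would normally go through a conversion routine such as MultiByteToWideChar instead of a hand-written loop.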

Double null termination: every character takes 2 bytes, so your end-of-string character must also be 2 bytes long.

To make this code easier to understand, see lexical_cast in Boost (http://www.boost.org/doc/libs/1_48_0/doc/html/boost_lexical_cast.html)

Another way is to use std::string (that is, std::basic_string) and perform the string operations on that.

0

Source: https://habr.com/ru/post/1395774/

