Why short * instead of char * for strings? Difference between char * and unsigned char *?

As the title says, I have two questions.

Edit: To clarify, they do not actually use char and short; they guarantee 8-bit and 16-bit widths with specific typedefs. The actual types are called UInt8 and UInt16.

1. Question

The iTunes SDK uses unsigned short* where a string is required. What are the benefits of using it instead of char* / unsigned char*? How do I convert it to char*, and what is different when working with this type?

2. Question

So far I have only seen char* used where a string needs to be stored. When should unsigned char* be used instead, or does it make no difference?

0
3 answers

unsigned short arrays can be used for wide-character strings, for example if you have UTF-16 encoded text, although I would have expected wchar_t in those cases. But they may have their own reasons, for example compatibility between MacOS and Windows. (If my sources are right, wchar_t is 32 bits on MacOS and 16 bits on Windows.)

You convert between the two string types by calling the appropriate library function. Which function is appropriate depends on the situation. Doesn't the SDK come with one?
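As an illustration, here is a minimal sketch of such a conversion; utf16_to_ascii is a hypothetical helper, not an SDK function, and it assumes the text is plain ASCII, since anything beyond that cannot be narrowed losslessly and needs a real conversion routine:

    #include <stdio.h>

    /* Naively narrow a 0-terminated UTF-16 string to char. Units
     * outside ASCII are replaced with '?'; real code should call a
     * proper converter instead. */
    static void utf16_to_ascii(const unsigned short *src,
                               char *dst, size_t dstlen)
    {
        size_t i = 0;
        while (src[i] != 0 && i + 1 < dstlen) {
            dst[i] = (src[i] < 0x80) ? (char)src[i] : '?';
            i++;
        }
        dst[i] = '\0';
    }

    int main(void)
    {
        const unsigned short wide[] = { 'i', 'T', 'u', 'n', 'e', 's', 0 };
        char narrow[16];
        utf16_to_ascii(wide, narrow, sizeof narrow);
        printf("%s\n", narrow);  /* prints: iTunes */
        return 0;
    }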

And char instead of unsigned char: well, all strings have historically been defined in terms of char, so switching to unsigned char would lead to incompatibilities.
(Switching to signed char would also cause incompatibilities, but somehow not as much...)
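A small illustration of that incompatibility: the standard string functions are declared in terms of char *, so an unsigned char * needs an explicit cast (and compilers warn without one):

    #include <string.h>

    int main(void)
    {
        unsigned char raw[] = "hello";

        /* strlen(raw) would not compile cleanly: strlen() expects
         * const char *, not unsigned char *. */
        size_t n = strlen((const char *)raw);
        return (int)(n - 5);  /* 0: the bytes are the same either way */
    }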

Edit: Now that the question has been edited, let me say that I did not see the changes before I typed my answer. But yes, UInt16 is a better representation of a 16-bit entity than wchar_t, for the reason above.

+5

1. Question - Answer

I would guess that they use unsigned short * because they use UTF-16 encoding for Unicode characters, which can represent characters both inside and outside the BMP. The answer to the rest of your question depends on the Unicode encodings of the source and the destination (UTF-8, UTF-16, or UTF-32).
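To make the BMP point concrete, here is a small sketch (plain standard UTF-16 math, nothing SDK-specific) that recombines a surrogate pair stored in unsigned short units into the code point it encodes:

    #include <stdio.h>

    int main(void)
    {
        /* U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so
         * UTF-16 encodes it as a surrogate pair of two 16-bit units. */
        unsigned short pair[] = { 0xD834, 0xDD1E };

        /* Recombine the high and low surrogates into the code point. */
        unsigned long cp = 0x10000UL
                         + (((unsigned long)(pair[0] - 0xD800)) << 10)
                         + (pair[1] - 0xDC00);

        printf("U+%05lX\n", cp);  /* prints: U+1D11E */
        return 0;
    }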

2. Question - Answer

Again, it depends on the encoding and on which strings you are talking about. You should never use plain signed or unsigned char strings if you plan to process text with characters outside the extended ASCII table (that is, any language other than English).
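A classic example of the trap: if plain char is signed, bytes above 0x7F come out negative, and passing them straight to the <ctype.h> functions is undefined behavior. A minimal sketch:

    #include <ctype.h>
    #include <stdio.h>

    int main(void)
    {
        char c = (char)0xE9;  /* 'é' in Latin-1; negative if char is signed */

        /* Wrong (undefined behavior when char is signed): isalpha(c)
         * Right: go through unsigned char first. */
        if (isalpha((unsigned char)c))
            printf("alphabetic\n");
        else
            printf("not alphabetic in the current locale\n");

        return 0;
    }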

+1
  • Probably a conservative attempt at UTF-16 strings. C does have a wide character type, wchar_t, and its characters (wchar_ts) can be 16 bits long. Although I am not familiar enough with the SDK to say why they went this route, it may be due to compiler issues. In C99 there are much more suitable types, [u]int[_least/_fast]16_t; see <stdint.h>. (A sketch of these types follows after this list.)

    Please note that C gives very few guarantees about data types and their underlying sizes. Signed and unsigned short are not guaranteed to be 16 bits (only to be at least that wide), chars are not limited to 8 bits, and wide chars are not limited to 16 or 32.

    To convert between char and short strings, you would use the conversion functions provided by the SDK. You could also write your own, or use a third-party library, if you knew exactly what is stored in those short strings and what you need in the char strings.

  • In fact, it does not matter much. Usually you convert to unsigned char when you want to perform (unsigned) arithmetic or bit manipulation on a character.
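A minimal sketch of the <stdint.h> types mentioned in the first bullet; nothing here is SDK-specific, and the exact sizes printed depend on the platform:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* uint16_t is exactly 16 bits wherever it exists; uint_least16_t
         * and uint_fast16_t always exist and are at least 16 bits. Any
         * of them is a more portable carrier for UTF-16 units than
         * unsigned short, whose width is only guaranteed to be >= 16. */
        uint_least16_t utf16[] = { 'h', 'i', 0 };

        printf("unsigned short: %zu bytes, uint_least16_t: %zu bytes\n",
               sizeof(unsigned short), sizeof(uint_least16_t));
        (void)utf16;  /* stored UTF-16 data would be used here */
        return 0;
    }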

Edit: I wrote (or started writing, anyway) this answer before you told us they use UInt16 rather than unsigned short. In that case there is no harebrainedness involved; the proprietary type is probably there to store UTF-16 data while staying compatible with old (or non-conforming) compilers that lack the stdint types. That is perfectly reasonable.

+1

Source: https://habr.com/ru/post/1402930/

