Should a byte buffer be signed char or unsigned char?

Should a byte buffer be signed char, unsigned char, or simply char? Are there any differences between C and C++?

Thanks.

+49
c++ c char buffer
Mar 17 '09 at 7:52
14 answers


There is a slight difference in how the language treats it, and a huge difference in how convention treats it.

  • char = ASCII (or UTF-8, but signedness gets in the way there) text data
  • unsigned char = byte
  • signed char = rarely used

And there is code that relies on that distinction. Just a week or two ago I ran into a bug where JPEG data was being corrupted because it was passed to the char* version of our Base64 encoding function, which "helpfully" replaced all the invalid UTF-8 in the "string". Switching to BYTE, aka unsigned char, was all it took to fix it.
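A minimal sketch of that convention (the function names here are hypothetical, not the actual API from the story): giving text and bytes different pointer types lets overload resolution keep binary data away from the text-oriented path.

    #include <cstdio>
    #include <vector>

    // Text overload: treats its argument as a NUL-terminated string.
    void process(const char* text)
    {
        std::printf("text: %s\n", text);
    }

    // Byte overload: treats its argument as raw data and never
    // re-interprets it as UTF-8 or any other encoding.
    void process(const unsigned char* bytes, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            std::printf("%02x ", bytes[i]);
        std::printf("\n");
    }

    int main()
    {
        std::vector<unsigned char> jpeg = { 0xFF, 0xD8, 0xFF, 0xE0 }; // JPEG header bytes
        process("hello");                  // picks the text overload
        process(jpeg.data(), jpeg.size()); // picks the byte overload
    }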

+26
Feb 20 '11 at 15:46

If you are going to store arbitrary binary data, you should use unsigned char. It is the only data type guaranteed by the C standard to have no padding bits. Every other data type may contain padding bits in its object representation (the one that contains all the bits of the object, not only those that determine a value). The state of the padding bits is unspecified, and they are not used to store values. So if you read some binary data through char, things would be cut down to the value range of a char (only the value bits are interpreted), but there may still be bits that are simply ignored yet still exist and are read by memcpy. Much like the padding bits in real struct objects. The type unsigned char is guaranteed to contain none of those. That follows from 5.2.4.2.1/2 (C99 TC2, n1124 here):

If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT − 1.

From the last sentence it follows that there is no room left for any padding bits. If you use char as the type of your buffer, you also have an overflow problem: assigning to one of those elements a value that is in the range of 8 bits (so you may expect the assignment to be fine) but not in the range of a char, which is CHAR_MIN..CHAR_MAX, makes the conversion overflow and yields implementation-defined results, which may include the raising of signals.
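A small sketch of that conversion problem, assuming a common implementation where plain char is signed and CHAR_BIT is 8:

    #include <cstdio>

    int main()
    {
        // 0xC3 (195) fits in 8 bits but not in CHAR_MIN..CHAR_MAX when
        // char is signed, so the conversion has an implementation-defined
        // result (C99 even allows an implementation-defined signal).
        char          c = 0xC3;
        unsigned char u = 0xC3; // always well-defined: 0..255 fits

        std::printf("char: %d, unsigned char: %d\n", c, u); // typically -61 and 195
    }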

Even though the problems above will probably never show up on real implementations (that would be a very poor quality of implementation), you are best off using the right type from the very beginning, and that is unsigned char.

For strings, however, the right data type is char, which is what the string and print functions will understand. Using signed char for those purposes looks like a wrong decision to me.

For further information, read this proposal, which contains a fix for the next version of the C standard that will eventually require signed char not to have any padding bits either. It is already incorporated into the working paper.

+46
Mar 17 '09 at 11:53

It depends.

If the buffer is intended to hold text, then it probably makes sense to declare it as an array of char and let the platform decide for you whether that is signed or unsigned by default. That will give you the fewest problems passing data to and from the implementation's runtime library.

If the buffer is intended to hold binary data, then it depends on how you intend to use it. For example, if the binary data is really a packed array of data samples that are signed 8-bit fixed-point ADC measurements, then signed char would be best.
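For instance, a hypothetical sketch, assuming the samples arrive as signed 8-bit two's-complement values:

    #include <cstddef>
    #include <cstdio>

    int main()
    {
        // Packed buffer of signed 8-bit ADC samples: with signed char,
        // each element already carries the intended sign.
        signed char samples[] = { 12, -7, 127, -128, 0 };

        long sum = 0;
        for (std::size_t i = 0; i < sizeof samples; ++i)
            sum += samples[i]; // sign-extends on widening, as desired

        std::printf("sum: %ld\n", sum); // prints: sum: 4
    }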

In the vast majority of real-world cases, a buffer is just that, a buffer, and you don't really care about the types of the individual bytes, because you filled the buffer in a bulk operation and you are about to hand it off to a parser to interpret the complicated data structure and do something useful. In that case, declare it in the simplest possible way.

+12
Mar 17 '09 at 8:03

If it is actually a buffer of 8-bit bytes, rather than a string in the machine's default locale, I would use uint8_t. Not that there are many machines around where a char is not a byte (or a byte not an octet), but making the statement "this is a buffer of octets" rather than "this is a string" is often useful documentation.
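A short sketch of what that documentation looks like in practice (on virtually every platform uint8_t is a typedef for unsigned char):

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        // "This is a buffer of octets", stated in the type itself.
        std::uint8_t packet[] = { 0xDE, 0xAD, 0xBE, 0xEF };

        for (std::size_t i = 0; i < sizeof packet; ++i)
            std::printf("%02x ", packet[i]);
        std::printf("\n"); // prints: de ad be ef
    }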

+9
Mar 17 '09 at 9:49

You should use either char or unsigned char, but never signed char. The standard says the following in 3.9/2:

For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
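A minimal illustration of the guarantee being quoted:

    #include <cassert>
    #include <cstring>

    struct Point { int x; int y; }; // a POD type

    int main()
    {
        Point p = { 3, 4 };

        // Copy the bytes making up the object into an unsigned char array...
        unsigned char bytes[sizeof(Point)];
        std::memcpy(bytes, &p, sizeof p);

        // ...and copy them back: the object holds its original value again.
        Point q;
        std::memcpy(&q, bytes, sizeof q);
        assert(q.x == 3 && q.y == 4);
    }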

+5
Mar 17 '09 at 11:08

It is better to define it as unsigned char. In fact, the Win32 type BYTE is defined as unsigned char. There is no difference between C and C++ in this respect.

+4
Mar 17 '09 at 8:01

For maximum portability, always use unsigned char. There are a couple of cases where this could come into play. Serialized data shared across systems with different endianness immediately comes to mind. When performing bit shifts or masking, the values differ as well.
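A small sketch of the shifting difference, assuming two's complement and arithmetic right shift for signed values (the usual behavior, though implementation-defined):

    #include <cstdio>

    int main()
    {
        unsigned char u = 0xF0;
        signed char   s = static_cast<signed char>(0xF0); // -16 on two's complement

        // Right shift: zeros are shifted in for the unsigned value, while
        // the signed value typically shifts in copies of the sign bit.
        std::printf("%d\n", u >> 4); // 15
        std::printf("%d\n", s >> 4); // typically -1
    }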

+3
Mar 17 '09 at 10:05

Choosing between int8_t and uint8_t is similar to choosing whether to compare a pointer against NULL or against 0.

From a functionality point of view, comparing against NULL is the same as comparing against 0, because NULL is a #define for 0.

But personally, as a matter of coding style, I choose to compare my pointers against NULL, because NULL signals to the person maintaining the code that you are checking for a bad pointer...

VS

when someone sees a comparison against 0, it signals that you are checking for a specific value.

For the same reason, I would use uint8_t.

+2
Mar 17 '09 at 14:44

If you fetch an element into a wider variable, it will of course be sign-extended or not, depending on the signedness.
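For example (a minimal sketch; the result for plain char depends on whether it is signed on your platform):

    #include <cstdio>

    int main()
    {
        char buf[] = { '\xFF' }; // one byte with the high bit set

        int widened = buf[0];                             // -1 if char is signed
        int as_byte = static_cast<unsigned char>(buf[0]); // always 255

        std::printf("%d %d\n", widened, as_byte);
    }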

0
Mar 17 '09 at 7:55

Should or shouldn't... I tend to prefer unsigned, as it feels more "raw", less inviting to say "hey, it's just a bunch of little ints", when I want to emphasize the binary nature of the data.

I don't think I've ever used an explicit signed char to represent a byte buffer.

Of course, a third option is to represent the buffer as void* as much as possible. Many common I/O functions work with void*, so sometimes the decision about which integer type to use can be completely encapsulated, which is nice.
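The standard I/O functions are an example: fread and fwrite take void*, so the element type of the buffer stays a private detail of the code that interprets it. A minimal sketch, assuming a file named data.bin exists:

    #include <cstdio>

    int main()
    {
        unsigned char buffer[256];

        std::FILE* f = std::fopen("data.bin", "rb"); // hypothetical input file
        if (!f)
            return 1;

        // fread neither knows nor cares what the buffer's element type is.
        std::size_t n = std::fread(buffer, 1, sizeof buffer, f);
        std::fclose(f);

        std::printf("read %zu bytes\n", n);
    }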

0
Mar 17 '09 at 8:01

A few years ago I had a problem with a C++ console application that printed colored characters for ASCII values above 128. The problem was solved by switching from char to unsigned char, though I think it would have been solvable while keeping char too.

These days most C/C++ functions take char, and I now understand both languages much better, so I use char in most cases.

0
Mar 17 '09 at 8:03

Do you care? If you don't, just use the default (char) and don't clutter your code with an irrelevant choice. Otherwise, future maintainers will be left wondering why you used signed (or unsigned). Make their lives simpler.

0
Mar 17 '09 at 8:06
 typedef char byte; 

Now you can make your arrays of bytes. It is obvious to everyone what you meant, and you don't lose any functionality.

I know it's somewhat silly, but it makes your code read 100% as intended.
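A quick sketch of the idea in use:

    #include <cstdio>

    typedef char byte; // the typedef suggested above

    int main()
    {
        byte buffer[16]; // reads as "an array of bytes"...

        // ...yet still works with every char*-based API, unchanged:
        std::snprintf(buffer, sizeof buffer, "%s", "hello");
        std::printf("%s\n", buffer);
    }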

0
Mar 17 '09 at 15:10
source share

If you lie to the compiler, it will punish you.

If the buffer holds data that just passes through, and you will not manipulate it in any way, it doesn't matter.

However, if you have to operate on the contents of the buffer, then the right type declaration will make your code simpler. No "int val = buf[i] & 0xff;" nonsense.
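A sketch of that difference, assuming plain char is signed on the platform:

    #include <cstdio>

    int main()
    {
        char          as_char[] = { '\x80' };
        unsigned char as_byte[] = { 0x80 };

        int a = as_char[0] & 0xff; // the masking dance: -128 becomes 128
        int b = as_byte[0];        // right by construction: 128

        std::printf("%d %d\n", a, b);
    }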

So, think about what the data really is and how you need to use it.

-1
Mar 17 '09 at 14:57