What happened to the historical typedef soup for integers in C programs?

This may be a question whose answer is already well known.

Fifteen years ago or so, a lot of the C code I looked at was full of integer typedefs wrapped in platform #ifdefs. It seemed that every program or library had its own incompatible typedef soup. I didn't know much about programming at the time, and it felt like a lot of fancy hoops to jump through just to tell the compiler which kind of integer you wanted to use.

I put together a story in my mind to explain what those typedefs were for, but I don't actually know whether it's true. My guess is, roughly, that when C was first developed and standardized, it wasn't recognized how important it was to be able to get an integer type of a specific size in a platform-independent way, and so all the original C integer types may be of different sizes on different platforms. Thus anyone trying to write portable C code had to sort it out themselves.

Is this right? If so, how did programmers expect to use the C integer types? I mean, in a low-level language with a lot of bit twiddling, isn't it important to be able to say "this is a 32-bit integer"? And since the language was standardized in 1989, surely the assumption was that people would try to write portable code?

+48
c
Apr 14 '17 at 7:39
6 answers

When C started, computers were much less homogeneous and much less connected than they are today. It was considered more important for portability that the int types be the natural sizes for the machine: requiring an exactly-32-bit integer type on a 36-bit system would likely produce inefficient code.

And then ubiquitous networking arrived, where you work with fields of specific sizes on the wire. Now interoperability looks completely different, and the octet becomes the de facto quantum of data types.

Now you need ints whose widths are exact multiples of 8 bits, so you get the typedef soup; eventually the standard catches up, we have standard names for those types, and the soup is no longer needed.
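
To make that concrete, here is a minimal sketch of the kind of per-project soup that used to exist, next to the <stdint.h> names that eventually replaced it (the platform macro and typedef names are invented for illustration):

    /* The old per-project way: integer names hidden behind platform #ifdefs
       (the platform macro here is purely illustrative). */
    #if defined(MACHINE_WITH_16BIT_INT)
    typedef int   int16;
    typedef long  int32;
    #else
    typedef short int16;
    typedef int   int32;
    #endif

    /* Since C99, <stdint.h> provides the names everyone was reinventing. */
    #include <stdint.h>

    int16_t  a;   /* exactly 16 bits, two's complement, no padding */
    uint32_t b;   /* exactly 32 bits, unsigned */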

+67
Apr 14 '17 at 8:05
C's early success was due in part to its flexibility in adapting to nearly every existing architecture variant (as @John Hascall notes): 1) native integer sizes of 8, 16, 18, 24, 32, 36, etc. bits, 2) signed-integer representations: two's complement, one's complement, sign-magnitude, and 3) various endiannesses: big, little, and others.

As coding practice matured, algorithms and data exchange pushed toward more uniformity, and therefore toward types that matched 1) and 2) across platforms. Coders rolled their own, such as typedef int int32 inside an #if ... . The many variations of that created the soup noted by the OP.




C99 introduced (u)int_leastN_t, (u)int_fastN_t, and (u)intmax_t to make portable, if only minimally specified, bit-width types. These types are required for N = 8, 16, 32, 64.

It also introduced the semi-optional types (see below **) such as (u)intN_t, which carry the additional attributes of being two's complement and having no padding bits. It is these popular types that are so widely desired and used to thin out the integer soup.
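
A small hedged example of how these C99 types are typically used together (the variable names are mine; <inttypes.h> is only pulled in for its printf format macros):

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint_least16_t small = 0;  /* smallest type with at least 16 bits (always present) */
        uint_fast16_t  index = 0;  /* a "fast" type with at least 16 bits (always present) */
        uint32_t       field = 42; /* exactly 32 bits, no padding - the semi-optional kind */
        uintmax_t      big   = 0;  /* the widest unsigned type the implementation offers */

        (void)small; (void)index;
        /* <inttypes.h> supplies matching printf format macros. */
        printf("field = %" PRIu32 ", big = %" PRIuMAX "\n", field, big);
        return 0;
    }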




how did programmers expect to use the C integer types?

By writing flexible code that did not depend strongly on bit width. It is fairly easy to code strtol() using only LONG_MIN and LONG_MAX, with no assumptions about bit width, endianness, or integer encoding.
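
As a rough sketch of that width-agnostic style (the function name is invented), the overflow check below uses only LONG_MAX from <limits.h> and never assumes how wide long actually is:

    #include <limits.h>

    /* Accumulate decimal digits into a long, clamping at LONG_MAX on overflow.
       The logic is identical whether long is 32, 36, or 64 bits wide. */
    long accumulate_digits(const char *s)
    {
        long value = 0;
        while (*s >= '0' && *s <= '9') {
            int digit = *s++ - '0';
            if (value > (LONG_MAX - digit) / 10)  /* next step would overflow */
                return LONG_MAX;
            value = value * 10 + digit;
        }
        return value;
    }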

However, many coding tasks require exact-width types and two's complement for easy, high-performance code. In that case it is better to forgo portability to 36-bit machines and sign-magnitude 32-bit ones and stick to integers that are 2^N wide (two's complement for the signed ones). Various CRC and cryptographic algorithms and file formats come to mind. Hence the need for fixed-width types and a standardized (C99) way to get them.
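
For instance, a minimal illustration of why exact widths matter there (not taken from any particular algorithm): a 32-bit rotate is only well defined if the operand really is 32 bits with unsigned wraparound, which uint32_t guarantees:

    #include <stdint.h>

    /* Rotate a 32-bit word left by n bits. Many hash/cipher round functions are
       specified in exactly these terms; uint32_t guarantees the width and the
       modulo-2^32 wraparound the algorithm assumes. */
    static uint32_t rotl32(uint32_t x, unsigned n)
    {
        n &= 31;                                  /* keep the shift in range */
        return (x << n) | (x >> ((32 - n) & 31));
    }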




Today there are still gotchas to manage. Example: the usual int/unsigned promotions can surprise you, since these types may be 16, 32, or 64 bits wide.
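
One classic instance of that promotion gotcha, sketched under the assumption of a 32-bit int platform (the function name is mine):

    #include <stdint.h>

    uint32_t square_u16(uint16_t a)
    {
        /* Where int is 32 bits, 'a' is promoted to signed int, so for a == 0xFFFF
           the product exceeds INT_MAX: undefined behaviour. Where int is 16 bits,
           the same expression wraps as unsigned instead. */
        /* return a * a; */            /* subtly non-portable */
        return (uint32_t)a * a;        /* force unsigned 32-bit arithmetic */
    }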




**

These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) a two's complement representation, it shall define the corresponding typedef names. (C11 7.20.1.1 Exact-width integer types, paragraph 3)

+18
Apr 14 '17 at 16:23

I remember this period, and I am guilty of doing exactly the same!

One issue was the size of int: it could be the same as short, the same as long, or somewhere in between. For example, if you were working with binary file formats, everything had to line up, and byte ordering complicated things further. Many developers took the lazy route and simply did an fwrite of whole structs instead of writing out the numbers byte by byte. When machines were upgraded to longer word lengths, all hell broke loose. So typedef was an easy hack to fix it.
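
A hedged sketch of the two approaches (the helper name and field layout are invented): the lazy fwrite of a whole struct bakes the machine's int size, padding, and byte order into the file, whereas writing each field byte by byte in a fixed order does not:

    #include <stdint.h>
    #include <stdio.h>

    /* Lazy route: fwrite(&header, sizeof header, 1, fp);
       the file layout then follows the compiler's int size, struct padding
       and byte order. */

    /* Explicit route: emit a 32-bit value as four bytes, little-endian,
       so the file looks the same on every machine. */
    static int write_u32_le(uint32_t v, FILE *fp)
    {
        unsigned char b[4];
        b[0] = (unsigned char)( v        & 0xFF);
        b[1] = (unsigned char)((v >> 8)  & 0xFF);
        b[2] = (unsigned char)((v >> 16) & 0xFF);
        b[3] = (unsigned char)((v >> 24) & 0xFF);
        return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
    }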

If performance was an issue, as it often was back then, int was guaranteed to be the machine's fastest natural size, but if you needed 32 bits and int was narrower than that, you risked overflow.

In C, sizeof() cannot be resolved at the preprocessor stage, which complicated matters because you could not write #if sizeof(int) == 4, for example.
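
The usual workaround, sketched here with an invented typedef name, was to test the value-range macros from <limits.h> in the preprocessor instead of sizes (a real header would cover more cases than this):

    #include <limits.h>

    /* sizeof cannot appear in #if, but the <limits.h> value-range macros can,
       so headers tested ranges instead of sizes. */
    #if UINT_MAX == 0xFFFFU
    typedef unsigned long my_u32;   /* int is 16 bits here, fall back to long */
    #else
    typedef unsigned int  my_u32;   /* assume a 32-bit int otherwise */
    #endif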

Personally, part of the reason was also simply thinking in assembly-language terms and not wanting to abstract away what short, int and long actually mean. Back then, assembler was frequently used alongside C.

These days there are plenty of non-binary file formats, JSON, XML and so on, where the binary representation doesn't matter. Also, many popular platforms have settled on a 32-bit or wider int, which is usually enough for most purposes, so there are fewer overflow problems.

+11
Apr 14 '17 at 13:06

C is a product of the early 1970s, when the computing ecosystem was very different. Instead of millions of computers all talking to each other over an extended network, you had perhaps a hundred thousand systems worldwide, each running a few monolithic applications, with almost no communication between systems. You could not assume that any two architectures had the same word sizes or represented signed integers the same way. The market was still small enough that there was no perceived need for standardization, computers didn't talk to each other (much), and nobody really thought much about portability.

If so, how did programmers expect to use the C integer types?

If you wanted to write maximally portable code, then you assumed nothing beyond what the Standard guaranteed. In the case of int, that meant you did not assume it could represent anything outside the range [-32767, 32767], you did not assume it was two's complement, and you did not assume it had a specific width (it could be wider than 16 bits yet still represent only a 16-bit range if it contained padding bits).

If you did not care about portability, or you were doing things that are inherently non-portable (which bit twiddling usually is), you used whatever types met your requirements.

I did mostly high-level application programming, so I worried less about representation than about range. Even so, I occasionally needed to dip into binary representations, and it always bit me in the ass. I remember writing code in the early '90s that had to run on classic MacOS, Windows 3.1, and Solaris. I created a bunch of enumeration constants for 32-bit masks that worked fine on the Mac and Unix boxes, but failed to compile on Windows, because there int was only 16 bits.
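
That failure mode looks roughly like this (the flag names are invented; C requires an enumeration constant to be representable as an int, so the enum breaks where int is 16 bits):

    /* Compiled on the 32-bit-int Mac and Unix compilers, but not where int
       was 16 bits, because enumeration constants must fit in an int. */
    enum window_flags {
        WF_VISIBLE = 0x00000001,
        WF_TOPMOST = 0x00010000   /* needs more than 16 bits: error on Windows 3.1 */
    };

    /* The usual fix of the era: plain macros with an explicit long suffix. */
    #define WF_VISIBLE_MASK 0x00000001UL
    #define WF_TOPMOST_MASK 0x00010000UL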

+5
Apr 14 '17 at 19:04

C was designed as a language that could be ported to as many machines as possible, rather than as a language that would let most kinds of programs run unchanged across such a range of machines. For most practical purposes, C's types were:

  • An 8-bit type if available, otherwise the smallest type that is at least 8 bits.

  • A 16-bit type if available, otherwise the smallest type that is at least 16 bits.

  • A 32-bit type if available, otherwise some type that is at least 32 bits.

  • A type that will be 32 bits if systems can handle it as efficiently as a 16-bit type, or 16 bits otherwise.

If code needed 8, 16, or 32-bit types and was unlikely to be used on machines that did not support them, there was no real problem with such code treating char, short, and long as 8, 16, and 32 bits, respectively. The only systems that did not map those names to those sizes were the ones that could not support those sizes and could not usefully run the code that required them; such systems were limited to code written to be compatible with the types they did use.

I think C is perhaps best viewed as a recipe for turning system specifications into language dialects. A system with 36-bit memory may not be able to process the same dialect as efficiently as a system with octet-based memory, but a programmer who learns one dialect can pick up another simply by learning which integer representations the latter uses. It is far more useful to tell a programmer who needs to write code for a 36-bit system, "This machine is like the others, except that char is 9 bits, short is 18 bits, and long is 36 bits," than to say, "You must use assembly language, because other languages would need integer types this system cannot handle efficiently."

+2
Apr 14 '17

Not all machines have the same native word size. While you might be tempted to think that a smaller variable is more efficient, it just isn't so. In fact, using a variable whose size matches the processor's native word size is often faster for arithmetic, logical, and bit-manipulation operations.

But what, exactly, is the "native word size"? Almost always it means the size of the CPU registers, which is what the Arithmetic Logic Unit (ALU) can operate on.

In embedded environments there are still 8- and 16-bit processors (are 4-bit PICs still around?). There are mountains of 32-bit processors. So the concept of "native word size" is alive and well for C developers.

64-bit processors often have good support for 32-bit operands, and in practice using 32-bit integers and floating-point values can be faster than using the full word size.

On top of that, there are trade-offs between native word alignment and overall memory consumption when laying out C structures.

But two common usage patterns remain: size-agnostic code for speed (int, short, long), and fixed-size code (int32_t, int16_t, int64_t) for correctness or interoperability where it is needed.
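
A small sketch of those two patterns side by side (the function and struct names are invented):

    #include <stdint.h>
    #include <stddef.h>

    /* Size-agnostic: let the compiler use whatever widths are fastest. */
    size_t count_nonzero(const unsigned char *buf, size_t len)
    {
        size_t n = 0;
        for (size_t i = 0; i < len; ++i)
            n += (buf[i] != 0);
        return n;
    }

    /* Fixed-size: fields describing an external format keep the same width on
       every machine (struct padding is still a separate concern). */
    struct record_header {
        uint32_t magic;
        uint16_t version;
        uint16_t flags;
    };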

+1
Apr 14 '17 at 17:35


