Another question about C types

Well, I completely understand the basic C data types such as short, int, long and float, or rather, all the numeric types. The compiler has to know these types in order to perform the correct operations on the correct numbers, for example using the FPU to add two floating-point numbers. So I understand why the compiler needs to know what type something is.

But when it comes to characters, I get a little lost. I know that the basic char data type exists to encode ASCII characters, but I don't understand why a separate data type for characters is needed at all. Why couldn't you just use a single-byte integer value to store an ASCII character? When you call printf, you specify the data type in the call, so you can tell printf that the integer represents an ASCII character. I don't know how cout resolves the data type, but I think you could specify it somehow.
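A minimal sketch of what I mean (assuming I understand printf correctly): the same byte can be printed either as a number or as a character, depending on the conversion I ask for:

    #include <stdio.h>

    int main(void)
    {
        char c = 65;               /* 65 is the ASCII code of 'A' */
        printf("%d %c\n", c, c);   /* prints: 65 A */
        return 0;
    }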

Another thing: when you want to use Unicode, you have to use the wchar_t data type. But what if I wanted to use some other encoding instead of UTF, such as an ISO or Windows encoding? Because wchar_t encodes characters as UTF-16 or UTF-32 (I read it is compiler-specific). And what if I wanted to use, for example, some imaginary new 8-byte text encoding? What data type should I use for that? I'm really confused by this, because I always expected that if I wanted to use UTF-32 instead of ASCII, I would just tell the compiler "take the UTF-32 value of the character I typed and store it in a 4-char field". I thought text encoding would be handled at the very end, for example by the print function, and that I would just need to tell the compiler which encoding to use. Since Windows doesn't use ASCII in Win32 applications, I assumed the C compiler would convert the characters I typed to ASCII from whatever encoding Windows sends to the editor.

And the last thing: what if I want to use, for example, 25-byte integers for some heavy mathematical operations? C has no such data type. Yes, I know it would be difficult, since all the arithmetic operations would need to be reimplemented, because the CPU cannot add 25-byte values directly. But is there a way to do this? Or is there a math library for it? What if I want to compute Pi to 1,000,000,000,000,000 digits? :)

I know that my question is quite long, but I just wanted to explain my thoughts as well as I could in English, since it is not my native language. And I believe there is a simple answer to my question(s), something I missed that explains all of this. I have read a lot about text encoding and many C tutorials, but nothing about this. Thank you for your time.

+4
4 answers

Your question is very broad, so I will try to address some of the specific issues you raised; I hope it will help things make more sense.

  • The char type can be used like any other numeric type, such as int, short and long. It is perfectly normal to write char a=3; . The difference is that with char the compiler gives you some added convenience: besides plain numbers, you can also assign ASCII characters to a char variable, as in char a='U'; , and the variable then holds the ASCII value of that character. You can also initialize character arrays with string literals: char *s="hello"; (see the short sketch after this list).
    This does not change the fact that, in the end, char is still a numeric type and a string is just an array of numbers. If you look at a string in memory, you will see the ASCII codes of its characters.

  • The choice of char being 1 byte is somewhat arbitrary and is kept in C mostly for historical reasons. More modern languages, such as C# and Java, define char as 2 bytes.

  • You do not need a "different" type for characters. char is just a numeric type that holds a single signed/unsigned byte, in the same way that short is a numeric type that holds a signed 16-bit word. The fact that this type is used for characters and strings is just syntactic sugar provided by the compiler. A 1-byte integer == char .

  • printf() works with chars because that is how C was designed; if it were designed today, it would probably work with shorts. Indeed, on Windows there is a version of printf() that works with shorts; it is called wprintf() .

  • The wchar_t type, on Windows, is essentially just another name for short . Somewhere in the Windows header files there is a declaration along the lines of typedef short wchar_t; , which is why you can use them interchangeably. The advantage of writing wchar_t is that whoever reads your code knows you mean characters, not numbers. Another reason is that if Microsoft someday decides to move to UTF-32, all they would need to do is redefine the typedef above to typedef int wchar_t; and that's it (in practice this would be quite hard to pull off, so such a change is unlikely in the foreseeable future).

  • If you want to use some 8-bit encoding other than ASCII, for example the Hebrew encoding called "Windows-1255", you just use char. There are many such encodings, but these days using Unicode is almost always preferable. In fact, there is a form of Unicode that fits into 8-bit strings: UTF-8. If you are dealing with UTF-8 strings, you work with the char data type. Nothing limits char to ASCII; it is just a number, and it can mean anything.

  • Working with such long numbers is usually done with so-called decimal types. C does not have them built in, but C# does. The basic idea of these types is to treat a number a bit like a string: each decimal digit is stored in 4 bits, so a single byte can store numbers in the range 0-99, a 3-byte array can store values in the range 0-999999, and so on. That way you can store numbers of any size.
    The disadvantage is that calculations on them are much slower than calculations on normal binary numbers.
    I am not sure which libraries do this kind of thing in C; use Google to find out.
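As a quick illustration of the first point (a minimal sketch), char really does behave like any other small integer type, and a string is just an array of those integers:

    #include <stdio.h>

    int main(void)
    {
        char a = 3;        /* a plain small integer */
        char b = 'U';      /* same thing: b now holds the ASCII code of 'U' (85) */
        char *s = "hello"; /* a string is just an array of these small integers */

        printf("a = %d, b = %d ('%c')\n", a, b, b);

        /* Printing the string's bytes as numbers shows the ASCII codes in memory. */
        for (int i = 0; s[i] != '\0'; i++)
            printf("%d ", s[i]);
        printf("\n");
        return 0;
    }

Running it prints a = 3, b = 85 ('U'), followed by 104 101 108 108 111, the ASCII codes of "hello".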

+2

In fact, there are many languages where variable types aren't known at compile time. This usually adds some overhead at runtime.

To answer your first question, I think you are getting hung up on the name "char". The char type is a single-byte integer in C (in fact, this is not entirely true: it is an integral type large enough to hold any character of the basic character set, but its exact size is implementation-dependent). Note that you can have both signed and unsigned chars, which would not make much sense for a data type that held only characters. But the single-byte integer is called "char" in C because that is its most common use (again, see the disclaimer above).
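For instance (a minimal sketch), the macros in <limits.h> show the implementation-defined properties of char and its signed/unsigned variants:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* CHAR_BIT is usually 8, but the standard only guarantees at least 8. */
        printf("bits in a char:    %d\n", CHAR_BIT);
        printf("char range:        %d to %d\n", CHAR_MIN, CHAR_MAX);
        printf("signed char range: %d to %d\n", SCHAR_MIN, SCHAR_MAX);
        printf("unsigned char max: %d\n", UCHAR_MAX);
        return 0;
    }

Whether plain char is signed or unsigned (i.e. whether CHAR_MIN is -128 or 0 on a system with 8-bit chars) is also implementation-defined.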

The rest of your post covers a lot of ground; perhaps it would be better to break it down into several questions. Like char, the size of wchar_t is implementation-dependent; the only requirement is that it be large enough to hold any wide character. It is important to understand that Unicode and character encodings in general are independent of C. It is also important to understand that character sets are not the same thing as character encodings.

Here is an article (by one of the founders of SO, I believe) that gives a brief introduction to character sets and encodings: http://www.joelonsoftware.com/articles/Unicode.html . Once you better understand how they work, you will be able to formulate better questions for yourself. Note that many character sets (such as the Windows code pages) need only one byte of storage per character.

+1

In C, char is a 1-byte integer, and it is what is used to store a character. A character is just a 1-byte integer in C.

And what if I want to use, for example, some imaginary new 8-byte text encoding?

You would have to build it yourself on top of the types your compiler/hardware provides. One approach might be to define a struct holding an array of 8 chars, and to write functions that perform on that struct all the operations you would want on it.
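For example, a minimal sketch of that idea (the type and function names here are made up for illustration):

    #include <string.h>

    /* One "character" in the imaginary 8-byte encoding. */
    typedef struct {
        unsigned char bytes[8];
    } my8char;

    /* Hypothetical helper: compare two 8-byte characters for equality. */
    int my8char_equal(const my8char *a, const my8char *b)
    {
        return memcmp(a->bytes, b->bytes, sizeof a->bytes) == 0;
    }

You would then write similar helpers for copying, comparing strings of these characters, converting to and from other encodings, and so on.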

because I always expected that if I want to use UTF-32 instead of ASCII, I just tell the compiler "take the UTF-32 value of the character I typed and store it in a 4-char field".

You are limited to the types your C compiler provides, which are heavily influenced by the hardware (and the C standard carries a bit of history). C is a low-level language and does not provide much magic. However, there are library functions that can translate between (some) character sets, e.g. mbtowc() and friends, which do just that: you say "here are 16 bytes of ISO 8859-1 characters, translate them to UTF-16 into this buffer over there for me".
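As a rough sketch of how the standard conversion functions are used (they convert according to the current locale's multibyte encoding, e.g. UTF-8, rather than between two arbitrary encodings you pick):

    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    int main(void)
    {
        /* Pick up the character encoding from the environment (e.g. a UTF-8 locale). */
        setlocale(LC_ALL, "");

        const char *narrow = "hello";   /* multibyte string in the locale's encoding */
        wchar_t wide[32];

        /* Convert the multibyte string into wide characters (wchar_t). */
        size_t n = mbstowcs(wide, narrow, sizeof wide / sizeof wide[0]);
        if (n == (size_t)-1) {
            fprintf(stderr, "conversion failed\n");
            return 1;
        }

        wprintf(L"converted %lu characters: %ls\n", (unsigned long)n, wide);
        return 0;
    }

For conversions between two specific encodings (say ISO 8859-1 to UTF-16) you would typically reach for a platform API or a library such as iconv.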

And the last thing: what if I want to use, for example, 25-byte integers for some heavy math operations? C does not have a native data type for that.

C allows you to define your own data types: structs. You can build an abstraction on top of them; people have written libraries that do exactly this, see for example here . Other languages may let you model such types more naturally, e.g. C++, which also lets you overload operators like +, -, *, etc. to work with your own data types.
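As a rough illustration of the struct approach (a minimal sketch, not a real bignum library, and the names are invented), a fixed-width 25-byte unsigned integer with addition could look like this:

    #include <stdint.h>

    #define BIG_BYTES 25

    /* A 25-byte (200-bit) unsigned integer, least significant byte first. */
    typedef struct {
        uint8_t byte[BIG_BYTES];
    } big200;

    /* r = a + b (wrapping on overflow), computed one byte at a time with carry,
       much like pencil-and-paper addition. */
    void big200_add(big200 *r, const big200 *a, const big200 *b)
    {
        unsigned carry = 0;
        for (int i = 0; i < BIG_BYTES; i++) {
            unsigned sum = a->byte[i] + b->byte[i] + carry;
            r->byte[i] = (uint8_t)(sum & 0xFF);
            carry = sum >> 8;
        }
    }

Real arbitrary-precision libraries (GMP, for example) work on the same principle, but use machine words instead of single bytes and provide multiplication, division, and so on; that is also the kind of tool you would use to compute Pi to a huge number of digits.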

+1

There is no 1-byte integer type other than char (and its signed and unsigned variants). And although Windows NT (i.e. not 9x or ME) does use Unicode internally, your program will only use Unicode if you write it that way: you need to either use WCHAR and the W versions of all the Win32 calls, or use TCHAR and #define UNICODE .
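A minimal sketch of the wide/Unicode flavour of a Win32 call (assuming you link against user32):

    #define UNICODE          /* make the generic Win32 names resolve to their W versions */
    #include <windows.h>

    int main(void)
    {
        /* L"..." is a wide (wchar_t) string literal; MessageBoxW takes LPCWSTR. */
        MessageBoxW(NULL, L"Hello, wide world", L"Unicode demo", MB_OK);

        /* With UNICODE defined, the generic name maps to the same W function,
           and TEXT("...") expands to a wide literal. */
        MessageBox(NULL, TEXT("Same call via the generic macro"), TEXT("Unicode demo"), MB_OK);
        return 0;
    }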

0
