Are compiler optimizers safe?

I recently discovered at work that our policy forbids compiler optimization for hard real-time embedded systems because of the risk of compiler bugs (we mainly use gcc, but the policy extends to other compilers). Apparently the policy dates back to someone getting burned by an optimizer bug in the past. This feels overly paranoid to me, so I went looking for data on the problem, but I cannot find any hard data about it.

Does anyone know where to get this kind of data? Could the gcc Bugzilla be mined for statistics correlating bugs with optimization level? Is it even possible to obtain such objective data?

+4
6 answers

I have no data (and have not heard of anyone who does ...), but ...

I would sooner change which compiler I use than turn off optimization. In other words, I would not use any compiler I could not trust to optimize correctly.

The Linux kernel is compiled with -O2. That is far more convincing to me than any Bugzilla analysis.

Personally, I would be fine with any version of gcc that Linux is fine with.

As another data point, Apple has been moving from gcc to llvm, both with and without clang. llvm has traditionally had problems with some C++, and while llvm-gcc is now much better, clang++ still has issues. But that just fits the pattern: while Apple (presumably) now compiles OS X and iOS with clang, they don't use much, if any, C++ or Objective-C++. So for pure C and Objective-C I would trust clang, but I still don't trust clang++.

+1

Is using a compiler safe?

The compiler transforms your code into another form. It is supposed to do so correctly, but, like all software, it can contain bugs. So no, it is not safe.

What can make the code safe?

Testing and usage.

For a bug to be detected, the code containing it must be executed in a particular configuration. For any non-trivial software it is practically impossible to prove the absence of bugs; however, heavy testing and heavy usage will, as a rule, at least vet some of the execution paths.

So how can I be safe?

Well, by using the same paths as everybody else. That gives you the best chance that the path is bug-free, given all the people who have already been down it.

For gcc, then? I would use -O2 or -Os (like Linux), because those levels have probably received an enormous amount of scrutiny, directly or indirectly.

Should optimization be enabled?

That said, introducing optimization into your toolchain is disruptive. It takes more than flipping a switch: you need to test heavily to make sure nothing breaks in your environment.

More specifically, compilers rely on undefined behavior to perform a number of optimizations. If your code has never been built with optimization, chances are it relies on such undefined behavior here and there, and turning optimization on can reveal those bugs (not introduce them).

This is no more disruptive than switching compilers, though.

+2

You are assuming that the compiler is bug-free without optimization and that only the optimizer is dangerous. Compilers are themselves programs, and they very often have bugs with or without particular features enabled. Sure, a feature can make things better, or it can make them worse.

llvm is mentioned in another answer; there is a well-known llvm optimization bug, which the maintainers seem to have zero interest in fixing, where

 while(1) continue; 

gets optimized away entirely, just removed... sometimes... and other similar, but not actually infinite, loops also disappear in the llvm optimizer, leaving you with a binary that does not match your source code. That is the one I know of; there are probably many others in both gcc and llvm.

gcc is a monster that is barely held together with duct tape and baling wire. It's like watching one of those Faces of Death films or something: once you get those images in your head, you cannot get them out; they are burned in for life. So it is worth finding out how scary gcc is by looking behind the curtain, but you may not be able to forget what you saw. For various targets, -O0, -O1, -O2, and -O3 can each fail spectacularly with some code at some point in time. Likewise, a fix for one optimization level can break another.

When you write a program, the hope is that the compiler does what it says, just as you hope your program does what you say. But that is not always the case. Your debugging does not end when the source code is perfect; it ends when the binary is perfect, and that includes whatever binary and operating system you hope to deploy on (different minor versions of gcc produce different binaries, and different Linux targets respond differently to programs).

The most important tip: develop and test at your target optimization level. If you develop and test always building for the debugger, fine, you have a program that works in the debugger; you get to start all over again when you want it to work anywhere else. gcc -O3 often works, but people are afraid of it, so it does not get enough use to be properly debugged and is therefore less reliable. -O2 and no optimization (-O0) get a lot of mileage, lots of bug reports, lots of fixes; pick one of those, or, as another answer says, go with what Linux uses. Or go with what Firefox uses, or what Chrome uses.

Now, hard real-time embedded systems. Mission-critical systems, systems where life or property is directly at stake. First, why are you using gcc at all? Second, yes, optimizers are often not used in these environments; they add too much risk and/or significantly increase the testing and validation effort. Usually you want a compiler that has been through extensive qualification and whose warts and traps are well known. Do you want to be the one who turned on the optimizer and, as a result, the flight computer flew the plane into an elementary school on a school day? You can learn a lot from the old-timers. Yes, they have a lot of war stories and a lot of fear of newfangled things. Do not repeat history; learn from it. "They don't build 'em like they used to" is not just a saying: those legacy systems were stable and reliable and still work for a reason, partly because of those old-timers and what they learned, and partly because new things are built cheaper, with lower-quality components.

For this class of environment, you definitely do not stop at the source code: your money and time are poured into validating the BINARY. Every time the binary changes, you have to start the validation over. It is no different from the hardware it runs on: change one component, reflow one solder joint, and you start verification again from the beginning. One difference, perhaps, is that in some of these environments only a maximum number of cycles is allowed per solder joint before you discard the whole unit. The same can apply on the software side: there are only so many burn cycles on a prom before you toss the prom, and only so many reflow cycles on the pads/holes before you toss the board/unit. Leave the optimizer off and find a better, more stable compiler and/or programming language.

Now, if this hard real-time environment does not hurt people or property when it fails, that is another story. Maybe it is a Blu-ray player that skips a frame here and there or displays a few bad pixels; big deal. Turn on the optimizer. The masses no longer care about that level of quality; they are content with YouTube-grade images, compressed video formats, etc. Cars whose radio or Bluetooth needs to be turned off and on again to work. A flipped bit does not bother them; turn on the optimizer and claim better performance than your competitor. If the software is too bad for customers to endure, they will work around it, and if it fails outright they will come back and buy your new model with the new firmware. They keep doing this because they want dancing baloney, not stability and quality. That stuff costs too much.

You need to collect your own data: try the optimizer on your software in your environment and run the product through the full test suite. If nothing breaks, then either the optimizer is fine for that code on that day, or your test suite needs more work. Failing that, you can at least disassemble and analyze what the compiler is doing with your code.

I would assume (and know from personal experience) that both the gcc and llvm bug trackers contain bugs tied to optimization levels. Does that mean you can sort them by optimization level? I don't know. These are open, largely unpoliced interfaces, so you cannot rely on the masses to fill in the input fields accurately and completely; if there were an optimization field on the bug report form, it would probably always be left at the form's default. You would have to examine each problem report yourself to see whether the reporter was having trouble with the optimizer. If it were a closed corporate system, where an employee's performance review could suffer for sloppy procedure such as badly filled-out forms, you would have a better, more searchable database to mine.

The optimizer increases exposure. Say 50% of the compiler's code is exercised to produce unoptimized output and another 10% to reach -O1: you have increased your risk by exercising more compiler code, with more chances of hitting a bug and getting bad output, and still more code is exercised to reach -O2 and -O3. Turning optimization down does not eliminate the risk, but it reduces the odds.

+1

My understanding is that most compiler optimizations are safe for all programs, with the exception of scheduling and reordering optimizations, because that class of optimization can change the program's original behavior.

For data on this issue, you can check:

Can compiler optimization introduce errors?

0

I don't know about gcc bugs, but the C programming language is not a good fit for current hardware practice. Remember that it was conceived around 1970, when it was not even clear that two's complement arithmetic would win out. So, you add two signed integers in C. The spec says the result must not overflow. The compiler may assume no overflow ever occurs and make further optimizations based on that assumption. You assume two's complement wraparound (and who doesn't, these days), and boom, you just overflowed. Such things are a major source of security issues. I think Java is probably better even for low-level code, because it is much better defined, and the current HotSpot just-in-time compiler produces code that runs about as fast as C. You could also look at the D programming language; it is probably better defined too.

0

In embedded systems you often have to control hardware by writing to registers. In C this is very simple: just initialize a pointer with the register's address and off you go.

If no other part of the program reads or writes that location, the optimizer is likely to delete the stores. Broken code.

This particular problem can be fixed with the "volatile" keyword. But remember that the optimizer also reorders instructions, so if your hardware expects the registers to be written in a specific order, you can get burned.

The optimizer is supposed to preserve the final results, but the intermediate steps may change, and that can break code that depends on them.

0

Source: https://habr.com/ru/post/1393687/
