Optimizing for space instead of speed in C++

When you say "optimization," people tend to think "speed." But what about embedded systems, where speed is not so important but memory is the main limitation? What are some recommendations, methods, and tricks for squeezing out those extra kilobytes of ROM and RAM? How does one "profile" code to see where the memory bloat is?

PS It can be argued that "premature" optimization of space in embedded systems is not all evil, because it leaves you more room for data storage and feature creep. It also helps reduce hardware manufacturing costs, because your code can run on devices with smaller ROM/RAM.

PPS Links to articles and books are also welcome!

PPPS These questions are closely related: 404615, 1561629

+42
c++ optimization embedded
Jan 17 '10 at 21:31
16 answers

My experience from an extremely memory-constrained embedded environment:

  • Use fixed-size buffers. Do not use pointers or dynamic allocation because they have too much overhead.
  • Use the smallest int data type that works.
  • Never use recursion. Always use loops.
  • Don't pass lots of function parameters. Use global variables instead. :)
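A minimal sketch of the first three points, with illustrative names (BUF_LEN and the buffer itself are hypothetical): a fixed-size buffer, the smallest integer types that fit, and a loop instead of recursion.

```cpp
#include <stdint.h>

// Hypothetical receive buffer: fixed size, no heap, no allocator overhead.
#define BUF_LEN 32
static uint8_t rx_buf[BUF_LEN];
static uint8_t rx_count = 0;    // smallest type that can hold 0..BUF_LEN

// Iterative sum: stack usage is constant, unlike a recursive version.
static uint16_t buf_sum(void) {
    uint16_t total = 0;
    for (uint8_t i = 0; i < rx_count; ++i)
        total += rx_buf[i];
    return total;
}
```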
+30
Jan 17 '10 at 21:49

There are many things you can do to reduce your memory footprint; I am sure people have written books on the subject, but here are some of them:

  • Compiler options to reduce code size (including -Os and packing/alignment options)

  • Linker options for dead code removal

  • If you load from flash (or ROM) into RAM for execution (rather than executing from flash), use a compressed flash image and decompress it with the bootloader.

  • Use static allocation: the heap is an inefficient way to allocate limited memory, and it can fail due to fragmentation if memory is tight.

  • Tools for finding the stack high-water mark (typically they fill the stack with a pattern, run the program, and then see where the pattern is still intact), so you can set the stack size optimally

  • And, of course, optimizing the algorithms for memory footprint (often at the expense of speed)
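The stack high-water-mark idea can be sketched as follows; the stack region and fill pattern here are illustrative stand-ins (on a real target you would paint the linker-defined stack area from startup code), assuming a downward-growing stack:

```cpp
#include <stdint.h>

#define STACK_SIZE 256     // illustrative; the real size comes from the linker script
#define STACK_FILL 0xA5u   // recognizable fill pattern

static uint8_t stack_area[STACK_SIZE];   // stand-in for the real stack region

// Run once at startup, before the stack is used in anger.
static void stack_paint(void) {
    for (unsigned i = 0; i < STACK_SIZE; ++i)
        stack_area[i] = STACK_FILL;
}

// The stack grows downward, so untouched pattern bytes accumulate at
// the low end; counting them gives the bytes that were never used.
static unsigned stack_free_bytes(void) {
    unsigned n = 0;
    while (n < STACK_SIZE && stack_area[n] == STACK_FILL)
        ++n;
    return n;
}
```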

+13
Jan 17 '10 at 21:43

Some obvious ones:

  • If speed is not critical, execute the code directly from flash.
  • Declare constant data tables with const . This avoids copying the data from flash to RAM.
  • Pack large data tables tightly, using the smallest data types and in the right order to avoid padding.
  • Use compression for large data sets (as long as the compression code doesn't outweigh the data)
  • Turn off exception handling and RTTI.
  • Has anybody mentioned using -Os yet? ;-)
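A sketch of the const point (table contents invented for illustration): declaring a lookup table const lets the linker keep it in flash (.rodata) instead of having startup code copy it into RAM.

```cpp
#include <stdint.h>

// const data can live in flash; without const it would be copied
// into RAM by the startup code.
static const uint8_t step_table[8] = {1, 2, 4, 8, 16, 32, 64, 128};

static uint8_t step_value(uint8_t idx) {
    return step_table[idx & 7u];   // mask keeps the index in range
}
```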

Adding knowledge to data

One rule of the Unix philosophy can help make code more compact:

Rule of Representation: fold knowledge into data, so program logic can be stupid and robust.

I can't count how many times I have seen complex branching logic spanning many pages that could have been folded into a nice compact table of rules, constants, and function pointers. State machines can often be represented this way (the State pattern). The Command pattern also applies. It's all about declarative versus imperative programming styles.
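For example, a state machine folded into a const table (states, events, and transitions here are invented for illustration):

```cpp
#include <stdint.h>

enum State : uint8_t { IDLE, RUN, DONE, NSTATES };
enum Event : uint8_t { START, STOP, NEVENTS };

// All the branching logic lives in one const table in ROM; the
// stepping code itself stays trivially simple.
static const uint8_t next_state[NSTATES][NEVENTS] = {
    /* IDLE */ { RUN,  IDLE },
    /* RUN  */ { RUN,  DONE },
    /* DONE */ { DONE, DONE },
};

static uint8_t step(uint8_t s, uint8_t e) { return next_state[s][e]; }
```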

Log codes + binary data instead of text

Instead of logging plain text, log event codes plus binary data, and use a "phrasebook" to recreate the event messages. The phrasebook entries can even contain printf format specifiers, so the event data values display neatly as text.
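A sketch of the scheme (event codes and phrasebook strings invented for illustration): the target stores a small binary record, and the phrasebook expands it back into text, here done on the same machine for simplicity.

```cpp
#include <stdint.h>
#include <stdio.h>

// Host-side phrasebook: index = event code, entry = printf format.
static const char* const phrasebook[] = {
    "boot complete",                 // event 0, no data
    "battery level: %u mV",          // event 1
    "sensor timeout on channel %u",  // event 2
};

// What the target actually logs: 3 bytes instead of a whole string.
struct LogEntry { uint8_t code; uint16_t value; };

static void render(const LogEntry& e, char* out, size_t n) {
    snprintf(out, n, phrasebook[e.code], (unsigned)e.value);
}
```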

Minimize the number of threads

Each thread needs its own memory block for its stack and TSS. If you don't need preemption, consider making your tasks run cooperatively within a single thread (cooperative multitasking).

Use memory pools instead of hoarding buffers

To avoid heap fragmentation, I have often seen individual modules hoard large static memory buffers for their own use, even when the memory is only needed occasionally. A memory pool can be used instead, so the memory is only used on demand. However, this approach may require careful analysis and instrumentation to make sure the pools are not drained at runtime.

Dynamic allocation only during initialization

On embedded systems where only one application runs, you can use dynamic allocation in a sensible way that does not lead to fragmentation: just allocate once in your various initialization routines and never free the memory. reserve() your containers to the right capacity and do not let them grow automatically. If you frequently need to allocate/free data buffers (for example, for communication packets), use memory pools. Once I even extended the C/C++ runtime to interrupt my program if anything tried to dynamically allocate memory after the initialization sequence.
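A sketch of that last trick, replacing the global allocation functions (the flag name and abort-on-violation policy are illustrative; a real runtime hook would be toolchain-specific):

```cpp
#include <cstdlib>
#include <new>

static bool init_done = false;   // set to true at the end of initialization

// After init, any dynamic allocation is a bug: stop immediately.
void* operator new(std::size_t n) {
    if (init_done)
        std::abort();
    void* p = std::malloc(n ? n : 1);
    if (!p)
        std::abort();            // no exceptions on a small target
    return p;
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```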

+11
Jan 18 '10 at 0:24

Generate a map file from the linker. It shows how memory is allocated, which is a good starting point for optimizing memory usage. It also shows all the functions and how the code space is laid out.

+7
Jan 17 '10 at 22:23

As with all optimization, first optimize the algorithms, then optimize the code and data, finally optimize the compiler.

I do not know what your program does, so I cannot advise on the algorithms. Many others have written about the compiler. So here are some tips for code and data:

  • Eliminate redundancy in your code. Any repeated code three or more lines long that appears three times should be turned into a function call.
  • Eliminate redundancy in your data. Find the most compact representation: merge read-only data, and consider using compression codes.
  • Run the code through your regular profiler; eliminate all code that is never used.
+7
Jan 17 '10 at 22:40

Compile in VS with /Os. This is often even faster than optimizing for speed, since smaller code == less paging.

COMDAT folding must be enabled in the linker (it is by default in release builds).

Be careful with data structure packing; it often leads the compiler to generate more code (== more memory) to access unaligned memory. Using 1 bit for a boolean flag is a classic example.
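The boolean-flag example can be made concrete; the sizes shown are typical, not guaranteed by the standard:

```cpp
#include <stdint.h>

// Packed: 8 flags share one byte of RAM, but every read/write
// costs extra shift/mask instructions (i.e. more code).
struct FlagsPacked {
    uint8_t ready : 1;
    uint8_t error : 1;
    uint8_t armed : 1;
    uint8_t spare : 5;
};

// Plain: three bytes of RAM, but each flag is a simple byte access.
struct FlagsPlain {
    uint8_t ready, error, armed;
};
```

Which wins depends on how often the flags are accessed; check both the map file and the generated code before deciding.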

Also, be careful about choosing a memory-efficient algorithm over one with a better execution time. This is where premature optimization comes back to bite you.

+4
Jan 17 '10 at 21:46

Good that most of these have already been mentioned, but here is my list anyway:

  • Find out what your compiler can do. Read the compiler documentation, experiment with code examples. Check the settings.
  • Check the generated code at the target optimization level. Sometimes the results are surprising, and it often turns out that optimization actually slows things down (or just takes up too much space).
  • Choose the appropriate memory model . If you are aiming for a really small, tight system, a large or huge memory model may not be the best choice (though it does, as a rule, simplify programming...)
  • Prefer static allocation . Use dynamic allocation only at startup, or over a static buffer (a pool, or a static buffer sized for the largest instance).
  • Use C99 data types . Use the smallest sufficient data type for storage; local variables, such as loop counters, are sometimes more efficient with "fast" data types.
  • Select inline candidates. Some parameter-heavy functions with relatively simple bodies are better off inlined. Or consider passing a parameter structure. Globals are also an option, but be careful: testing and maintenance can get difficult if anyone uses them in an undisciplined way.
  • Use the const keyword, and know the implications it has for array initialization.
  • Map file , ideally also with module sizes. Check also what gets pulled in from the crt (is it really needed?).
  • Recursion : just say no (limited stack space).
  • Floating point : prefer fixed-point math. It tends to pull in and call a lot of code (even for a simple add or multiply).
  • C++ : you should know C++ VERY well. If you do not, program embedded systems in C, please. Those who dare should be careful with all the advanced C++ constructs (inheritance, templates, exceptions, overloading, etc.). Treat code close to the HW more like a Super-C, and use C++ where it counts: in high-level logic, the GUI, etc.
  • Disable everything you do not need in the compiler settings (be it library parts, language constructs, etc.).
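The C99-types point, as a sketch (the struct fields and value ranges are invented for illustration): exact-width types for storage, "fast" types for locals such as loop counters.

```cpp
#include <stdint.h>

// Storage: smallest sufficient exact-width types.
struct Sample {
    uint8_t channel;   // 0..15 fits easily
    int16_t reading;   // signed ADC value
};

// Locals: uint_fast8_t lets the compiler pick whatever register
// width is cheapest on this target.
static int32_t average(const Sample* s, uint_fast8_t count) {
    int32_t sum = 0;                       // widen before accumulating
    for (uint_fast8_t i = 0; i < count; ++i)
        sum += s[i].reading;
    return count ? sum / (int32_t)count : 0;
}
```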

Last but not least: when chasing the smallest possible code size, don't overdo it . Watch out for performance and maintainability too. Over-optimized code tends to rot quickly.

+3
Jan 19 '10 at 13:15

First, tell your compiler to optimize for code size. GCC has the -Os flag for this.

Everything else is at the algorithmic level: use the same kinds of tools you would use to find memory leaks, but this time look for allocations and frees that you could avoid altogether.

Also keep an eye on the packing of frequently used data structures: if you can shave a byte or two off them, you can cut memory usage significantly.

+2
Jan 17

Profiling code or data bloat can be done via map files: for gcc see here , for VS see here .
I have yet to see a useful tool for size profiling (and I do not have time to fix my VS AddIn hack).

+2
Jan 17

If you're looking for a good way to profile your application's heap usage, check out the valgrind massif tool. It lets you take snapshots of your application's memory usage profile over time; you can then use that information to see where the "low-hanging fruit" is and aim your optimizations accordingly.

+2
Jan 17 '10 at 22:41

On top of what others have suggested:

Limit your use of C++ features; write as if in ANSI C with minor extensions. The standard templates (std::) use a big dynamic allocation system. If you can, avoid templates altogether. While not inherently evil, they make it far too easy to generate lots and lots of machine code from just a couple of simple, clean, elegant high-level statements. This encourages writing in a way that, despite all the "clean code" benefits, is very code-hungry.

If you must use templates, write your own or use ones designed for embedded use, pass fixed sizes as template parameters, and write a test program so you can test your template and check your -S output to make sure the compiler is not generating horrible assembly code to instantiate it.

Align your structures by hand, or use #pragma pack .

 { char a; long b; char c; long d; char e; char f; } // typically 20 bytes with 4-byte longs
 { char a; char c; char e; char f; long b; long d; } // 12 bytes

For the same reason, use a centralized global data storage structure instead of scattered local static variables.

Balance your use of malloc()/new against static structures intelligently.

If you need only part of a library's functionality, consider writing your own.

Unroll short loops.

 for(i=0;i<3;i++){ transform_vector[i]; } 

is bigger than

 transform_vector[0]; transform_vector[1]; transform_vector[2]; 

Don't do this for longer loops.

Compile several files together as one unit, so the compiler can inline short functions and perform various optimizations the linker can't.

+2
Jan 19 '10 at 12:10

Don't be afraid to write "little languages" inside your program. Sometimes a table of strings and an interpreter can get a LOT done. For example, in a system I worked on, we have many internal tables that need to be accessed in various ways (looping through, and so on). We have an internal command system for referencing the tables that forms a kind of half-way language, and it is quite compact for what it gets done.

But BE CAREFUL! Know that you are writing such a thing (I once wrote one by accident myself), and DOCUMENT what you are doing. The original developers did not seem to be aware of what they were doing, so it is much harder to manage than it needs to be.

+1
Jan 19 '10 at 17:00

Optimization is a popular term, but it is often technically inaccurate. It literally means making something optimal, a state that is never actually reached, whether for speed or for size. We can only take steps toward it.

Many (but not all) of the techniques used to move toward minimum time to a computational result sacrifice memory, and many (but not all) of the techniques used to move toward minimum memory requirements lengthen the time to the result.

Reducing memory requirements comes down to a fixed number of general techniques. It is hard to find a specific trick that does not fit neatly into one or more of them. If you applied all of them, you would have something very close to the minimal space requirement for the program, if not the absolute minimum. For a real application, that could occupy a team of experienced programmers for a thousand years.

  1. Remove all redundancy from stored data, including intermediate data.
  2. Remove the need to store data that can be streamed instead.
  3. Allocate only the number of bytes needed, never more.
  4. Delete all unused data.
  5. Remove all unused variables.
  6. Free data as soon as it can no longer possibly be used.
  7. Remove all unused algorithms and branches within algorithms.
  8. Find the algorithm representation with the smallest code footprint.
  9. Remove all unused space between items.

This is a computer science view of the topic, not a developer's one.

For example, packing a data structure is an effort that combines (3) and (9) above. Compressing data is a way of at least partially achieving (1). Reducing the overhead of higher-level programming constructs is a way of making progress on (7) and (8). Dynamic allocation is an attempt to exploit a multitasking environment to apply (3). Compilation warnings, if enabled, can help with (5). Destructors attempt to help with (6). Sockets, streams, and pipes can be used to accomplish (2). Simplifying a polynomial is a technique for gaining ground on (8).

Understanding these nine and the various ways of achieving them is the product of years of study and of reviewing the memory maps produced by compilation. Embedded programmers often learn them faster because of how little memory they have available.

Using the -Os option with the gnu compiler asks the compiler to look for patterns it can transform toward these goals, but -Os is an aggregate flag that enables a number of optimization features, each of which attempts transformations that perform one of the nine tasks above.

Compiler directives can produce results without effort from the programmer, but the automated processes in the compiler rarely fix problems created by a lack of awareness in the authors of the code.

+1
Feb 21 '17 at 7:58

Consider the implementation cost of some C++ features, such as virtual function tables and overloaded operators that create temporary objects.

0
Jan 18 '10 at 13:15

Along with what everyone else has said, I just want to add: do not use virtual functions, because a vtable has to be created for them, and that can take who knows how much space.

Also watch out for exceptions. With gcc, I do not believe the size grows with every try/catch block (other than two function calls per try/catch), but there is a fixed-size runtime support function that has to be linked in, and that can waste precious bytes.

0
Jan 19 '10 at 16:41
