1) Before starting a project, build in a way to measure how much memory you are using, preferably on a per-component basis. That way, every time you make a change, you can see its effect on memory usage. You cannot optimize what you cannot measure.
2) If the project is already mature and is running up against its memory limits (or is being ported to a device with less memory), find out what you are already using the memory for.
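As a rough illustration of per-component measurement, here is a minimal sketch of a tracking wrapper around malloc and free. Everything in it is an assumption made for the example: the component tags, the function names (mem_alloc, mem_free, mem_report) and the choice to count only requested bytes, not allocator overhead. A real project would hook whatever allocator it actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical component tags; replace with your application's own. */
typedef enum { MEM_AUDIO, MEM_TEXTURES, MEM_CACHE, MEM_COMPONENT_COUNT } mem_component;

static const char *const mem_names[MEM_COMPONENT_COUNT] = { "audio", "textures", "cache" };
static size_t mem_current[MEM_COMPONENT_COUNT]; /* requested bytes live right now */
static size_t mem_peak[MEM_COMPONENT_COUNT];    /* high-water mark per component */

/* Header stored in front of each block so mem_free knows what to subtract.
   The union pads it to the maximum alignment so the user pointer stays aligned. */
typedef union {
    struct { size_t size; mem_component owner; } info;
    max_align_t align;
} mem_header;

void *mem_alloc(mem_component owner, size_t size)
{
    assert((unsigned)owner < MEM_COMPONENT_COUNT);
    mem_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->info.size = size;
    h->info.owner = owner;
    mem_current[owner] += size;
    if (mem_current[owner] > mem_peak[owner])
        mem_peak[owner] = mem_current[owner];
    return h + 1; /* user memory starts just after the header */
}

void mem_free(void *p)
{
    if (p == NULL)
        return;
    mem_header *h = (mem_header *)p - 1;
    mem_current[h->info.owner] -= h->info.size;
    free(h);
}

size_t mem_in_use(mem_component owner)     { return mem_current[owner]; }
size_t mem_high_water(mem_component owner) { return mem_peak[owner]; }

/* Print one line per component; call it after each build's test run. */
void mem_report(FILE *out)
{
    for (int i = 0; i < MEM_COMPONENT_COUNT; i++)
        fprintf(out, "%-10s current=%zu peak=%zu\n",
                mem_names[i], mem_current[i], mem_peak[i]);
}
```

Calling mem_report(stderr) at shutdown also gives a crude leak check: any component whose current count is not zero still owns memory. The sketch is not thread-safe, and the header adds a small per-allocation cost of its own, so it is better suited to debug builds.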
My experience is that almost all of the significant savings when fixing an oversized application come from a small number of changes: reduce cache sizes, strip out some textures (of course, this is a functional change that requires agreement from stakeholders, which means meetings, so it may not be an efficient use of your time), convert audio to a more compact format, reduce the initial size of any pre-allocated heaps, and find ways to free resources that are only used temporarily and reload them when they are required again. Occasionally you will find some structure that is 64 bytes and could be reduced to 16, or whatever, but that is rarely where the biggest wins are. If you know what the largest lists and arrays in the application are, though, then you know which structures to look at first.
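To make the point about shrinking structures concrete, here is a small sketch. The record type and its fields are hypothetical; the only technique shown is reordering fields so the compiler inserts less padding. On common 32-bit and 64-bit ABIs the first layout is typically 12 bytes and the second 8, but the exact sizes depend on your compiler and target, so measure with sizeof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, as it might look after fields were appended over time. */
typedef struct {
    uint8_t  kind;    /* usually followed by padding so 'offset' is aligned */
    uint32_t offset;
    uint8_t  flags;   /* usually followed by padding so 'length' is aligned */
    uint16_t length;
} record_padded;

/* Same fields, ordered largest first so no padding is needed between them. */
typedef struct {
    uint32_t offset;
    uint16_t length;
    uint8_t  kind;
    uint8_t  flags;
} record_compact;

/* Reordering must never make the record bigger; checked at compile time. */
static_assert(sizeof(record_compact) <= sizeof(record_padded),
              "compact layout should not be larger than the padded one");

/* Bytes saved across an array of 'count' records by switching layouts. */
size_t record_savings(size_t count)
{
    return count * (sizeof(record_padded) - sizeof(record_compact));
}
```

A few bytes per record only matters when multiplied by the element count, which is why the largest arrays and lists are the place to look first.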
Oh yes: find and fix memory leaks. Any memory that you can recover without sacrificing performance is a great start.
I have spent a lot of time in the past worrying about code size. The main considerations (apart from making sure you measure it at build time, so that you can see it change) are:
1) Find out which code is referenced, and by what. If you discover that an entire XML library is being linked into your application just to parse a two-element configuration file, consider changing the format of the configuration file and/or writing your own trivial parser. If you can, use either source or binary analysis to draw a big dependency graph, and look for large components with only a small number of users: it may be possible to cut these out with only minor code rewrites. Be prepared to play diplomat: if two different components in your application use XML, and you want to cut it, then that is two people you have to convince of the benefits of hand-rolling a replacement for what is currently a reliable, off-the-shelf library.
2) Experiment with the compiler options. Consult the documentation for your specific platform. For example, you may want to reduce the default allowable increase in code size due to inlining, and in GCC, at least, you can tell the compiler to apply only those optimizations that usually do not increase code size (-Os).
3) Where possible, use libraries that are already on the target platform(s), even if this means writing an adapter layer. In the XML example above, you may find that your target platform always has an XML library in memory anyway because the OS uses it, in which case link to it dynamically.
4) As already mentioned, Thumb mode can help on ARM. If you use it only for code that is not performance critical and leave the critical routines in ARM mode, you will not notice the difference.
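As an illustration of item 1, this is roughly what a hand-written replacement for an XML library might look like when the configuration file only holds two settings. The "key=value" line format, the setting names (cache_kb, volume) and the function name config_parse are all assumptions made for this sketch, not anything your application necessarily uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-setting configuration; the field names are illustrative. */
typedef struct {
    long cache_kb;
    long volume;
} app_config;

/* Parse "key=value" lines from a NUL-terminated buffer.
   Returns the number of recognised settings that were stored.
   Unknown keys and malformed lines are skipped silently. */
int config_parse(const char *text, app_config *cfg)
{
    assert(text != NULL && cfg != NULL);
    int found = 0;
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        const char *eq = memchr(text, '=', len);
        if (eq != NULL) {
            size_t klen = (size_t)(eq - text);
            char *end;
            long value = strtol(eq + 1, &end, 10);
            /* Accept only if digits were consumed and they lie on this line
               (strtol skips whitespace, which could cross the newline). */
            if (end != eq + 1 && end <= text + len) {
                if (klen == 8 && memcmp(text, "cache_kb", 8) == 0) {
                    cfg->cache_kb = value;
                    found++;
                } else if (klen == 6 && memcmp(text, "volume", 6) == 0) {
                    cfg->volume = value;
                    found++;
                }
            }
        }
        text += len;          /* move to the end of this line */
        if (*text == '\n')
            text++;           /* and past the newline, if there is one */
    }
    return found;
}
```

The whole parser depends only on the C standard library, which is the trade being made: far less code linked in, at the price of a format that is much less flexible than XML and that you now have to maintain yourself.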
Finally, there may be clever tricks you can play if you have enough control over the device. Does the user interface only allow one application to run at a time? Unload all the drivers and services that your application does not need. Is the screen double buffered, but your application is synchronized with the refresh cycle? You may be able to reclaim an entire screen buffer.