Optimizing for ARM: why do different CPUs affect different algorithms differently (and dramatically)?

I ran some performance benchmarks on Windows Mobile devices and noticed that some algorithms perform significantly better on some hosts and significantly worse on others, even after taking the difference in clock speeds into account.

Statistics for reference (all results were generated from the same binary, compiled with Visual Studio 2005 targeting ARMv4):

Intel XScale PXA270

  • Algorithm A: 22642 ms
  • Algorithm B: 29271 ms

ARM1136EJ-S core (integrated into the MSM7201A chip)

  • Algorithm A: 24874 ms
  • Algorithm B: 29504 ms

ARM926EJ-S core (integrated into the OMAP 850 chip)

  • Algorithm A: 70215 ms
  • Algorithm B: 31652 ms (!)
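
One way to separate the clock-speed effect from the algorithm effect is to compare the A/B ratio on each core, since a ratio of two timings taken on the same device does not depend on that device's clock speed. A minimal sketch using only the timings listed above; the names `timings_ms` and `a_to_b_ratio` are illustrative and not part of the original benchmark:

```python
# Timings in milliseconds, copied from the results listed above.
timings_ms = {
    "XScale PXA270": {"A": 22642, "B": 29271},
    "ARM1136EJ-S": {"A": 24874, "B": 29504},
    "ARM926EJ-S": {"A": 70215, "B": 31652},
}


def a_to_b_ratio(result):
    """Return time(A) / time(B); values below 1 mean A is faster than B."""
    return result["A"] / result["B"]


# Hypothetical helper structure: one ratio per CPU core.
ratios = {cpu: a_to_b_ratio(r) for cpu, r in timings_ms.items()}

for cpu, ratio in ratios.items():
    print(f"{cpu}: A/B = {ratio:.2f}")
```

On these numbers, A takes roughly 0.77x and 0.84x the time of B on the first two cores, but about 2.2x the time of B on the ARM926EJ-S, so the reversal cannot be explained by clock speed alone.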

, B , , FPU.

, : , , / , .

.


, 926 (5 8 1136, iirc), 926 .

, , , , , , .


- . - , . - . , , , .

- ? , ( SD-).

, ? , . 50% , . , , . , , , .


, , ,

  • (, CACHE )
  • D-Cache, I-Cache

, , , , . , , , . .


, - (, I-Cache). , , .

, , :

  • "" / (+ - | ^ &)
  • (, )
  • (32 )
  • (8 ) ( 32 )
  • (32 )
  • (8 )
  • - , :)

, . , . .

, VS ( ) .

p.s.: , / /? ? OS ?


Source: https://habr.com/ru/post/1719367/

