Go to ARM.com and grab the Cortex-M3 Technical Reference Manual. Section 3.3.1, on page 3-4, lists the instruction timings. Fortunately, they are quite simple on the Cortex-M3.
These timings show that, in an ideal zero-wait-state system, your professor's example takes 3 cycles:
    ASR R1, R0, #31           ; 1 cycle
    ADD R0, R0, R1            ; 1 cycle
    EOR R0, R0, R1            ; 1 cycle
                              ; total: 3 cycles
and your version takes two cycles:
    ADD R1, R0, R0, ASR #31   ; 1 cycle
    EOR R0, R1, R0, ASR #31   ; 1 cycle
                              ; total: 2 cycles
So yours is theoretically faster.
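Both sequences are the usual branchless absolute value: `ASR #31` turns the input into a mask that is all ones for negative values and zero otherwise, and `(x + mask) ^ mask` then negates only the negative inputs. As a minimal sketch, the C model below mirrors each instruction one-to-one, assuming 32-bit registers held in `uint32_t` so that ADD wraps exactly as the hardware does. The helper names (`asr31`, `abs_professor`, `abs_student`, `check_equivalence`) are hypothetical, chosen for illustration; the model checks results only, not timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the "Rx, ASR #31" operand: copy the sign bit into all 32 bits.
   Unsigned arithmetic avoids C's implementation-defined right shift
   of negative signed values. */
static uint32_t asr31(uint32_t r)
{
    return (r & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

/* Professor's version: three separate instructions. */
static uint32_t abs_professor(uint32_t r0)
{
    uint32_t r1 = asr31(r0); /* ASR R1, R0, #31 */
    r0 = r0 + r1;            /* ADD R0, R0, R1  */
    r0 = r0 ^ r1;            /* EOR R0, R0, R1  */
    return r0;
}

/* Your version: the shift is folded into the second operand of each
   instruction by the barrel shifter. */
static uint32_t abs_student(uint32_t r0)
{
    uint32_t r1 = r0 + asr31(r0); /* ADD R1, R0, R0, ASR #31 */
    r0 = r1 ^ asr31(r0);          /* EOR R0, R1, R0, ASR #31 (R0 still holds the input) */
    return r0;
}

/* Assert that both sequences agree on a handful of edge cases. */
static void check_equivalence(void)
{
    static const uint32_t inputs[] = {
        0u, 1u, 42u, 0x7FFFFFFFu, /* non-negative values          */
        0xFFFFFFFFu, 0xFFFFFFD6u, /* -1 and -42 in two's complement */
        0x80000000u               /* INT32_MIN                     */
    };
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; ++i)
        assert(abs_professor(inputs[i]) == abs_student(inputs[i]));
}
```

Like the assembly, both functions return 0x80000000 for INT32_MIN, because its absolute value does not fit in 32 bits.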
You mention "removing one memory access", but is that true? How big are the respective routines? Since we are dealing with Thumb-2, we have a mix of 16-bit and 32-bit instructions available. Let's see how they assemble:
Their version (adjusted for UAL syntax):
        .syntax unified
        .text
        .thumb
    abs:
        asrs r1, r0, #31
        adds r0, r0, r1
        eors r0, r0, r1
Assembled:
    00000000    17c1    asrs r1, r0, #31
    00000002    1840    adds r0, r0, r1
    00000004    4048    eors r0, r0, r1
This is 3x2 = 6 bytes.
Your version (again, adjusted for UAL syntax):
        .syntax unified
        .text
        .thumb
    abs:
        add.w r1, r0, r0, asr #31
        eor.w r0, r1, r0, asr #31
Assembled:
    00000000    eb0071e0    add.w r1, r0, r0, asr #31
    00000004    ea8170e0    eor.w r0, r1, r0, asr #31
This is 2x4 = 8 bytes.
So instead of removing a memory access, you have actually increased the code size.
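The byte counts follow from a simple rule in the Thumb-2 encoding: a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction, and any other halfword is a complete 16-bit instruction. The sketch below applies that rule; the function names (`is_32bit_prefix`, `thumb2_code_size`, `check_sizes`) are hypothetical. The halfwords for your version are taken from its listing, and those for the professor's version are the 16-bit encodings of `asrs r1, r0, #31`, `adds r0, r0, r1` and `eors r0, r0, r1`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if this halfword starts a 32-bit Thumb-2 instruction,
   i.e. bits [15:11] are 0b11101, 0b11110 or 0b11111. */
static int is_32bit_prefix(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Walk halfwords in memory order and return the code size in bytes:
   2 per 16-bit instruction, 4 per 32-bit instruction.
   Assumes the stream ends on an instruction boundary. */
static size_t thumb2_code_size(const uint16_t *code, size_t n_halfwords)
{
    size_t i = 0, bytes = 0;
    while (i < n_halfwords) {
        if (is_32bit_prefix(code[i])) {
            bytes += 4; /* 32-bit instruction: two halfwords */
            i += 2;
        } else {
            bytes += 2; /* 16-bit instruction: one halfword */
            i += 1;
        }
    }
    return bytes;
}

/* asrs / adds / eors (professor) and add.w / eor.w (yours). */
static const uint16_t professor_code[] = { 0x17c1, 0x1840, 0x4048 };
static const uint16_t student_code[]   = { 0xeb00, 0x71e0, 0xea81, 0x70e0 };

/* Sanity check against the byte counts worked out by hand. */
static void check_sizes(void)
{
    assert(thumb2_code_size(professor_code,
                            sizeof professor_code / sizeof professor_code[0]) == 6);
    assert(thumb2_code_size(student_code,
                            sizeof student_code / sizeof student_code[0]) == 8);
}
```

This only measures size; it says nothing about how many cycles either routine takes on real hardware.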
But does this affect performance? My advice: benchmark it.