As you figured out, you need to copy the sign bit of the char into every bit of the upper half. On most architectures, the easiest way is to copy the register and arithmetic-shift the copy right by 7. But AVR only has shift-by-1 instructions, so we can't do that efficiently.
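To make the cost of the shift approach concrete, here is a minimal C sketch that models AVR's one-bit ASR and applies it seven times. The helper names (`asr1`, `high_byte_by_shifting`, `check_shift_model`) are made up for illustration; this is a model of the idea, not code for the AVR itself.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one AVR ASR: shift right by one bit, keeping bit 7 unchanged. */
static uint8_t asr1(uint8_t v)
{
    return (uint8_t)((v >> 1) | (v & 0x80u));
}

/* High byte of the sign-extended value, produced by seven one-bit shifts. */
static uint8_t high_byte_by_shifting(uint8_t low)
{
    uint8_t hi = low;               /* mov: copy the register */
    for (int i = 0; i < 7; i++)     /* seven separate ASR instructions on AVR */
        hi = asr1(hi);
    return hi;                      /* 0x00 or 0xFF */
}

/* Exhaustive check over all 256 inputs (hypothetical helper). */
static void check_shift_model(void)
{
    for (unsigned v = 0; v < 256; v++)
        assert(high_byte_by_shifting((uint8_t)v) == ((v & 0x80u) ? 0xFFu : 0x00u));
}
```

The result is correct, but it takes a copy plus seven shifts, which is why the flag-based tricks are more attractive here.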
Another trick for conditionally getting 0 or -1 in a register is to subtract-with-borrow a register from itself, which gives 0 - C, for example sbc r25, r25.
Now we just need to set the Carry flag if the 8-bit number is negative, i.e. if it is > 127 when treated as an unsigned integer, because C is always set according to the unsigned interpretation of the operands. AVR has a compare-immediate instruction, CPI, but it only works for r16-r31, not the low registers. It also sets the C flag to the opposite of what we actually want, so we would need another instruction to invert the result. So I think we're better off doing the compare the other way around, against a value held in a register:
    ; Most efficient way, I think:
    sign_extend:
        ldi   r25, 127      ; can be hoisted out of loops; any free r16-r31 register works (LDI restriction)
        cp    r25, r24      ; C = (127 < (unsigned)r24) = ((signed)r24 < 0)
        sbc   r25, r25      ; r25 = (r24 < 0) ? -1 : 0
        ; result in r25:r24
Even better, if you need to do this in a loop, you can keep the 127 in a separate register so the LDI runs only once.
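The CP/SBC sequence can be sanity-checked with a small C model. This is only a sketch under the assumption that CP sets C when the second operand is larger as an unsigned byte and that `sbc r, r` yields 0 - C; the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `cp a, b`: the carry (borrow) flag is set when b > a, unsigned. */
static unsigned cp_carry(uint8_t a, uint8_t b)
{
    return b > a;
}

/* Model of `sbc r, r`: r - r - C = 0 - C, so the result is 0x00 or 0xFF. */
static uint8_t sbc_self(unsigned carry)
{
    return (uint8_t)(0u - carry);
}

/* Returns the 16-bit pattern r25:r24 after the three-instruction sequence. */
static uint16_t sign_extend_cp_sbc(uint8_t r24)
{
    uint8_t r25 = 127;                   /* ldi r25, 127 */
    unsigned c = cp_carry(r25, r24);     /* cp  r25, r24 */
    r25 = sbc_self(c);                   /* sbc r25, r25 */
    return (uint16_t)(((uint16_t)r25 << 8) | r24);
}

/* Exhaustive check against the expected two's-complement bit pattern. */
static void check_cp_sbc_model(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint16_t expected = (uint16_t)(v < 128 ? v : (0xFF00u | v));
        assert(sign_extend_cp_sbc((uint8_t)v) == expected);
    }
}
```

The boundary cases are the interesting ones: 127 compares equal, so no borrow and a zero high byte, while 128 is the first value that borrows.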
With CPI, you would do this instead:
    ; slightly worse: only works with r16-r31, and worse in loops
    sign_extend:
        cpi   r24, 128      ; C = (r24 < 128U) = ((signed)r24 >= 0)
        sbc   r25, r25      ; r25 = (r24 >= 0) ? -1 : 0
        com   r25           ; ones-complement negation: 0 : -1
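A C model of the CPI variant shows why the immediate matters. The boundary described in the comment, r24 < 128U, corresponds to an immediate of 128, so this sketch takes the immediate as a parameter (a made-up `k`) to let the boundary input 127 be checked with both 128 and 127. It assumes CPI sets C when the register is below the immediate as an unsigned byte.

```c
#include <assert.h>
#include <stdint.h>

/* High byte produced by: cpi r24, k / sbc r25, r25 / com r25 */
static uint8_t high_byte_cpi(uint8_t r24, uint8_t k)
{
    unsigned c = r24 < k;               /* cpi r24, k   : C = (r24 < k), unsigned */
    uint8_t r25 = (uint8_t)(0u - c);    /* sbc r25, r25 : 0xFF when C was set */
    return (uint8_t)~r25;               /* com r25      : invert every bit */
}

/* Exhaustive check with k = 128 (hypothetical helper). */
static void check_cpi_model(void)
{
    for (unsigned v = 0; v < 256; v++)
        assert(high_byte_cpi((uint8_t)v, 128) == ((v & 0x80u) ? 0xFFu : 0x00u));
}
```

With k = 128 every input gets the right high byte. With k = 127 the input 127 compares equal, so C stays clear and the sequence returns 0xFF for a non-negative value.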
Or, to avoid the restriction on which register holds the value, do the compare the other way around, as in the CP version above.
I have never worked with AVR, so I'm just basing this on the instruction-set reference manuals that Google found (and my knowledge of asm for other ISAs like x86 and ARM). According to those docs, all of these instructions are 1 word (2 bytes) with 1 cycle latency. This is better than what gcc4.5 does.
The usual way to find good instruction sequences is to ask a compiler. AVR gcc4.5 -O3 on godbolt produces this:
    short sign_extend(signed char a) { return a; }

    sign_extend:
        mov r18,r24     ;; IDK why gcc uses r18 and r19.
        clr r19
        sbrc r18,7
        com r19
        mov r25,r19
        ret
So it zeros R19, then uses SBRC (skip if bit in register is clear) to conditionally execute a logical NOT (COM) depending on the sign bit (bit 7) of R18.
I'm also not sure why it inverts a zero instead of setting all the bits with no dependency on the input (for example, ldi r19, $FF, or the SER alias for it). If an out-of-order AVR ever existed, that would be more efficient. :P
I'm not sure what the MOV instructions are for. SBRC is non-destructive, so AFAICT a valid implementation would be
    sign_extend:
        clr r25
        sbrc r24,7
        ldi r25, $FF
        ret
This is still worse than CP / SBC, because SBRC takes 2 cycles when the skip is taken.
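For comparison, here is a rough C model of the CLR / SBRC / LDI sequence that also tallies cycles. It uses only the figures quoted above (1 cycle per instruction, 2 for SBRC when it skips) and leaves out the RET; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sbrc_result {
    uint8_t  high;      /* value left in r25 */
    unsigned cycles;    /* cycles spent before ret, per the figures above */
};

static struct sbrc_result high_byte_sbrc(uint8_t r24)
{
    struct sbrc_result r = { 0, 1 };    /* clr r25: 1 cycle */
    if (r24 & 0x80u) {
        r.cycles += 1;                  /* sbrc r24,7: bit set, no skip, 1 cycle */
        r.high = 0xFF;                  /* ldi r25, $FF */
        r.cycles += 1;                  /* ldi: 1 cycle */
    } else {
        r.cycles += 2;                  /* sbrc r24,7: bit clear, skips the ldi, 2 cycles */
    }
    return r;
}

/* Exhaustive check of the high byte over all inputs (hypothetical helper). */
static void check_sbrc_model(void)
{
    for (unsigned v = 0; v < 256; v++)
        assert(high_byte_sbrc((uint8_t)v).high == ((v & 0x80u) ? 0xFFu : 0x00u));
}
```

Under those assumptions both paths come to 3 cycles, whereas CP / SBC is 2 cycles once the 127 already sits in a register.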
I assume that the SBC "false dependency" on the old value of R25 isn't an issue on AVR. On out-of-order x86 CPUs, only AMD recognizes sbb eax, eax as independent of the old value of eax and dependent only on flags; Intel CPUs just run it normally. (They do recognize instructions like xor eax,eax as independent of the old value, and that is the standard zeroing idiom for x86.)
So on non-AMD CPUs, if the last code that wrote EAX did so with a load that missed in cache, or something else with high latency, sbb eax, eax could not execute even if the flags were ready (i.e. from an independent dependency chain). But on AMD CPUs, it starts a new dependency chain for EAX.
Anyway, I assume AVR is a fairly simple in-order pipelined design, so there is no way for an old register value to be a performance landmine unless the code that did (for example) a cache-miss load into it never used the result. (Even an in-order pipeline doesn't need to wait for a high-latency operation until something uses its result.)