Assembly Coding Standards / Best Practices

Question


I know 8086 assembly, and now I'm learning MIPS assembly by reading the books "Programming in MIPS Assembly" and "See MIPS Run", but I have never thought about coding standards or best practices for assembly. I want to become a better developer every day, so I would like to learn these in order to improve. How can I find out more about assembly coding standards and best practices?

+5

assembly coding-style mips

Nathan campos Jan 31 '10 at 14:35


2 answers

Good asm style is pretty universal across ISAs (and across different asm dialects for the same processor). Compiler output (e.g. from gcc / clang) usually does all the things I mention below, so it is a good guide. (And C compiler output is often a good starting point for optimizing a small function.)

As a rule, indent instructions one level deeper than labels and assembler directives.

Indent operands to a consistent column (so mnemonics of different lengths do not leave your code ragged, and it is easy to scan down a block and see the destination register of each instruction as the first operand) ¹.

Indent end-of-line comments to a consistent column on the right, far enough past the operands to avoid visual noise.

Group blocks of related instructions together, with a blank line to separate them. (Or, if you are optimizing for in-order CPUs by scheduling instructions, you cannot do this and should instead use comments to keep track of which part of the problem each instruction is working on. Using different levels of indentation within the comments can be useful then.)

Footnote 1:
Except for MIPS store instructions, such as sw $t0, 1234($t1), where the first operand is actually the source; the designers chose to have the asm source use the same operand order for both loads and stores, possibly because both are I-type instructions in machine code. This is typical of asm for RISC load/store architectures, so it is something you need to get used to if you come from a CISC ISA like x86, where mov eax, [rdi] is a load and mov [rdi], eax is a store. And add [rdi], eax is both.

Example: an atoi function for unsigned integers, for real MIPS with branch delay slots. But not for MIPS I: it does not account for load delay slots, although I still tried to avoid load-use stalls. (Godbolt for the C version)

    # unsigned decimal ASCII string to integer
    # inputs:   char* in $a0 - ASCII string that ends with a non-digit character
    # outputs:  integer in $v0
    # clobbers: $t0, $t1
    atoi:
        # peel the first iteration to avoid a 0 * 10 multiply
        lbu     $v0, 0($a0)
        addiu   $v0, $v0, -'0'          # digit = *p - '0'
        sltu    $t0, $v0, 10
        bnez    $t0, .Lloop_entry       # if ((unsigned)digit < 10) enter the loop
        nop                             # doing work for the next iteration here hurts ILP for in-order CPUs
        #addu   $t2, $v0, $v0           # total * 2  (branch delay slot)

        # invalid non-digit input
        jr      $ra                     # return 0
        move    $v0, $zero              # (branch delay slot)

    .Lloop:                             # do {
        addu    $v0, $v0, $v0           #   total *= 2
        addu    $t0, $t0, $t1           #   total*8 + digit
        addu    $v0, $v0, $t0           #   total*10 + digit = total*2 + (total*8 + digit)
    .Lloop_entry:
        lbu     $t0, 1($a0)
        addiu   $a0, $a0, 1             #   t0 = *(p++ + 1)

        addiu   $t0, $t0, -'0'          #   t0 = digit
        sltu    $t1, $t0, 10
        bnez    $t1, .Lloop             # } while (digit < 10);
        sll     $t1, $v0, 3             # total*8  (branch delay slot)

        jr      $ra
        nop
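For comparison, here is a minimal C sketch of what the assembly above computes. It is not the version from the Godbolt link (which is not preserved here), and the name atoi_unsigned is hypothetical. It keeps the same shape as the asm: a peeled first iteration, then a loop that reads the next character and stops at the first non-digit.

```c
/* Hypothetical name; a sketch mirroring the structure of the asm above. */
unsigned atoi_unsigned(const char *p)
{
    /* Peeled first iteration: the first digit becomes the initial total. */
    unsigned digit = (unsigned char)*p - '0';   /* wraps to a huge value for chars below '0' */
    if (digit >= 10)
        return 0;                               /* invalid non-digit input */

    unsigned total = digit;
    for (;;) {
        digit = (unsigned char)*++p - '0';      /* like lbu 1($a0) plus the pointer increment */
        if (digit >= 10)                        /* one unsigned compare rejects both sides of '0'..'9' */
            break;
        total = total * 10 + digit;
    }
    return total;
}
```

For instance, atoi_unsigned("1234x") yields 1234 and atoi_unsigned("x") yields 0.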

This is probably not optimal for any particular MIPS implementation; a superscalar in-order CPU would probably benefit from placing more of the shifts / adds between the load and the branch, even if that means the last iteration does more redundant work. It is probably fine for out-of-order execution like the r10k. Modern MIPS32r6 would use lsa (left-shift and add) for the shift-and-accumulate, as gcc does with -march=mips32r6, and would use the versions of the branch instructions that have no delay slot.

This could be pretty good on early scalar MIPS, though. The pointer increment fills the slot after the load, avoiding a stall inside the loop. (The immediate offset of 1 is there because we avoided incrementing the pointer in the peeled first iteration.)

Filling the delay slot of the branch that enters the loop at .Lloop_entry would be possible if we wanted to compute more of the next iteration's work after addu $v0, $v0, $t0 inside the main loop. But that would add a dependency on $v0, which would hurt ILP on superscalar in-order processors. (Currently, the top two addu instructions can run in parallel, and then the last addu, which produces the new total, can run in parallel with the lbu.)

That would be fine for scalar in-order CPUs (e.g. MIPS I / MIPS II) or for out-of-order processors.
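The arithmetic in the loop body can be sketched in C to make the dependency structure easier to see. This is only an illustration: the helper name and the temporaries are hypothetical, and each line is annotated with the asm instruction it corresponds to. The two middle computations do not depend on each other, which is what lets a superscalar CPU run them in parallel.

```c
/* Hypothetical helper: one loop step, split the same way as the asm. */
unsigned step_times10_plus_digit(unsigned total, unsigned digit)
{
    unsigned t8  = total << 3;      /* sll  $t1, $v0, 3   : total*8 (in the branch delay slot) */
    unsigned t2  = total + total;   /* addu $v0, $v0, $v0 : total*2, independent of the next line */
    unsigned t8d = t8 + digit;      /* addu $t0, $t0, $t1 : total*8 + digit */
    return t2 + t8d;                /* addu $v0, $v0, $t0 : total*10 + digit */
}
```

With total = 123 and digit = 4 this returns 1234, the same as total * 10 + digit.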

0

Peter Cordes Jun 09 '19 at 17:29


martinwguy · Accepted Answer · 2010-01-31T14:58:20+0000

Best practice is a social phenomenon that depends on the community you will be working in, so the best answer is to read existing MIPS asm code from whatever environment you intend to interact with.

Examples that come to mind from my own world are the assembly sections of the Linux kernel, GCC's MIPS startup code, or the assembly fragments of the glibc MIPS port.

If you will primarily be interacting with other projects, it is best to absorb and imitate the coding practices of that community.
