What is the function of a "data label" in x86 assembler?

I am currently studying assembly programming, following the Kip Irvine book on x86 assembly programming.

In the book, the author tries to explain the concept of a data label:

A data label identifies the location of a variable, providing a convenient way to reference the variable in code. For example, the following defines a variable named count:

 count DWORD 100 

The assembler assigns a numeric address to each label.

So, my understanding of what the data label does: the data label count is a variable that contains a numerical value, where the numerical value is a location in memory. When I use count in my code, I actually use the value contained in this place in memory, in this case 100.

Do I understand the data label correctly? If this is somewhat incorrect, can someone please point out the error?

1 answer

Labels are a symbolic way to write memory addresses, nothing more, nothing less. The label itself takes up no space; it is just a convenient way to refer to that location in memory later.

(Well, they can also be turned into symbols in the object file, so that numeric addresses can be computed at link time rather than at assemble time. But for labels defined and referenced within a single file, this extra complexity is mostly invisible. See below about addresses being link-time constants, not assemble-time constants.)

For example:

 ; NASM syntax, but the concepts apply exactly to MASM as well
 ; For MASM, you may need BYTE PTR or whatever size overrides in loads.
 section .rodata     ; or section .data if you want to be able to store here, too.
 COUNT: db 0x12
 FOO:   db 0
 BAR:   dw 0x80FF    ; same as db 0xff, 0x80

A 4-byte load, such as mov eax, [COUNT], will get 0x80FF0012 (because x86 is little-endian). A 2-byte load from FOO, like mov cx, [FOO], will get 0xFF00.
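To make the byte arithmetic concrete, here is a small C model of those four bytes. It is only a sketch: the array name rodata and the helpers load16, load32 and check_loads are made up for illustration, and the loads are assembled by hand in little-endian order so the result matches x86 on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four bytes emitted by the db/dw directives, in memory order. */
static const uint8_t rodata[4] = { 0x12, 0x00, 0xFF, 0x80 };

/* Labels modelled as plain offsets: they name positions and store nothing. */
enum { COUNT = 0, FOO = 1, BAR = 2 };

/* 2-byte little-endian load, modelling `mov cx, [label]`. */
uint16_t load16(size_t label)
{
    return (uint16_t)(rodata[label] | (rodata[label + 1] << 8));
}

/* 4-byte little-endian load, modelling `mov eax, [label]`. */
uint32_t load32(size_t label)
{
    return (uint32_t)rodata[label]
         | (uint32_t)rodata[label + 1] << 8
         | (uint32_t)rodata[label + 2] << 16
         | (uint32_t)rodata[label + 3] << 24;
}

/* Checks the values quoted in the text. */
void check_loads(void)
{
    assert(load32(COUNT) == 0x80FF0012u);  /* spans COUNT, FOO and BAR */
    assert(load16(FOO)   == 0xFF00u);      /* FOO plus the low byte of BAR */
    assert(load16(BAR)   == 0x80FFu);      /* the dw value itself */
}
```

Calling check_loads() from a main function runs the checks. Note that the three "labels" add nothing to the four bytes of storage.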

In fact, you can deliberately use overlapping loads from constant data this way, e.g. with strings where some are substrings of others. For null-terminated strings, only common suffixes can share the same storage this way, because the shorter string has to end at the same terminator.
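The suffix-sharing idea can be sketched in C as follows. The names hello_world, world and check_shared_suffix are hypothetical; the second pointer plays the role of a second label placed partway into the same bytes.

```c
#include <assert.h>
#include <string.h>

/* One stored string: 11 characters plus the terminating zero byte. */
static const char hello_world[] = "hello world";

/* A second "label" six bytes in; no extra bytes are stored for it. */
static const char *const world = hello_world + 6;

void check_shared_suffix(void)
{
    assert(strcmp(world, "world") == 0);
    assert(sizeof hello_world == 12);
    /* Both strings end at the very same terminator byte. */
    assert(world + strlen(world) == hello_world + strlen(hello_world));
}
```

A prefix such as "hello" could not be shared the same way, because it would need its own terminator where the space character is.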


Now, does this mean COUNT is a 4-byte variable or a 1-byte variable? No, neither. Assembly language doesn't really have "variables".

Variables are a higher-level concept that you can implement in assembly language with a label and an assembler directive that reserves some static storage space. Note that the labels are separate from the db directives in the example above.

But a variable doesn't need to have any static storage space: e.g. your loop counter variable can (and usually should) exist only in a register.

A variable doesn't even need to have one fixed location. It can be spilled to the stack in a part of the function where it isn't used, but live in a register in another part. In compiler-generated code, variables often move between registers for no apparent reason, because compilers don't even try to use the same register for the same variable.


Note that MASM implicitly associates a label with an operand size, based on the directive that follows it. So you might have to write mov eax, dword ptr [COUNT] if mov eax, [COUNT] gives an operand-size mismatch error.

Some consider this a feature, but others think this magic operand-size stuff is totally weird. NASM syntax has no such magic: you can tell how a line will assemble without having to go and find where the labels are defined. add [count], 1 is an error in NASM because nothing implies an operand size.

Don't get stuck on the idea that everything you would use a variable for in C must have static storage with a label in your asm programs. But if you want to use the term "variable" for static data storage plus a label, as Kip Irvine does, then go ahead.


Also note that data labels are not special or different from code labels. Nothing stops you from writing jmp COUNT. Decoding 12 00 FF 80 as (a sequence of) x86 instructions is left as an exercise for the reader, but (if it is in a page with execute permission) it will be fetched and decoded by the CPU.

Similarly, nothing stops you from loading data from a code label as a memory operand. It's usually not a good idea to mix code and data, for performance reasons (all CPUs use split L1D and L1I caches), but it does work. Under a typical OS (such as Linux), the text segment of an executable contains both the code and the read-only data sections, and is mapped with read and execute permissions. (But not write permission, so trying to store there will fault unless you change the permissions.)

A JIT compiler writes machine code into a buffer and then jumps there. It could be a static buffer with a label, but more typically it is a dynamically allocated buffer whose address is a variable.


Static addresses are usually link-time constants, but often are not assemble-time constants. (Unless you are writing a bootloader or something else that is definitely loaded at a known address; then something like org 0x100 may be useful.) This means you can do mov al, [COUNT+2] but not mov al, [COUNT*2]. (Object file formats support integer offsets added to a symbol's address, but not other math operators.)
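C static initializers have a closely related restriction, which makes for a rough analogy to COUNT+2 versus COUNT*2. This is a sketch under assumptions: count_bytes, count_plus_2 and check_addend are invented names, and the rejected line is left as a comment because what the compiler reports for it varies.

```c
#include <assert.h>
#include <stdint.h>

/* Same four bytes as the COUNT/FOO/BAR example. */
static const uint8_t count_bytes[4] = { 0x12, 0x00, 0xFF, 0x80 };

/* Accepted: "address of a symbol plus an integer" is an address constant,
 * which the toolchain can express as a symbol reference with an offset. */
static const uint8_t *const count_plus_2 = count_bytes + 2;

/* Typically rejected as a non-constant initializer, since "address times 2"
 * cannot be expressed as symbol + offset:
 *
 * static const uintptr_t count_times_2 = (uintptr_t)count_bytes * 2;
 */

void check_addend(void)
{
    assert(count_plus_2 == &count_bytes[2]);
    assert(*count_plus_2 == 0xFF);   /* low byte of the dw value */
}
```

Doing the multiplication at run time is of course fine; the restriction only applies to values the toolchain must fill in before the program runs.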

In position-independent code (PIC), label addresses are not even link-time constants. But at least in 64-bit PIC, the offset from the code to a data label is a link-time constant, so RIP-relative addressing can be used without an extra level of indirection (through the global offset table).


Source: https://habr.com/ru/post/1269186/

