ORG directive
When you specify an ORG directive such as ORG 0x0000 at the top of your assembler program, and use BITS 16, you tell NASM that when it resolves labels on code and data, the absolute offsets it generates will be based on the starting offset specified in ORG (16-bit code is limited to offsets that fit in a WORD, i.e. 2 bytes).
If you have ORG 0x0000 at the beginning and place a start: label at the beginning of the code, start will have an absolute offset of 0x0000. If you use ORG 0x7C00, then the start label will have an absolute offset of 0x7c00. This applies to any data labels and code labels.
We can simplify your example to find out what happens in the generated code when working with a data variable and a hard-coded character. Although this code does not exactly perform the same actions as your code, it is close enough to show what works and what doesn't.
An example using ORG 0x0000:

    BITS 16
    ORG 0x0000

    start:
        push cs
        pop ds                  ; DS=CS
        push 0xb800
        pop es                  ; ES = 0xB800 (video memory)
        mov ah, 0x0E            ; AH = Attribute (yellow on black)
        mov al, byte [msg]
        mov [es:0x00], ax       ; This should print letter 'P'
        mov al, byte [msg+1]
        mov [es:0x02], ax       ; This should print letter 'A'
        mov al, 'O'
        mov [es:0x04], ax       ; This should print letter 'O'
        mov al, '!'
        mov [es:0x06], ax       ; This should print letter '!'

        cli
        hlt

    msg: db "PA"

    ; Bootsector padding
    times 510-($-$$) db 0
    dw 0xAA55
If you ran this on VirtualBox, the first 2 characters would be garbage, while O! should display correctly. I will use this example throughout the rest of this answer.
VirtualBox / CS:IP / segment:offset pairs
In the case of VirtualBox, it effectively performs the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) does not just transfer control to the given address; it also sets CS and IP to the specified values. A FAR JMP to 0x0000:0x7c00 sets CS = 0x0000 and IP = 0x7c00.
If you are not familiar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address, then this document is a good starting point for understanding the concept. The general equation for obtaining the physical memory address from a 16-bit segment:offset pair is: (segment<<4)+offset = 20-bit physical address.
Since VirtualBox uses a CS:IP of 0x0000:0x7c00, it will start executing code at physical address (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00. Be aware that this is not guaranteed in all environments. Because of the nature of segment:offset pairs, there is more than one way to refer to physical address 0x07c00. See the section at the end of this answer on how to handle this properly.
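As a quick sanity check of the arithmetic above, here is a minimal Python sketch. The helper name physical_address is hypothetical (it is not part of the original code), and the 20-bit mask assumes 8086-style wrap-around with the A20 line disabled.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode translation: (segment << 4) + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# Two different CS:IP pairs that land on the same byte of memory.
print(hex(physical_address(0x0000, 0x7C00)))   # 0x7c00
print(hex(physical_address(0x07C0, 0x0000)))   # 0x7c00

# Every segment from 0x0000 through 0x07C0 has exactly one offset that
# reaches 0x07c00 (wrap-around aliases near the 1 MiB boundary are ignored).
aliases = [(seg, 0x7C00 - (seg << 4)) for seg in range(0x07C0 + 1)]
print(len(aliases))                            # 1985
```

So 0x0000:0x7c00 and 0x07c0:0x0000 are only two of many pairs a BIOS could use to reach the boot sector.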
What happens to your bootloader?
Assuming we are using VirtualBox and the information in the previous section holds, then CS = 0x0000 and IP = 0x7c00 on entry to our bootloader. If we take the sample code (using ORG 0x0000) that I wrote in the first section of this answer and look at the disassembly (I will use objdump output), we see this:
    objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin

    00000000 <.data>:
       0:   0e                      push   cs
       1:   1f                      pop    ds
       2:   68 00 b8                push   0xb800
       5:   07                      pop    es
       6:   b4 0e                   mov    ah,0xe
       8:   a0 24 00                mov    al,ds:0x24
       b:   26 a3 00 00             mov    es:0x0,ax
       f:   a0 25 00                mov    al,ds:0x25
      12:   26 a3 02 00             mov    es:0x2,ax
      16:   b0 4f                   mov    al,0x4f
      18:   26 a3 04 00             mov    es:0x4,ax
      1c:   b0 21                   mov    al,0x21
      1e:   26 a3 06 00             mov    es:0x6,ax
      22:   fa                      cli
      23:   f4                      hlt
      24:   50                      push   ax        ; Letter 'P'
      25:   41                      inc    cx        ; Letter 'A'
            ...
     1fe:   55                      push   bp
     1ff:   aa                      stos   BYTE PTR es:[di],al
Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000 so that the first column of values (the memory address) starts at 0x0000. I do this because I used ORG 0x0000 in the assembler source code. I also added some comments to show where our data area is (and where the letters P and A were placed after the code).
If you run this program in VirtualBox, the first two characters will come out as gibberish. Why is that? First, recall that VirtualBox reached our code by setting CS to 0x0000 and IP to 0x7c00. This code then copied CS to DS:

       0:   0e                      push   cs
       1:   1f                      pop    ds

Since CS was zero, DS is now zero too. Now look at this line:

       8:   a0 24 00                mov    al,ds:0x24
ds:0x24 is the encoded address of the msg variable in our data area. The byte at offset 0x24 holds the value P (0x25 holds A). You can see where things might go wrong. Our DS = 0x0000, so mov al,ds:0x24 is really the same as mov al,0x0000:0x24. This syntax is not valid, but I am substituting 0x0000 for DS to make a point. 0x0000:0x24 is where our code will try to read the letter P from at run time. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This memory address is at the bottom of memory, in the middle of the interrupt vector table. Clearly this is not what we intended!
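To see why that read lands inside the interrupt vector table, here is a small Python sketch. It assumes the standard real-mode IVT layout (256 four-byte vectors starting at physical address 0); the constant and function names are illustrative and not part of the original code.

```python
IVT_BASE = 0x00000       # the real-mode interrupt vector table starts at physical 0
IVT_ENTRY_SIZE = 4       # each vector is a 16-bit offset followed by a 16-bit segment
IVT_ENTRIES = 256

def physical_address(segment, offset):
    return ((segment << 4) + offset) & 0xFFFFF

addr = physical_address(0x0000, 0x24)    # what mov al,ds:0x24 reads when DS = 0
in_ivt = IVT_BASE <= addr < IVT_BASE + IVT_ENTRIES * IVT_ENTRY_SIZE
vector = (addr - IVT_BASE) // IVT_ENTRY_SIZE

print(hex(addr), in_ivt, vector)         # 0x24 True 9
```

Under these assumptions, the byte that ends up on screen is the first byte of the entry for interrupt vector 9, not the letter P.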
There are several ways to solve this problem. The simplest (and preferred) method is to actually place the proper segment in DS, rather than relying on whatever CS happens to be when our program runs. Since we set the ORG to 0x0000, we need a data segment (DS) of 0x07c0. The segment:offset pair 0x07c0:0x0000 = physical address 0x07c00, which is the address our bootloader is loaded at. So all we need to do is amend the code, replacing:

    push cs
    pop ds              ; DS=CS

with:

    push 0x07c0
    pop ds              ; DS=0x07c0
This change should provide the correct output on startup in VirtualBox. Now let's see why. This code has not changed:
8: a0 24 00 mov al,ds:0x24
But now DS = 0x07c0 when it runs. That is like saying mov al,0x07c0:0x24. 0x07c0:0x24 translates to physical address (0x07c0<<4)+0x24 = 0x07c24. This is what we want, since the BIOS physically placed our bootloader in memory starting at 0x07c00, so the instruction now correctly references our msg variable.
The moral of the story? Whatever you use for ORG, there should be a matching value in the DS register when our program starts. We should set it explicitly and not rely on what is in CS.
Why do immediate values print?
With the original code, the first 2 characters printed gibberish, but the last two did not. As discussed in the previous section, there was a reason the first two characters did not print, but what about the last two?
Let's take a closer look at the third character, O:
16: b0 4f mov al,0x4f ; 0x4f = 'O'
Since we used an immediate (constant) value and moved it into the AL register, the character itself is encoded as part of the instruction. It does not rely on a memory access through the DS register. Because of this, the last 2 characters display correctly.
Ross Ridge's suggestion and why it works in VirtualBox
Ross Ridge suggested using ORG 0x7C00, and you noticed that it worked. Why did that happen? And is it the ideal solution?
Take my very first example, change ORG 0x0000 to ORG 0x7C00, and then assemble it. objdump would produce this disassembly:
    objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x7c00 boot.bin

    boot.bin:     file format binary

    Disassembly of section .data:

    00007c00 <.data>:
        7c00:   0e                      push   cs
        7c01:   1f                      pop    ds
        7c02:   68 00 b8                push   0xb800
        7c05:   07                      pop    es
        7c06:   b4 0e                   mov    ah,0xe
        7c08:   a0 24 7c                mov    al,ds:0x7c24
        7c0b:   26 a3 00 00             mov    es:0x0,ax
        7c0f:   a0 25 7c                mov    al,ds:0x7c25
        7c12:   26 a3 02 00             mov    es:0x2,ax
        7c16:   b0 4f                   mov    al,0x4f
        7c18:   26 a3 04 00             mov    es:0x4,ax
        7c1c:   b0 21                   mov    al,0x21
        7c1e:   26 a3 06 00             mov    es:0x6,ax
        7c22:   fa                      cli
        7c23:   f4                      hlt
        7c24:   50                      push   ax        ; Letter 'P'
        7c25:   41                      inc    cx        ; Letter 'A'
                ...
        7dfe:   55                      push   bp
        7dff:   aa                      stos   BYTE PTR es:[di],al
VirtualBox sets CS to 0x0000 when it jumps to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7C00 directive has done to our generated code:
7c08: a0 24 7c mov al,ds:0x7c24
Notice how we are now using an offset of 0x7c24! This is like mov al,0x0000:0x7c24, which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location, where the bootloader was loaded, and it is the proper position of our msg string. So it works.
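The only difference between the two builds at this instruction is the 16-bit offset stored, little-endian, after the A0 opcode byte. The following Python sketch decodes those bytes as they appear in the two disassemblies; the helper name moffs16 is made up for illustration.

```python
def moffs16(instruction: bytes) -> int:
    """Decode the 16-bit offset of a 3-byte `mov al, [offset]` (opcode 0xA0)."""
    if len(instruction) != 3 or instruction[0] != 0xA0:
        raise ValueError("expected a 3-byte A0 xx xx instruction")
    # The operand is stored low byte first (little-endian).
    return int.from_bytes(instruction[1:3], "little")

org_0000 = moffs16(bytes.fromhex("a02400"))   # bytes emitted with ORG 0x0000
org_7c00 = moffs16(bytes.fromhex("a0247c"))   # bytes emitted with ORG 0x7C00

print(hex(org_0000), hex(org_7c00))           # 0x24 0x7c24
print(hex(org_7c00 - org_0000))               # 0x7c00
```

The difference between the two encoded offsets is exactly the ORG value, which is all the directive changes in the generated code.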
So is using ORG 0x7C00 a bad idea? No. It is fine. But we have a subtle issue to contend with. What happens if another virtual PC environment or real hardware does not FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000. That is also physical address 0x07c00, as we have already seen. In this environment, CS = 0x07c0 when our code runs. If we use the original code that copies CS to DS, DS now contains 0x07c0 as well. Now look at what happens to this code in that situation:
7c08: a0 24 7c mov al,ds:0x7c24
DS = 0x07c0 in this scenario. We now have something resembling mov al,0x07c0:0x7c24 when the program actually runs. Uh oh, that looks bad. What does it translate to as a physical address? (0x07c0<<4)+0x7c24 = 0x0F824. This is somewhere above our bootloader, and it will contain whatever happens to be there after the computer boots. Probably zeros, but it should be assumed to be garbage. It certainly is not where our msg string was loaded!
So how do we resolve this? To amend what Ross Ridge suggested, and to heed the advice I gave earlier about explicitly setting DS to the segment we really want (don't assume CS is correct and then blindly copy it to DS), we should place 0x0000 in DS when our bootloader starts if we use ORG 0x7C00. So we can change this code:
    ORG 0x7c00

    start:
        push cs
        pop ds              ; DS=CS

to:

    ORG 0x7c00

    start:
        xor ax, ax          ; ax=0x0000
        mov ds, ax          ; DS=0x0000
Here we do not rely on an untrusted value in CS. We simply set DS to the segment value that makes sense given the ORG we used. You could have pushed 0x0000 and popped it into DS, as you have been doing. I am more accustomed to zeroing out a register and moving that to DS.
With this approach, it no longer matters what value of CS was used to reach our bootloader; the code will still reference the appropriate memory location for our data.
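The four ORG/DS combinations discussed so far can be tabulated with a short Python sketch. The offset 0x24 for msg comes from the disassembly above; the function and constant names are hypothetical, and the 20-bit mask assumes 8086-style address wrap-around.

```python
MSG_FILE_OFFSET = 0x24      # position of msg inside the boot sector (from the disassembly)
LOAD_ADDRESS = 0x07C00      # physical address where the BIOS places the boot sector

def physical_address(segment, offset):
    return ((segment << 4) + offset) & 0xFFFFF

def msg_address(org, ds):
    """Physical address that `mov al, [msg]` reads for a given ORG and DS."""
    return physical_address(ds, org + MSG_FILE_OFFSET)

def msg_is_reachable(org, ds):
    return msg_address(org, ds) == LOAD_ADDRESS + MSG_FILE_OFFSET

for org, ds in [(0x0000, 0x0000), (0x0000, 0x07C0),
                (0x7C00, 0x0000), (0x7C00, 0x07C0)]:
    status = "ok" if msg_is_reachable(org, ds) else "garbage"
    print(f"ORG {org:#06x}  DS {ds:#06x}  ->  {msg_address(org, ds):#07x}  {status}")
```

Only the pairs ORG 0x0000 with DS 0x07c0 and ORG 0x7C00 with DS 0x0000 resolve to 0x07c24; the other two read 0x00024 and 0x0f824 respectively. CS does not appear anywhere in the calculation, which is why setting DS explicitly makes the result independent of how the BIOS jumped to the code.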
Do not assume the first stage is invoked by the BIOS with CS:IP = 0x0000:0x7c00
In my General Bootloader Tips, which I wrote in a previous StackOverflow answer, tip #1 is very important:
- When the BIOS jumps to your code, you cannot rely on the CS, DS, ES, SS and SP registers having valid or expected values. They should be set up appropriately when your bootloader starts. You can only be guaranteed that your bootloader will be loaded and run from physical address 0x07c00 and that the boot drive number is loaded into the DL register.
The BIOS may have done a FAR JMP (or equivalent) to our code with jmp 0x07c0:0x0000, and some emulators and real hardware do it this way. Others use jmp 0x0000:0x7c00, as VirtualBox does.
We should account for this by setting DS explicitly to the value we need, one that makes sense for the value we use in our ORG directive.
Summary
Do not assume that CS holds the value you expect, and do not blindly copy CS to DS. Set DS explicitly.
Your code can be fixed to use ORG 0x0000, as you originally had it, provided we set DS to 0x07c0 accordingly, as discussed earlier. It might look like this:
    ORG 0
    BITS 16

    push word 0xB800        ; Address of text screen video memory in real mode for colored monitors
    push 0x07c0
    pop ds                  ; DS=0x07c0 since we use ORG 0x0000
    pop es
Alternatively, we could use ORG 0x7C00 as follows:
    ORG 0x7c00
    BITS 16

    push word 0xB800        ; Address of text screen video memory in real mode for colored monitors
    push 0x0000
    pop ds                  ; DS=0x0000 since we use ORG 0x7c00
    pop es