Read from memory in real mode 8086 when using 'ORG 0x0000'

I have been working with x86-16 assembly and running it in VirtualBox. For some reason, when I read from memory and try to print the value as a character, I get completely different results from what I expect. However, when I hard-code the character as part of the instruction, it works fine. Here is the code:

    ORG 0
    BITS 16

    push word 0xB800        ; Address of text screen video memory in real mode for colored monitors
    push cs
    pop ds                  ; ds = cs
    pop es                  ; es = 0xB800
    jmp start

    ; input = di (position*2), ax (character and attributes)
    putchar:
        stosw
        ret

    ; input = si (NUL-terminated string)
    print:
        cli
        cld
    .nextChar:
        lodsb               ; mov al, [ds:si] ; si += 1
        test al, al
        jz .finish
        call putchar
        jmp .nextChar
    .finish:
        sti
        ret

    start:
        mov ah, 0x0E
        mov di, 8

        ; should print P
        mov al, byte [msg]
        call putchar

        ; should print A
        mov al, byte [msg + 1]
        call putchar

        ; should print O
        mov al, byte [msg + 2]
        call putchar

        ; should print !
        mov al, byte [msg + 3]
        call putchar

        ; should print X
        mov al, 'X'
        call putchar

        ; should print Y
        mov al, 'Y'
        call putchar

        cli
        hlt

    msg: db 'PAO!', 0

    ; Fill the rest of the bytes upto byte 510 with 0s
    times 510 - ($ - $$) db 0

    ; Header
    db 0x55
    db 0xAA

The print label and the instructions in it can be ignored, since I have not used it yet because of the problem I have printing a character stored in memory. I have assembled it with both FASM and NASM and get the same problem, so it is obviously my mistake.

It prints something like this: [screenshot: VirtualBox]

1 answer

ORG directive

When you specify an ORG directive such as ORG 0x0000 at the top of your assembler program and use BITS 16 , you are telling NASM that when it resolves labels on code and data, the absolute offsets it generates will be based on the starting offset specified in ORG (16-bit code is limited to offsets that fit in a WORD, i.e. 2 bytes).

If you have an ORG 0x0000 at the beginning and place the start: label at the beginning of the code, start will have an absolute offset of 0x0000. If you use ORG 0x7C00 , then the start label will have an absolute offset of 0x7c00. This applies to any data labels and code labels.

We can simplify your example to find out what happens in the generated code when working with a data variable and a hard-coded character. Although this code does not exactly perform the same actions as your code, it is close enough to show what works and what doesn't.

An example of using ORG 0x0000 :

    BITS 16
    ORG 0x0000

    start:
        push cs
        pop ds              ; DS=CS
        push 0xb800
        pop es              ; ES = 0xB800 (video memory)
        mov ah, 0x0E        ; AH = Attribute (yellow on black)

        mov al, byte [msg]
        mov [es:0x00], ax   ; This should print letter 'P'
        mov al, byte [msg+1]
        mov [es:0x02], ax   ; This should print letter 'A'
        mov al, 'O'
        mov [es:0x04], ax   ; This should print letter 'O'
        mov al, '!'
        mov [es:0x06], ax   ; This should print letter '!'

        cli
        hlt

    msg: db "PA"

    ; Bootsector padding
    times 510-($-$$) db 0
    dw 0xAA55

If you ran this in VirtualBox, the first 2 characters would be garbage, while O! would be displayed correctly. I will use this example for the rest of this answer.


VirtualBox / CS:IP / segment:offset pairs

In the case of VirtualBox, it effectively executes the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) does not just jump to the given address, it also sets CS and IP to the specified values. A FAR JMP to 0x0000:0x7c00 sets CS = 0x0000 and IP = 0x7c00.

If you are unfamiliar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address, then this document is a reasonably good starting point for understanding the concept. The general equation for obtaining a physical memory address from a 16-bit segment:offset pair is (segment<<4)+offset = 20-bit physical address .

Since VirtualBox uses a CS:IP of 0x0000:0x7c00, it starts executing code at physical address (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00. Be aware that this is not guaranteed in all environments. Because of the nature of segment:offset pairs, there is more than one way to refer to physical address 0x07c00. See the section at the end of this answer on how to handle this properly.
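
The arithmetic above can be checked at assembly time. The following is a minimal sketch for NASM; the macro name PHYS is my own invention and is not part of NASM or of the code in this thread. If a check fails, NASM stops with the given error message; if they all hold, the file assembles without complaint.

```nasm
; Minimal sketch: verify segment:offset arithmetic with the NASM preprocessor.
; PHYS is a hypothetical helper macro, not a NASM built-in.
; Assemble with: nasm -f bin check.asm -o check.bin

%define PHYS(seg, off) (((seg) << 4) + (off))

; 0x0000:0x7c00 is physical address 0x07c00
%if PHYS(0x0000, 0x7c00) != 0x07c00
    %error "0x0000:0x7c00 is not physical address 0x07c00"
%endif

; 0x07c0:0x0000 refers to the same physical address
%if PHYS(0x07c0, 0x0000) != PHYS(0x0000, 0x7c00)
    %error "the two segment:offset pairs do not alias"
%endif
```

Both pairs name the same byte of memory, which is why a BIOS is free to pick either one when it transfers control to the boot sector.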


What happens to your bootloader?

Assuming we are using VirtualBox and the information in the previous section is correct, CS = 0x0000 and IP = 0x7c00 upon entry to our bootloader. If we take the example code (using ORG 0x0000 ) I wrote in the first section of this answer and look at the disassembly (I will use objdump output), we would see this:

    objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin

    00000000 <.data>:
       0:   0e                      push   cs
       1:   1f                      pop    ds
       2:   68 00 b8                push   0xb800
       5:   07                      pop    es
       6:   b4 0e                   mov    ah,0xe
       8:   a0 24 00                mov    al,ds:0x24
       b:   26 a3 00 00             mov    es:0x0,ax
       f:   a0 25 00                mov    al,ds:0x25
      12:   26 a3 02 00             mov    es:0x2,ax
      16:   b0 4f                   mov    al,0x4f
      18:   26 a3 04 00             mov    es:0x4,ax
      1c:   b0 21                   mov    al,0x21
      1e:   26 a3 06 00             mov    es:0x6,ax
      22:   fa                      cli
      23:   f4                      hlt
      24:   50                      push   ax       ; Letter 'P'
      25:   41                      inc    cx       ; Letter 'A'
            ...
     1fe:   55                      push   bp
     1ff:   aa                      stos   BYTE PTR es:[di],al

Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000 so that the first column of values (the memory address) starts at 0x0000. I do this because I used ORG 0x0000 in the assembler source. I also added some comments to the output to show where our data area is (and where the letters P and A were placed after the code).

If you run this program in VirtualBox, the first two characters will come out as gibberish. So why is that? First, remember that VirtualBox reached our code by setting CS at 0x0000 and IP at 0x7c00. Then this code copied CS to DS:

       0:   0e                      push   cs
       1:   1f                      pop    ds

Since CS was zero, DS is now zero as well. Now look at this line:

  8: a0 24 00 mov al,ds:0x24 

ds:0x24 is the encoded address of the msg variable in our data area. The byte at offset 0x24 holds the value P (0x25 holds A ). You can see where things might go wrong. Our DS = 0x0000, so mov al,ds:0x24 is really the same as mov al,0x0000:0x24 . That syntax is not valid, but I am substituting 0x0000 for DS to make the point. 0x0000:0x24 is where our code will attempt to read the letter P from at run time. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This address is at the bottom of memory, in the middle of the interrupt vector table. Clearly this is not what we intended!

There are a couple of ways to solve this problem. The simplest (and preferred) method is to place the proper segment in DS ourselves, rather than relying on whatever CS happens to be when our program runs. Since we set an ORG of 0x0000, we need a data segment (DS) of 0x07c0. The segment:offset pair 0x07c0:0x0000 = physical address 0x07c00, which is the address our bootloader is loaded at. So all we need to do is amend the code, replacing:

        push cs
        pop ds              ; DS=CS

with

        push 0x07c0
        pop ds              ; DS=0x07c0

This change should provide the correct output on startup in VirtualBox. Now let's see why. This code has not changed:

  8: a0 24 00 mov al,ds:0x24 

But now DS = 0x07c0 when it is executed. That is like saying mov al,0x07c0:0x24 . 0x07c0:0x24 translates to physical address (0x07c0<<4)+0x24 = 0x07c24. This is what we want, since the BIOS physically placed our bootloader in memory starting at 0x07c00, so the instruction now references our msg variable correctly.

The moral of the story? Whatever you use for ORG, there must be a matching value in the DS register when our program runs. We should set it explicitly and not rely on what is in CS.


Why do immediate values print?

With the original code, the first 2 characters printed as gibberish, but the last two did not. As discussed in the previous section, there was a reason the first two characters did not print correctly, but what about the last two?

Let's take a closer look at the 3rd character of O :

  16: b0 4f mov al,0x4f ; 0x4f = 'O' 

Since we used an immediate (constant) value and moved it into the AL register, the character itself is encoded as part of the instruction. It does not rely on a memory access through the DS register. Because of this, the last 2 characters displayed properly.


Ross Ridge's suggestion and why it works in VirtualBox

Ross Ridge suggested using ORG 0x7C00 , and you observed that it worked. Why did that happen? And is that solution ideal?

Take my very first example, change ORG 0x0000 to ORG 0x7C00 , and assemble it. objdump would then give this disassembly:

    objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x7c00 boot.bin

    boot.bin:     file format binary

    Disassembly of section .data:

    00007c00 <.data>:
        7c00:   0e                      push   cs
        7c01:   1f                      pop    ds
        7c02:   68 00 b8                push   0xb800
        7c05:   07                      pop    es
        7c06:   b4 0e                   mov    ah,0xe
        7c08:   a0 24 7c                mov    al,ds:0x7c24
        7c0b:   26 a3 00 00             mov    es:0x0,ax
        7c0f:   a0 25 7c                mov    al,ds:0x7c25
        7c12:   26 a3 02 00             mov    es:0x2,ax
        7c16:   b0 4f                   mov    al,0x4f
        7c18:   26 a3 04 00             mov    es:0x4,ax
        7c1c:   b0 21                   mov    al,0x21
        7c1e:   26 a3 06 00             mov    es:0x6,ax
        7c22:   fa                      cli
        7c23:   f4                      hlt
        7c24:   50                      push   ax       ; Letter 'P'
        7c25:   41                      inc    cx       ; Letter 'A'
                ...
        7dfe:   55                      push   bp
        7dff:   aa                      stos   BYTE PTR es:[di],al

VirtualBox sets CS to 0x0000 when it jumps to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7C00 directive has done to our generated code:

  7c08: a0 24 7c mov al,ds:0x7c24 

Notice how we are now using an offset of 0x7c24! This is like mov al,0x0000:0x7c24 , which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location, inside the area where the bootloader was loaded, and it is the proper position of our msg string. So it works.

Is using ORG 0x7C00 a bad idea? No. It is fine. But there is a subtle issue to contend with. What happens if another virtual PC environment or real hardware does not FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000 . That is also physical address 0x07c00 , as we have already seen. In that environment, CS = 0x07c0 when our code runs. If we use the original code that copies CS to DS, DS is now 0x07c0 as well. Now observe what would happen to this code in that situation:

  7c08: a0 24 7c mov al,ds:0x7c24 

DS = 0x07c0 in this scenario. We now have something resembling mov al,0x07c0:0x7c24 when the program actually runs. Uh oh, that looks bad. What does it translate to as a physical address? (0x07c0<<4)+0x7c24 = 0x0F824. This is somewhere above our bootloader, and it will contain whatever happens to be there after the computer boots. Likely zeros, but it should be assumed to be garbage. Clearly that is not where our msg string was loaded!

So how do we resolve this? To amend what Ross Ridge suggested, and to heed the advice I gave earlier about explicitly setting DS to the segment we really want (do not assume CS is correct and then blindly copy it to DS), we should place 0x0000 into DS when our bootloader starts if we use ORG 0x7C00 . So we can change this code:

    ORG 0x7c00

    start:
        push cs
        pop ds              ; DS=CS

to

    ORG 0x7c00

    start:
        xor ax, ax          ; ax=0x0000
        mov ds, ax          ; DS=0x0000

Here we do not rely on an untrusted value in CS. We simply set DS to the segment value that makes sense given the ORG we used. You could have pushed 0x0000 and popped it into DS, as you had been doing. I am just more accustomed to zeroing a register and moving it to DS.

Taking this approach, it does not matter what value in CS could be used to reach our loader, the code will still refer to the appropriate memory location for our data.


Do not assume that the 1st stage is invoked by the BIOS with CS:IP = 0x0000:0x7c00

In the general bootloader tips that I wrote in a previous Stack Overflow answer, tip #1 is very important:

  • When the BIOS jumps to your code, you cannot rely on the CS, DS, ES, SS, SP registers having valid or expected values. They should be set up appropriately when your bootloader starts. You can only be guaranteed that your bootloader will be loaded and run from physical address 0x07c00 and that the boot drive number is loaded into the DL register.

The BIOS could have FAR JMP'ed (or equivalent) to our code with jmp 0x07c0:0x0000 , and some emulators and real hardware do it this way. Others use jmp 0x0000:0x7c00 , as VirtualBox does.

We should account for this by explicitly setting DS to what we need, choosing the value that makes sense for the one we use in our ORG directive.
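
One way to keep the two values from drifting apart is to define them once and let the assembler verify that they match. This is only a sketch under my own assumptions: the names LOAD_ORG and LOAD_SEG are hypothetical, and the guard relies on nothing more than NASM's %if and %error preprocessor directives.

```nasm
; Minimal sketch: fail the build if ORG and DS do not describe 0x07c00.
; LOAD_ORG and LOAD_SEG are hypothetical names chosen for this example.

BITS 16

%define LOAD_ORG 0x0000         ; value passed to ORG
%define LOAD_SEG 0x07c0         ; value loaded into DS at startup

%if ((LOAD_SEG << 4) + LOAD_ORG) != 0x7c00
    %error "LOAD_SEG:LOAD_ORG does not map to physical address 0x07c00"
%endif

ORG LOAD_ORG

start:
    mov ax, LOAD_SEG
    mov ds, ax                  ; DS matches ORG, whatever CS happens to be

    cli
    hlt
```

Changing the pair to LOAD_ORG 0x7c00 and LOAD_SEG 0x0000 also passes the check, while a mismatched pair such as 0x7c00 with 0x07c0 stops the build before it can produce a bootloader that reads from the wrong place.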


Summary

Do not assume that CS holds the value we expect, and do not blindly copy CS to DS. Set DS explicitly.

Your code can be fixed to keep using ORG 0x0000 , as you originally had it, if we set DS to 0x07c0 as discussed earlier. It might look like this:

    ORG 0
    BITS 16

    push word 0xB800        ; Address of text screen video memory in real mode for colored monitors
    push 0x07c0
    pop ds                  ; DS=0x07c0 since we use ORG 0x0000
    pop es

Alternatively, we could use ORG 0x7C00 as follows:

    ORG 0x7c00
    BITS 16

    push word 0xB800        ; Address of text screen video memory in real mode for colored monitors
    push 0x0000
    pop ds                  ; DS=0x0000 since we use ORG 0x7c00
    pop es
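
Putting the pieces together, a complete boot sector along these lines might look like the sketch below. It is not the asker's exact program: it uses the ORG 0x7c00 variant, folds putchar into the print loop, and also sets SS:SP, which is my own addition based on tip #1 (the stack location just below the boot sector is an assumption, not something the question requires).

```nasm
; Minimal sketch: print a NUL-terminated string by writing to text-mode
; video memory, without relying on the CS value supplied by the BIOS.
; Assemble with: nasm -f bin boot.asm -o boot.bin

BITS 16
ORG 0x7c00

start:
    xor ax, ax
    mov ds, ax              ; DS = 0x0000 to match ORG 0x7c00
    mov ss, ax              ; assumption: stack placed just below the boot sector
    mov sp, 0x7c00          ; SS:SP = 0x0000:0x7c00, grows downward
    mov ax, 0xb800
    mov es, ax              ; ES = 0xB800 (color text video memory)
    cld                     ; string instructions move forward

    mov ah, 0x0e            ; attribute: yellow on black
    mov di, 8               ; byte offset into video memory (position*2)
    mov si, msg
    call print

    cli
.halt:
    hlt
    jmp .halt               ; stay halted if the CPU is ever woken up

; input: DS:SI = NUL-terminated string, ES:DI = video position, AH = attribute
print:
.next_char:
    lodsb                   ; AL = [DS:SI], SI += 1
    test al, al
    jz .done
    stosw                   ; [ES:DI] = AX, DI += 2
    jmp .next_char
.done:
    ret

msg: db 'PAO!', 0

; Pad to 510 bytes, then the boot signature
times 510 - ($ - $$) db 0
dw 0xaa55
```

Because DS is loaded with a constant rather than copied from CS, the msg reference resolves to physical address 0x07c00 plus the offset of msg whether the BIOS entered at 0x0000:0x7c00 or at 0x07c0:0x0000.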

Source: https://habr.com/ru/post/1013762/

