Assumption: All the commands assume that you are located in the NalogaLongMode folder!
Important: When starting qemu choose the 4th option with no additional drivers!

1. Checking feature support

For the program to do its job, which is to use more than 4 GB of memory, long mode is needed. Besides long mode, the CPU must support 1-Gbyte physical-page translation. To check this I have written a small program. To run it, start qemu with the following command:

qemu-system-x86_64 -hda disk.vmdk -drive file=fat:disk:rw:"./"

When the VM starts, enter the following commands:

D:
checkPSS.com

The program should print two lines of text. The first indicates whether long mode is supported: "Long mode is supported" or "Long mode is not supported". The second indicates whether 1-Gbyte physical-page translation is supported: "1G pages supported" or "1G pages not supported".

For the program to work both features must be supported!
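
The source of checkPSS.com is not reproduced here, so the following is only a minimal sketch of how such a check could be written as a DOS .COM program. It assumes the CPU supports the CPUID instruction at all. The label and variable names (start, print, features, msg_*) are made up for illustration. The bits tested are CPUID leaf 0x80000001, EDX bit 29 (long mode) and EDX bit 26 (1-GByte pages).

```nasm
; Hypothetical feature check, NOT the actual checkPSS source.
; Assemble with: nasm -fbin check.asm -o check.com
[BITS 16]
[ORG 0x100]                          ; DOS .COM programs are loaded at offset 0x100

start:
    mov eax, 0x80000000              ; Ask for the highest extended CPUID leaf
    cpuid
    cmp eax, 0x80000001              ; The feature bits live in leaf 0x80000001
    jb  .no_long_mode                ; Leaf missing: features stays 0

    mov eax, 0x80000001
    cpuid
    mov [features], edx              ; Save EDX, DX is reused for the message pointer

    test dword [features], 1 << 29   ; EDX bit 29: long mode
    jz  .no_long_mode
    mov dx, msg_lm_yes
    call print
    jmp .check_1g
.no_long_mode:
    mov dx, msg_lm_no
    call print
.check_1g:
    test dword [features], 1 << 26   ; EDX bit 26: 1-GByte pages
    jz  .no_1g
    mov dx, msg_1g_yes
    jmp .print_exit
.no_1g:
    mov dx, msg_1g_no
.print_exit:
    call print
    mov ax, 0x4C00                   ; DOS: terminate with exit code 0
    int 0x21

print:
    mov ah, 0x09                     ; DOS: print the '$'-terminated string at DS:DX
    int 0x21
    ret

features   dd 0
msg_lm_yes db 'Long mode is supported', 13, 10, '$'
msg_lm_no  db 'Long mode is not supported', 13, 10, '$'
msg_1g_yes db '1G pages supported', 13, 10, '$'
msg_1g_no  db '1G pages not supported', 13, 10, '$'
```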

2. Compiling the code

The main program is compiled to a binary file using the following command:

nasm -fbin Main.asm -o Main.com

3. Running the program

To run the program you must first start qemu. To do that, enter the following command in a terminal:

qemu-system-x86_64 -hda disk.vmdk -drive file=fat:disk:rw:"./" -monitor stdio -m 6G

When you get to the FreeDOS prompt, enter the following command:

D:Main.com

The program may take a while to finish. When it is finished, it prints a message saying that it is done.

4. Explanation

The point of the program is to enter long mode (64-bit mode), where we can address more than 4 GB of RAM. Long mode is very similar to protected mode, and entering it works almost the same way. To enter long mode you MUST go through these steps:

  1. Create a Valid GDT (Global Descriptor Table)
  2. Create a 6 byte pseudo-descriptor to point to the GDT
  3. Set up paging; load CR3 with the address of a valid top-level page table, the PML4.
  4. Enable PAE (Physical Address Extension); set CR4.PAE = 1.
  5. Set IA32_EFER.LME = 1.
  6. Disable Interrupts (CLI).
  7. Load an IDT pseudo-descriptor that has a zero limit (this prevents the real mode IDT from being used in protected mode)
  8. Set the PE bit and the PG bit of the CR0 register
  9. Execute a far jump
  10. Load data segment registers with valid selector(s) to prevent GP exceptions when interrupts happen
  11. Load SS:(E)SP with a valid stack
  12. Load an IDT pseudo-descriptor that points to the IDT
  13. Enable Interrupts.

The program skips the final 3 steps, as it doesn't use interrupts or the stack in long mode.

So let's start. First the program jumps to the beginning of the code and sets cs to zero. After this first jump, an empty IDT is declared for later use, followed by some space for a small stack.

jmp 0x0000:main
 
ALIGN 4
IDT:
    .Length       dw 0
    .Base         dd 0
    
dd 0,0,0,0
stack:

The next thing we do is clear the segment registers es and ss, load the top of the stack into esp, and load the address of a buffer into edi. This buffer is declared at the end of the program in the .bss section and is later used to store the page tables.

main:

    xor eax,eax
    mov es,eax
    mov edi,page
    
    mov ss, ax
    mov eax, stack
    mov esp, eax

To create the page tables we must first clear the memory. To do that we just write a constant (0) to every location in the buffer. This is done with the following code:

SwitchToLongMode:
    ; Zero out the 16KiB buffer.
    ; Since we are doing a rep stosd, count should be bytes/4.   
    push di                           ; REP STOSD alters DI.
    mov ecx, 0x1000                   ; Repeat 0x1000 times 
    xor eax, eax                      ; Value to be written
    cld                               ; Clear direction flag
    rep stosd
    pop di 

Now we need to actually fill the page tables. Only one entry is needed in the PML4 table, as each entry can address 512 GB of memory. The entry is made of the address of the PDPT (Page Directory Pointer Table) and a few flags. The PDPT is at edi+0x1000, and the present and write flags must also be set.

; Build the Page Map Level 4.
    ; es:di points to the Page Map Level 4 table.
    lea eax, [es:di + 0x1000]         ; Put the address of the Page Directory Pointer Table in to EAX.
    or eax, PAGE_PRESENT | PAGE_WRITE ; Or EAX with the flags - present flag, writable flag.
    mov [es:di], eax                  ; Store the value of EAX as the first PML4E.
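
The flag constants used in these listings are defined elsewhere in Main.asm and are not shown in this document. A plausible set of definitions, given as an assumption rather than the actual source, could look like this. In particular, PAGE_G is assumed here to be bit 7 (the page-size bit), since the text describes it as the flag that selects 1 GB page translation.

```nasm
; Assumed definitions; the real values in Main.asm are not shown here.
PAGE_PRESENT equ 1 << 0     ; Bit 0 (P):   the entry is valid
PAGE_WRITE   equ 1 << 1     ; Bit 1 (R/W): writes are allowed
PAGE_G       equ 1 << 7     ; Bit 7 (PS):  in a PDPT entry, map a 1 GB page directly
```

With these values, PAGE_PRESENT | PAGE_WRITE evaluates to 0x03 and PAGE_PRESENT | PAGE_WRITE | PAGE_G to 0x83.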

Next we fill the PDPT to address 5 GB of memory. For this we need 5 entries. Each entry is made of a base physical address and a few flags. The bits we must set are present, write and 1 GB page translation. Thanks to the 1 GB pages we need only two page tables: the PML4 with one entry and the PDPT with 5 entries.

; Build the Page Directory Pointer Table
    push di                                       ; Save DI for the time being.
    lea di, [di + 0x1000]                         ; Point DI to the page table.
    mov eax, PAGE_PRESENT | PAGE_WRITE | PAGE_G   ; Move the flags into EAX - and point it to 0x0000. 
 
    mov [es:di], eax         ; Write first entry into PDPE page table, which addresses the first 1GB of memory
 
    add eax, 0x40000000      ; Increase start address
    add di, 8
    mov [es:di], eax         ; Write second entry into PDPE page table, which addresses the second GB of memory
    
    add eax, 0x40000000      ; Increase address
    add di, 8
    mov [es:di], eax         ; Write third entry into PDPE page table, which addresses the third GB of memory
    
    add eax, 0x40000000      ; Increase address
    add di, 8
    mov [es:di], eax         ; Write fourth entry into PDPE page table, which addresses the fourth GB of memory
  
    ; To address space higher than 4G we need to write a bit to a higher position, as the eax register can only hold 32-bit values
    mov eax, PAGE_PRESENT | PAGE_WRITE | PAGE_G     ; Reset eax to start position
    mov ebx, 0x00000001                             ; Set value for setting the next highest bit
     
    add di, 8           ; Advance to the fifth entry
    mov [es:di], eax    ; Write the lower half of the fifth entry
    add di, 4
    mov [es:di], ebx    ; Write bit 32 of the address into the upper half, fifth entry now complete. 
    
    ; Now there is 5G of memory mapped
    pop di              ; Restore DI.
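
The five entries can also be written in a loop. The fragment below is a sketch under a few assumptions, not the code from Main.asm: it expects 16-bit code with ES:DI pointing at the zeroed PDPT, the flag values are assumed as noted in the comments, and FillPDPT and NUM_GB are hypothetical names. It keeps the 64-bit entry in the register pair EBX:EAX and lets the carry flag move the address past 4 GB.

```nasm
[BITS 16]
; Assumed flag values, the real definitions are not shown in this document.
PAGE_PRESENT equ 1 << 0
PAGE_WRITE   equ 1 << 1
PAGE_G       equ 1 << 7         ; assumed to be the page-size (PS) bit
NUM_GB       equ 5              ; hypothetical: number of 1 GB pages to map

FillPDPT:                       ; expects ES:DI -> zeroed PDPT
    push di                     ; Keep DI so the caller still has the table base
    mov eax, PAGE_PRESENT | PAGE_WRITE | PAGE_G   ; Low dword: address bits 31..0 plus flags
    xor ebx, ebx                ; High dword: address bits 63..32
    mov cx, NUM_GB
.next_entry:
    mov [es:di], eax            ; Store the low half of the entry
    mov [es:di + 4], ebx        ; Store the high half of the entry
    add eax, 0x40000000         ; Next 1 GB; sets carry when crossing 4 GB
    adc ebx, 0                  ; Propagate the carry into the high half
    add di, 8                   ; Each entry is 8 bytes
    loop .next_entry            ; Repeat CX times
    pop di
```

After the fourth addition EAX wraps back to the bare flags and EBX becomes 1, which gives the fifth entry the physical base 0x100000000.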

The page tables are now created and present in memory. The next step is to disable interrupts, which is done here by masking all IRQs on both interrupt controllers.

; Disable IRQs
    mov al, 0xFF                      ; Out 0xFF to 0xA1 and 0x21 to disable all IRQs.
    out 0xA1, al
    out 0x21, al
 
    nop
    nop

Now that interrupts are disabled we can load the empty IDT.

xor ax,ax
    mov ds,ax
 
    lidt [IDT]                        ; Load a zero length IDT so that any NMI causes a triple fault.

There isn't much left to do, so we can start entering long mode. To do this we must first enable PAE and PGE, load the PML4 address into control register 3, set the LME bit in the EFER MSR, enable paging and protection in CR0, load the GDT and finally do a far jump so that cs selects the code segment in the GDT.

; Enter long mode.
    mov eax, 10100000b                ; Set the PAE and PGE bit.
    mov cr4, eax
 
    mov edx, edi                      ; Point CR3 at the PML4.
    mov cr3, edx
 
    mov ecx, 0xC0000080               ; Read from the EFER MSR. 
    rdmsr    
 
    or eax, 0x00000100                ; Set the LME bit.
    wrmsr
 
    mov ebx, cr0                      ; Activate long mode -
    or ebx,0x80000001                 ; - by enabling paging and protection simultaneously.
    mov cr0, ebx                    
 
    lgdt [GDT.Pointer]                ; Load GDT.Pointer defined below.
 
    jmp CODE_SEG:LongMode             ; Load CS with 64 bit segment and flush the instruction cache

For completeness, here is the GDT structure.

; Global Descriptor Table
GDT:
.Null:
    dq 0x0000000000000000             ; Null Descriptor - should be present.
 
.Code:
    dq 0x00209A0000000000             ; 64-bit code descriptor (exec/read).
    dq 0x0000920000000000             ; 64-bit data descriptor (read/write).
 
ALIGN 4
    dw 0                              ; Padding to make the "address of the GDT" field aligned on a 4-byte boundary
 
.Pointer:
    dw $ - GDT - 1                    ; 16-bit Size (Limit) of GDT.
    dd GDT 
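
The selectors CODE_SEG and DATA_SEG used in the listings are not defined in the excerpts shown here. A selector is the byte offset of a descriptor within the GDT, with the low three bits (table indicator and privilege level) left at zero for a ring 0 GDT entry. Given the layout above, the definitions would presumably be the following; treat them as an assumption, not as the actual source.

```nasm
; Assumed selector definitions, matching the order of the descriptors in the GDT.
CODE_SEG equ 0x0008     ; GDT + 8:  0x00209A0000000000, access byte 0x9A, L (64-bit) flag set
DATA_SEG equ 0x0010     ; GDT + 16: 0x0000920000000000, access byte 0x92, read/write data
```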

And finally the 64-bit code. Before we do anything we must set all of the segment registers to select the GDT's data segment. After that we just write a value to memory from the 1 MB mark up to 4 GB and print a message that we are done.

[BITS 64]      
LongMode:
    mov ax, DATA_SEG
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax
    
    mov rax, 0x100000         ; Move start address to 64bit register rax
    mov rcx, 0x100000000      ; End address 
write:    
    mov qword [rax], 0x2a     ;write value to memory
    add rax,0x8
    cmp rax,rcx               ;Check if we are done
    jl write
 
    ; Blank out the screen to a blue color.
    mov edi, 0xB8000
    mov rcx, 500                      ; Since we are clearing uint64_t over here, we put the count as Count/4.
    mov rax, 0x1F201F201F201F20       ; Set the value to set the screen to: Blue background, white foreground, blank spaces.
    rep stosq                         ; Clear the entire screen. 
 
    ; Display "Done"
    mov edi, 0x00b8000              
 
    mov rax, 0x1F651F6e1F6f1F44    
    mov [edi],rax
 
    hlt
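
The message above is packed by hand into one 64-bit constant, which only works for four characters. For longer messages a small loop is easier. The following is a sketch, not part of Main.asm: PrintString and message are hypothetical names, and it assumes the same flat 64-bit environment as the code above, with the VGA text buffer at 0xB8000.

```nasm
[BITS 64]
; Hypothetical helper: print a zero-terminated ASCII string at the top-left
; corner of the VGA text screen, white on blue (attribute 0x1F), then halt.
PrintString:
    cld                         ; Make LODSB/STOSW move forward
    mov rsi, message            ; RSI -> source string
    mov rdi, 0xB8000            ; RDI -> start of the VGA text buffer
    mov ah, 0x1F                ; Attribute byte, stored after each character
.next_char:
    lodsb                       ; AL = next character, RSI += 1
    test al, al                 ; Zero terminator reached?
    jz .done
    stosw                       ; Store character + attribute, RDI += 2
    jmp .next_char
.done:
    hlt

message db 'Done', 0
```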

5. Encountered problems

Do not use any of the FreeDOS drivers, they will just be a hassle when trying to debug your program! You can trust me on that, as I have been racking my brains for the last few days. If you do use them and have problems loading the IDT or GDT, the first thing to check is whether paging is disabled. If the program works from the MBR it should also work in DOS. The thing that might be going wrong is that you are writing to an address that you shouldn't be!

6. Resources

http://developer.amd.com/wordpress/media/2012/10/24593_APM_v21.pdf
https://en.wikibooks.org/wiki/X86_Assembly/Protected_Mode
http://wiki.osdev.org/Entering_Long_Mode_Directly
http://wiki.osdev.org/Setting_Up_Long_Mode
http://os.phil-opp.com/entering-longmode.html
http://wiki.osdev.org/GDT_Tutorial
http://wiki.osdev.org/Interrupt_Descriptor_Table