date: 2019-02-12
tags: OS 6.828
布置好环境之后就可以开始一点一点写作业了。
了解汇编代码。
阅读Brennan's Guide to Inline Assembly的The Syntax部分,该书是使用的AT&T syntax和本课使用的GNU assembler一致。
需要注意的是,AT&T是左边移到右边,Intel是右边移到左边
movl %eax, %ebx # AT&T
mov ebx, eax # Intel
注意上面两句的意思都是load ebx with the value in eax。
另外注意一下两个的区别:
movl $0x4, %eax # tmp = 0x4, 把eax直接赋值为0x4
movl $-147, (%eax) # *p = -147,把eax值作为地址,这个地址对应的位置的值赋值为-147
有关jmp,有如下几种:
jmp 7d51 # relative jump, 只把IP赋值,在同一个segment里面
ljmp $0x10,$0x100000 # 给CS:IP赋值,不在同一个segment里面,是absolute的
jump *0x10018 # absolute, jump, 跳到绝对地址
注意在real mode的时候接受的才是16位,protected mode就会直接接受32位了,也就没有什么relative, absolute了。
编译JOS并测试qemu。这部分在配置环境的时候就已经进行了。
注意,虽然现在使用的是qemu虚拟机,但是仍然和直接跑在硬盘上是一样的。
Although simple, it's important to note that this kernel monitor is running "directly" on the "raw (virtual) hardware" of the simulated PC. This means that you should be able to copy the contents of
obj/kern/kernel.img
onto the first few sectors of a real hard disk, insert that hard disk into a real PC, turn it on, and see exactly the same thing on the PC's real screen as you did above in the QEMU window. (We don't recommend you do this on a real machine with useful information on its hard disk, though, because copyingkernel.img
onto the beginning of its hard disk will trash the master boot record and the beginning of the first partition, effectively causing everything previously on the hard disk to be lost!)
+------------------+ <- 0xFFFFFFFF (4GB)
| 32-bit |
| memory mapped |
| devices |
| |
/\/\/\/\/\/\/\/\/\/\
/\/\/\/\/\/\/\/\/\/\
| |
| Unused |
| |
+------------------+ <- depends on amount of RAM
| |
| |
| Extended Memory |
| |
| |
+------------------+ <- 0x00100000 (1MB)
| BIOS ROM |
+------------------+ <- 0x000F0000 (960KB)
| 16-bit devices, |
| expansion ROMs |
+------------------+ <- 0x000C0000 (768KB)
| VGA Display |
+------------------+ <- 0x000A0000 (640KB)
| |
| Low Memory |
| |
+------------------+ <- 0x00000000
上图是一个32位系统的内存布局。最下面的1M是最原始16位Intel 8088 processor所使用的。当时的random access memory(RAM)仅仅有640KB。
从0x000A0000
到0x000FFFFF
的384KB是留给硬件使用的。
从0x000F0000
到0x000FFFFF
的64KB非常重要,是Basic Input/Output System(BIOS)。
The BIOS is responsible for performing basic system initialization such as activating the video card and checking the amount of memory installed
为了backward compatibility,在32位机器仍然保留了最原始1MB的布局。
Recent x86 processors can support more than 4GB of physical RAM, so RAM can extend further above 0xFFFFFFFF. In this case the BIOS must arrange to leave a second hole in the system's RAM at the top of the 32-bit addressable region, to leave room for these 32-bit devices to be mapped.
Because of design limitations JOS will use only the first 256MB of a PC's physical memory anyway, so for now we will pretend that all PCs have "only" a 32-bit physical address space.
在启动OS的时候,最先会load BIOS。
使用qemu结合gdb开始调试JOS的kernel。可以看出运行的第一行是从存在BIOS部分内存的指令开始的。指令用CS:IP,这里cs是code segment pointer,ip是instruction pointer,他们一起成为了一个20bit的地址指针,其计算方式是
physical address = 16 * segment + offset.
用si进行多步运行。其结果为
# 跳到 [f000:e05b]
[f000:fff0] 0xffff0: ljmp $0xf000,$0xe05b
+ symbol-file obj/kern/kernel
(gdb) si # 比较0与%cs:0x6ac8
[f000:e05b] 0xfe05b: cmpl $0x0,%cs:0x6ac8
(gdb) si # 如果不相等,EIP(ip)赋值为0xfd2e1,即跳到0xfd2e1
[f000:e062] 0xfe062: jne 0xfd2e1
(gdb) si # 说明上面是相等的,清空%dx,
# dx是data register,用于输入输出
[f000:e066] 0xfe066: xor %dx,%dx
(gdb) si # 将ss也清零,
# ss是stack segment, 包括数据和procedure的返回地址
[f000:e068] 0xfe068: mov %dx,%ss
(gdb) si # 将esp赋为$0x7000,
# esp是stack pointer,包含了stack的offset value
# ss:sp refers to be current position of data or address
# within the program starck
[f000:e06a] 0xfe06a: mov $0x7000,%esp
(gdb) si # 把edx赋为$0xf34c2,edx是dx的32-bit版本
[f000:e070] 0xfe070: mov $0xf34c2,%edx
(gdb) si # 跳到0xfd15c
[f000:e076] 0xfe076: jmp 0xfd15c
(gdb) si # 把eax付给ecx,
# eax, primary accumulator,是用于most arithmetic instructions
# ecx, count register,存储循环信息
[f000:d15c] 0xfd15c: mov %eax,%ecx
(gdb) si # clear Interrupt Flag
# IF, determines whether the external interrupts
# like keyboard entry, etc., are to be ignored or processed
[f000:d15f] 0xfd15f: cli
(gdb) si # clear Direction Flag
# determines left or right direction
# for moving or comparing string data.
[f000:d160] 0xfd160: cld
(gdb) si # eax 赋为$0x8f
[f000:d161] 0xfd161: mov $0x8f,%eax
(gdb) si # 从al输出到$0x70 port, al是ax的lower 8-bit
[f000:d167] 0xfd167: out %al,$0x70
(gdb) si # 从$0x71 port输入到al, al是ax的lower 8-bit
[f000:d169] 0xfd169: in $0x71,%al
...
对于register的名字对应的功能,可以看这里。看了这么多行也没明白BIOS是要做什么。。。
When the BIOS runs, it sets up an interrupt descriptor table and initializes various devices such as the VGA display. This is where the "
Starting SeaBIOS
" message you see in the QEMU window comes from.
软盘(floppy)和硬盘(hard disk)均被分成了512B的区域,被称为sectors,sector是disk的最小单元,一次独写操作必须要使用一个或多个sector。如果disk bootable,那么其第一个sector被称为boot sector,因为boot coder code存于其中。当BIOS发现了一个bootable disk,就会读512B到内存中,其地址为0x7c00到0x7dff,之后用一个jmp
指令将CS:IP
设为0000:7c00
,从而开始boot loading。不同于BIOS load address (0xffff0
),这个地址是相对可变的,不过现在已经标准化了。
Boot一个CD-ROM有一些别的变化,详情可以阅读 "El Torito" Bootable CD-ROM Format Specification。不过本课中,还是会把CD-ROM当成是一般的disk。
Boot loader的代码主要存于,boot/boot.s
, boot/main.c
,注意看其中的注释。实际运行中是先运行boot.s
# boot/boot.s
#include <inc/mmu.h>
# Start the CPU: switch to 32-bit protected mode, jump into C.
# The BIOS loads this code from the first sector of the hard disk into
# memory at physical address 0x7c00 and starts executing in real mode
# with %cs=0 %ip=7c00.
.set PROT_MODE_CSEG, 0x8 # kernel code segment selector
.set PROT_MODE_DSEG, 0x10 # kernel data segment selector
.set CR0_PE_ON, 0x1 # protected mode enable flag
.globl start
start:
.code16 # Assemble for 16-bit mode
cli # Disable interrupts
cld # String operations increment
# Set up the important data segment registers (DS, ES, SS).
xorw %ax,%ax # Segment number zero
movw %ax,%ds # -> Data Segment
movw %ax,%es # -> Extra Segment
movw %ax,%ss # -> Stack Segment
# Enable A20:
# For backwards compatibility with the earliest PCs, physical
# address line 20 is tied low, so that addresses higher than
# 1MB wrap around to zero by default. This code undoes this.
seta20.1:
inb $0x64,%al # Wait for not busy
testb $0x2,%al # perform a bitwise AND and put in a flag
jnz seta20.1 # jump if not zero
movb $0xd1,%al # 0xd1 -> port 0x64
outb %al,$0x64
seta20.2:
inb $0x64,%al # Wait for not busy
testb $0x2,%al
jnz seta20.2
movb $0xdf,%al # 0xdf -> port 0x60
outb %al,$0x60
# Switch from real to protected mode, using a bootstrap GDT
# and segment translation that makes virtual addresses
# identical to their physical addresses, so that the
# effective memory map does not change during the switch.
lgdt gdtdesc
movl %cr0, %eax
orl $CR0_PE_ON, %eax
movl %eax, %cr0
# Jump to next instruction, but in 32-bit code segment.
# Switches processor into 32-bit mode.
ljmp $PROT_MODE_CSEG, $protcseg
.code32 # Assemble for 32-bit mode
protcseg:
# Set up the protected-mode data segment registers
movw $PROT_MODE_DSEG, %ax # Our data segment selector
movw %ax, %ds # -> DS: Data Segment
movw %ax, %es # -> ES: Extra Segment
movw %ax, %fs # -> FS
movw %ax, %gs # -> GS
movw %ax, %ss # -> SS: Stack Segment
# Set up the stack pointer and call into C.
movl $start, %esp
call bootmain
# If bootmain returns (it shouldn't), loop.
spin:
jmp spin
# Bootstrap GDT
.p2align 2 # force 4 byte alignment
gdt:
SEG_NULL # null seg
SEG(STA_X|STA_R, 0x0, 0xffffffff) # code seg
SEG(STA_W, 0x0, 0xffffffff) # data seg
gdtdesc:
.word 0x17 # sizeof(gdt) - 1
.long gdt # address gdt
// boot/main.c
#include <inc/x86.h>
#include <inc/elf.h>
/**********************************************************************
* This a dirt simple boot loader, whose sole job is to boot
* an ELF kernel image from the first IDE hard disk.
*
* DISK LAYOUT
* * This program(boot.S and main.c) is the bootloader. It should
* be stored in the first sector of the disk.
*
* * The 2nd sector onward holds the kernel image.
*
* * The kernel image must be in ELF format.
*
* BOOT UP STEPS
* * when the CPU boots it loads the BIOS into memory and executes it
*
* * the BIOS intializes devices, sets of the interrupt routines, and
* reads the first sector of the boot device(e.g., hard-drive)
* into memory and jumps to it.
*
* * Assuming this boot loader is stored in the first sector of the
* hard-drive, this code takes over...
*
* * control starts in boot.S -- which sets up protected mode,
* and a stack so C code then run, then calls bootmain()
*
* * bootmain() in this file takes over, reads in the kernel and jumps to it.
**********************************************************************/
#define SECTSIZE 512
#define ELFHDR ((struct Elf *) 0x10000) // scratch space
void readsect(void*, uint32_t);
void readseg(uint32_t, uint32_t, uint32_t);
void
bootmain(void)
{
struct Proghdr *ph, *eph;
// read 1st page off disk
readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);
// is this a valid ELF?
if (ELFHDR->e_magic != ELF_MAGIC)
goto bad;
// load each program segment (ignores ph flags)
ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
eph = ph + ELFHDR->e_phnum;
for (; ph < eph; ph++)
// p_pa is the load address of this segment (as well
// as the physical address)
readseg(ph->p_pa, ph->p_memsz, ph->p_offset);
// call the entry point from the ELF header
// note: does not return!
((void (*)(void)) (ELFHDR->e_entry))();
bad:
outw(0x8A00, 0x8A00);
outw(0x8A00, 0x8E00);
while (1)
/* do nothing */;
}
// Read 'count' bytes at 'offset' from kernel into physical address 'pa'.
// Might copy more than asked
void
readseg(uint32_t pa, uint32_t count, uint32_t offset)
{
uint32_t end_pa;
end_pa = pa + count;
// round down to sector boundary
pa &= ~(SECTSIZE - 1);
// translate from bytes to sectors, and kernel starts at sector 1
offset = (offset / SECTSIZE) + 1;
// If this is too slow, we could read lots of sectors at a time.
// We'd write more to memory than asked, but it doesn't matter --
// we load in increasing order.
while (pa < end_pa) {
// Since we haven't enabled paging yet and we're using
// an identity segment mapping (see boot.S), we can
// use physical addresses directly. This won't be the
// case once JOS enables the MMU.
readsect((uint8_t*) pa, offset);
pa += SECTSIZE;
offset++;
}
}
void
waitdisk(void)
{
// wait for disk reaady
while ((inb(0x1F7) & 0xC0) != 0x40)
/* do nothing */;
}
void
readsect(void *dst, uint32_t offset)
{
// wait for disk to be ready
waitdisk();
outb(0x1F2, 1); // count = 1
outb(0x1F3, offset);
outb(0x1F4, offset >> 8);
outb(0x1F5, offset >> 16);
outb(0x1F6, (offset >> 24) | 0xE0);
outb(0x1F7, 0x20); // cmd 0x20 - read sectors
// wait for disk to be ready
waitdisk();
// read a sector
insl(0x1F0, dst, SECTSIZE/4);
}
上面的注释比较好的解释了这两段代码的意思。boot loader主要进行两件事,
上面的两个文件组成的boot loader的disassembly在obj/boot/boot.asm
。
Ctrl-c
: Halt the machine and break in to GDB at the current instruction. If QEMU has multiple virtual CPUs, this halts all of them.
c
(or continue
): Continue execution until the next breakpoint or Ctrl-c
.
si
(or stepi
): Execute one machine instruction.
b function
or b file:line
(or breakpoint
): Set a breakpoint at the given function or line.
b *addr
(or breakpoint
): Set a breakpoint at the EIP addr.
set print pretty
: Enable pretty-printing of arrays and structs.
info registers
: Print the general purpose registers, eip
, eflags
, and the segment selectors. For a much more thorough dump of the machine register state, see QEMU's own info registers
command.
x/Nx addr
: Display a hex dump of N words starting at virtual address addr. If N is omitted, it defaults to 1. addr can be any expression. (注意 word 的大小并不统一. In GNU assembly, a word is two bytes)
x/Ni addr
: Display the N assembly instructions starting at addr. Using $eip
as addr will display the instructions at the current instruction pointer.
symbol-file file
: (Lab 3+) Switch to symbol file file. When GDB attaches to QEMU, it has no notion of the process boundaries within the virtual machine, so we have to tell it which symbols to use. By default, we configure GDB to use the kernel symbol file, obj/kern/kernel
. If the machine is running user code, say hello.c
, you can switch to the hello symbol file using symbol-file obj/user/hello
.
QEMU represents each virtual CPU as a thread in GDB, so you can use all of GDB's thread-related commands to view or manipulate QEMU's virtual CPUs.
thread n
: GDB focuses on one thread (i.e., CPU) at a time. This command switches that focus to thread n, numbered from zero.
info threads
: List all threads (i.e., CPUs), including their state (active or halted) and what function they're in.
If we set the breakpoint at 0x7c00, 也就是boot loader的入口,逐步执行就能看到运行boot.s
文件了。
对问题的解答:
(gdb) si
[ 0:7c2d] => 0x7c2d: ljmp $0x8,$0x7c32
0x00007c2d in ?? ()
(gdb) si
The target architecture is assumed to be i386
=> 0x7c32: mov $0x10,%ax
0x00007c32 in ?? ()
对应于boot.s
中的
ljmp $PROT_MODE_CSEG, $protcseg
用一个长跳,重新给CS和IP赋值,转化为32位。(从这里的CS和IP的变量名可以看出来这里就是为了变成protect mode用的)
从boot.asm
可以看出,main.c
里面调用((void (*)(void)) (ELFHDR->e_entry))();
在0x7d6b
,可以直接跳过去
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x7d6b: call *0x10018
Breakpoint 2, 0x00007d6b in ?? ()
(gdb) si
=> 0x10000c: movw $0x1234,0x472
0x0010000c in ?? ()
最后一条是call *0x10018
,注意这里带*
的指absolute call,会直接跳到这个地址,和不带*
的给IP或者IP和CS(long jump)不同。所以kernel的第一条指令是movw $0x1234,0x472
。
从上面显示是0x10000c
,这个应该是IP的地址,所以应该和上面的call *0x10018
不矛盾?
从main.c
中的第一个segment,也就是ELFHDR
中有变量e_phnum
,其为前4096byte。(和ELF有关,下文会讲)。
首先来回顾一下C的pointer。pointers.c
的代码如下:
#include <stdio.h>
#include <stdlib.h>
void
f(void)
{
int a[4];
int *b = malloc(16);
int *c;
int i;
// 输出a, b, c的地址
printf("1: a = %p, b = %p, c = %p\n", a, b, c);
c = a; // c和a[0]一个地址
for (i = 0; i < 4; i++)
a[i] = 100 + i;
c[0] = 200; // c和a[0]都改为200
printf("2: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
a[0], a[1], a[2], a[3]);
c[1] = 300; // a[1]=300
*(c + 2) = 301; // a[2]=301
3[c] = 302; // 这个第一次见,a[3]=302
printf("3: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
a[0], a[1], a[2], a[3]);
c = c + 1; // c指向a[1]
*c = 400; // a[1]=400
printf("4: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
a[0], a[1], a[2], a[3]);
c = (int *) ((char *) c + 1); // c指向a[1]向后1byte
*c = 500; // c会污染a[1], a[2]
printf("5: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
a[0], a[1], a[2], a[3]);
b = (int *) a + 1; // b指向a[1], a[0]向后4byte
c = (int *) ((char *) a + 1); // c指向a[0]向后1byte的地方
printf("6: a = %p, b = %p, c = %p\n", a, b, c);
}
int
main(int ac, char **av)
{
f();
return 0;
}
唯一需要注意的是指针间的类型转换,可以参见这里。pointers.c
的输出如下,可以对照一下。
$ ./pointers
1: a = 0x7fff93fab1e0, b = 0x55f7fc10c260, c = 0xf0b5ff
2: a[0] = 200, a[1] = 101, a[2] = 102, a[3] = 103
3: a[0] = 200, a[1] = 300, a[2] = 301, a[3] = 302
4: a[0] = 200, a[1] = 400, a[2] = 301, a[3] = 302
5: a[0] = 200, a[1] = 128144, a[2] = 256, a[3] = 302
6: a = 0x7fff93fab1e0, b = 0x7fff93fab1e4, c = 0x7fff93fab1e1
为了理解boot/main.c
,我们需要知道ELF是什么。ELF,全称Executable and Linkable Format,就是二进制编码的汇编指令。对于本课来说,需要知道ELF就是header with loading information加several program sections,每一个section都是需要被加载进内存的指定位置的code chunk。
ELF header为固定长度,之后跟着一个可变长的program header。program header记录了program sections的信息。intc/elf.h
表示了ELF header的C definition:
#ifndef JOS_INC_ELF_H
#define JOS_INC_ELF_H
#define ELF_MAGIC 0x464C457FU /* "\x7FELF" in little endian */
struct Elf {
uint32_t e_magic; // must equal ELF_MAGIC
uint8_t e_elf[12];
uint16_t e_type;
uint16_t e_machine;
uint32_t e_version;
uint32_t e_entry;
uint32_t e_phoff;
uint32_t e_shoff;
uint32_t e_flags;
uint16_t e_ehsize;
uint16_t e_phentsize;
uint16_t e_phnum;
uint16_t e_shentsize;
uint16_t e_shnum;
uint16_t e_shstrndx;
};
struct Proghdr {
uint32_t p_type;
uint32_t p_offset;
uint32_t p_va;
uint32_t p_pa;
uint32_t p_filesz;
uint32_t p_memsz;
uint32_t p_flags;
uint32_t p_align;
};
struct Secthdr {
uint32_t sh_name;
uint32_t sh_type;
uint32_t sh_flags;
uint32_t sh_addr;
uint32_t sh_offset;
uint32_t sh_size;
uint32_t sh_link;
uint32_t sh_info;
uint32_t sh_addralign;
uint32_t sh_entsize;
};
// Values for Proghdr::p_type
#define ELF_PROG_LOAD 1
// Flag bits for Proghdr::p_flags
#define ELF_PROG_FLAG_EXEC 1
#define ELF_PROG_FLAG_WRITE 2
#define ELF_PROG_FLAG_READ 4
// Values for Secthdr::sh_type
#define ELF_SHT_NULL 0
#define ELF_SHT_PROGBITS 1
#define ELF_SHT_SYMTAB 2
#define ELF_SHT_STRTAB 3
// Values for Secthdr::sh_name
#define ELF_SHN_UNDEF 0
#endif /* !JOS_INC_ELF_H */
对于program section,我们关心如下内容:
.text
: The program's executable instructions..rodata
: 只读数据, 如C编译器生成的ASCII string constants. (尽管一般我们都not bother设置硬件为只读.).data
: 保存程序的初始数据, 如被初始化了的全局变量 int x = 5;
当Linker计算程序的memory layout的时候,他会在.data
后的.bss
部分给未初始化的全局变量留下空间。由于C对未初始化的全局变量的值有定义,所以也不需要在.bss
里存数据,Linker只会指向.bss
里的空间,把数据设置为0必须由loader或program自己来做。
检查kernel的ELF header:
$ objdump -h obj/kern/kernel
obj/kern/kernel: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000019e9 f0100000 00100000 00001000 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 000006c0 f0101a00 00101a00 00002a00 2**5
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .stab 00003b95 f01020c0 001020c0 000030c0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .stabstr 00001948 f0105c55 00105c55 00006c55 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .data 00009300 f0108000 00108000 00009000 2**12
CONTENTS, ALLOC, LOAD, DATA
5 .got 00000008 f0111300 00111300 00012300 2**2
CONTENTS, ALLOC, LOAD, DATA
6 .got.plt 0000000c f0111308 00111308 00012308 2**2
CONTENTS, ALLOC, LOAD, DATA
7 .data.rel.local 00001000 f0112000 00112000 00013000 2**12
CONTENTS, ALLOC, LOAD, DATA
8 .data.rel.ro.local 00000044 f0113000 00113000 00014000 2**2
CONTENTS, ALLOC, LOAD, DATA
9 .bss 00000648 f0113060 00113060 00014060 2**5
CONTENTS, ALLOC, LOAD, DATA
10 .comment 0000002a 00000000 00000000 000146a8 2**0
CONTENTS, READONLY
注意.text中的VMA (link address), LMA(load address)。
LMA(load address)是该section该被load进内存的哪里。VMA(link address)则是这个section应该从哪里开始执行。大多时候,这两个东西是相同的,如
$ objdump -h obj/boot/boot.out
obj/boot/boot.out: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000186 00007c00 00007c00 00000074 2**2
CONTENTS, ALLOC, LOAD, CODE
1 .eh_frame 000000a8 00007d88 00007d88 000001fc 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .stab 0000087c 00000000 00000000 000002a4 2**2
CONTENTS, READONLY, DEBUGGING
3 .stabstr 00000925 00000000 00000000 00000b20 2**0
CONTENTS, READONLY, DEBUGGING
4 .comment 0000002a 00000000 00000000 00001445 2**0
CONTENTS, READONLY
boot loader用ELF program header来决定如何加载某个section。kernel的ELF program header可以通过如下防止查看:
$ objdump -x obj/kern/kernel
obj/kern/kernel: file format elf32-i386
obj/kern/kernel
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0010000c
Program Header:
LOAD off 0x00001000 vaddr 0xf0100000 paddr 0x00100000 align 2**12
filesz 0x0000759d memsz 0x0000759d flags r-x
LOAD off 0x00009000 vaddr 0xf0108000 paddr 0x00108000 align 2**12
filesz 0x0000b6a8 memsz 0x0000b6a8 flags rw-
STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
filesz 0x00000000 memsz 0x00000000 flags rwx
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000019e9 f0100000 00100000 00001000 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 000006c0 f0101a00 00101a00 00002a00 2**5
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .stab 00003b95 f01020c0 001020c0 000030c0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .stabstr 00001948 f0105c55 00105c55 00006c55 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .data 00009300 f0108000 00108000 00009000 2**12
CONTENTS, ALLOC, LOAD, DATA
5 .got 00000008 f0111300 00111300 00012300 2**2
CONTENTS, ALLOC, LOAD, DATA
6 .got.plt 0000000c f0111308 00111308 00012308 2**2
CONTENTS, ALLOC, LOAD, DATA
7 .data.rel.local 00001000 f0112000 00112000 00013000 2**12
CONTENTS, ALLOC, LOAD, DATA
8 .data.rel.ro.local 00000044 f0113000 00113000 00014000 2**2
CONTENTS, ALLOC, LOAD, DATA
9 .bss 00000648 f0113060 00113060 00014060 2**5
CONTENTS, ALLOC, LOAD, DATA
10 .comment 0000002a 00000000 00000000 000146a8 2**0
CONTENTS, READONLY
SYMBOL TABLE:
f0100000 l d .text 00000000 .text
f0101a00 l d .rodata 00000000 .rodata
f01020c0 l d .stab 00000000 .stab
f0105c55 l d .stabstr 00000000 .stabstr
f0108000 l d .data 00000000 .data
f0111300 l d .got 00000000 .got
f0111308 l d .got.plt 00000000 .got.plt
f0112000 l d .data.rel.local 00000000 .data.rel.local
f0113000 l d .data.rel.ro.local 00000000 .data.rel.ro.local
f0113060 l d .bss 00000000 .bss
00000000 l d .comment 00000000 .comment
00000000 l df *ABS* 00000000 obj/kern/entry.o
f010002f l .text 00000000 relocated
f010003e l .text 00000000 spin
00000000 l df *ABS* 00000000 entrypgdir.c
00000000 l df *ABS* 00000000 init.c
00000000 l df *ABS* 00000000 console.c
f01001c0 l F .text 0000001f serial_proc_data
f01001df l F .text 0000004b cons_intr
f0113080 l O .bss 00000208 cons
f010022a l F .text 00000132 kbd_proc_data
f0113060 l O .bss 00000004 shift.1338
f0101bc0 l O .rodata 00000100 shiftcode
...
注意上面的Program Headers部分,需要被加载进内存的是那些标记着LOAD的(后面的off我猜测应该是offset)。其他的一些信息有the virtual address ("vaddr"), the physical address ("paddr"), and the size of the loaded area ("memsz" and "filesz").
回到boot/main.c
,ph->p_pa
保存了the segment's destination physical address(对于这个具体的例子,就是真实的物理地址)。
BIOS从0x7c00
加载loader,这也就是loader的load address(LMA),这也是boot sector的执行地址,所以也是link address(VMA)。我们通过在boot/Magefrag
里面设置-Ttext 0x7c00
来让linker可以在生成的代码中给出正确的内存地址。
如果改变boot/Makefrag
里面的地址,如改成0x7c04
,重新make,会直接报错:
$ make qemu-nox
***
*** Use Ctrl-a x to exit qemu
***
qemu-system-i386 -nographic -drive file=obj/kern/kernel.img,index=0,media=disk,format=raw -serial mon:stdio -gdb tcp::26000 -D qemu.log
EAX=00000011 EBX=00000000 ECX=00000000 EDX=00000080
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006f20
EIP=00007c2d EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
CS =0000 00000000 0000ffff 00009b00 DPL=0 CS16 [-RA]
SS =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
DS =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
FS =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
GS =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 00f7ba55 00000000
IDT= 00000000 000003ff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
EFER=0000000000000000
Triple fault. Halting for inspection via QEMU monitor.
改成0xf7c04
会报不同的错:
$ make qemu-nox
+ ld boot/boot
obj/boot/boot.o:boot/boot.S:48:(.text+0x21): relocation truncated to fit: R_386_16 against `.text'
obj/boot/boot.o:boot/boot.S:55:(.text+0x2e): relocation truncated to fit: R_386_16 against `.text'
boot/Makefrag:27: recipe for target 'obj/boot/boot' failed
make: *** [obj/boot/boot] Error 1
这里可能是因为直接指向BIOS去了?
我们回头来看kernel。和boot loader不同,kernel的LMA和VMA不同,所以kernel是希望从low address开始加载,但是希望在high address执行。我们会在下一部分来研究这个问题。
除了section information,还有一个e_entry
很重要,其表示了the link address of the entry point in the program: the memory address in the program's text section at which the program should begin executing. You can see the entry point(这里没明白,是这个section的还是下个section的)
$ objdump -f obj/kern/kernel
obj/kern/kernel: file format elf32-i386
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0010000c
这里的start address就是e_entry
。
在BIOS刚刚进入boot loader时:
Breakpoint 1, 0x00007c00 in ?? ()
(gdb) x/8x 0x00100000
0x100000: 0x00000000 0x00000000 0x00000000 0x00000000
0x100010: 0x00000000 0x00000000 0x00000000 0x00000000
刚进kernel时:
0x0010000c in ?? ()
(gdb) x/8x 0x00100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0x100010: 0x34000004 0x2000b812 0x220f0011 0xc0200fd8
不同在于进入kernel之前已经把kernel的内容load进内存了。
前文提到过,bootloader的LMA和VMA是一样的,但是kernel的link address和load address非常不一样,其显示在kern/kernel.ld
的最上面。
...
SECTIONS
{
/* Link the kernel at this address: "." means the current address */
. = 0xF0100000;
/* AT(...) gives the load address of this section, which tells
the boot loader where to load the kernel in physical memory */
.text : AT(0x100000) {
*(.text .stub .text.* .gnu.linkonce.t.*)
}
...
OS kernel经常会在非常高的虚拟内存部分运行比如0xf0100000
,以把较低的部分留给用户程序。很多机器并没有到0xf0100000
这么高的物理内存,所以我们会使用处理器的内存管理硬件来把虚拟内存映射到物理内存,而这个映射的物理内存是0x00100000
(这也就是LMA,或者说是bootloader把kernel加载在的物理内存的起始地址)。所以即使link address(VMA)的虚拟内存足够高,但是仍然会被加载在1MB的地方,BIOS的正上面。
事实上,下一个lab我们会把整个256MB的内存,从物理地址0x00000000
到0x0fffffff
映射到0xf0000000
到0xffffffff
。现在来说,我们仅仅映射了最初的4MB,而这个映射是在kern/entrypgdir.c
硬编码的。当kern/entry.S
设置CR0_PG
flag的时候,memory inference就是物理地址(严格的讲,是线性地址,但是boot/boot.S
设置了一个iddentity mapping,且我们将不会做出修改)。当CR0_PG
被设置了之后,memory reference就将是虚拟内存了。entry_pgdir
把虚拟内存从 0xf0000000 到 0xf0400000 翻译成物理内存的 0x00000000 到 0x00400000, 同时把 0x00000000 到 0x00400000 也翻译成 0x00000000 到 0x00400000。因为我们还没有设置interrupt handling,任何不是这两个区间中的虚拟内存都会触发hardware exception。
在还没有运行movl %eax, %cr0
之前:
(gdb) x/1x 0x00100000
0x100000: 0x1badb002
(gdb) x/1x 0xf0100000
0xf0100000 <_start+4026531828>: 0x00000000
运行之后:
(gdb) x/1x 0x00100000
0x100000: 0x1badb002
(gdb) x/1x 0xf0100000
0xf0100000 <_start+4026531828>: 0x1badb002
这就表明了运行过之后,CPU进行了内存的重映射。
阅读kern/printf.c
, lib/printfmt.c
和kern/console.c
。
加入格式输出8进制数字。主要需要阅读的代码是lib/printfmt.c
中的vprintfmt
。只需要模仿上面的10进制就好了。
// 从printf.c中,我们可以看出:
// putch就是putchar,putdat是记录输出了多少符号,fmt是格式,ap就是格式对应的参数
void
vprintfmt(void (*putch)(int, void*), void *putdat, const char *fmt, va_list ap) {
register const char *p; // 用register关键词提示compiler可以把其放入register
register int ch, err;
unsigned long long num;
int base, lflag, width, precision, altflag;
char padc; // pad char
while (1) {
while ((ch = *(unsigned char *) fmt++) != '%') { // 如果和%没关系,直接输出
if (ch == '\0')
return;
putch(ch, putdat);
}
// Process a %-escape sequence 注意每次都会重新初始化
padc = ' ';
width = -1;
precision = -1;
lflag = 0;
altflag = 0;
reswitch:
switch (ch = *(unsigned char *) fmt++) {
...
// unsigned decimal
case 'u':
num = getuint(&ap, lflag);
base = 10;
goto number;
// (unsigned) octal
case 'o':
// Replace this with your code.
num = getuint(&ap, lflag);
base = 8;
goto number;
...
// unrecognized escape sequence - just print it literally
default:
putch('%', putdat);
for (fmt--; fmt[-1] != '%'; fmt--) // 退回去...
/* do nothing */;
break;
}
}
}
printf.c
and console.c
. Specifically, what function doesconsole.c
export? How is this function used by printf.c
?// printf.c
#include <inc/types.h>
#include <inc/stdio.h>
#include <inc/stdarg.h>
static void
putch(int ch, int *cnt) {
cputchar(ch);
*cnt++;
}
int
vcprintf(const char *fmt, va_list ap) {
int cnt = 0;
vprintfmt((void*)putch, &cnt, fmt, ap);
return cnt;
}
int
cprintf(const char *fmt, ...) { // 这里利用了variable length argument实现了可变参数
va_list ap;
int cnt;
va_start(ap, fmt);
cnt = vcprintf(fmt, ap);
va_end(ap);
return cnt;
}
printf.c
调用了cputchar
console.c
中的如下代码// console.h
#define CRT_ROWS 25
#define CRT_COLS 80
#define CRT_SIZE (CRT_ROWS * CRT_COLS)
// console.c
if (crt_pos >= CRT_SIZE) { // 如果当前输出位置大于了一整篇的大小,那么就把最上面一行去掉
int i;
memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
crt_buf[i] = 0x0700 | ' ';
crt_pos -= CRT_COLS;
}
int x = 1, y = 3, z = 4;
cprintf("x %d, y %x, z %d\n", x, y, z);
这部分不知道该怎么调用。。。
运行如下代码,并解释输出
unsigned int i = 0x00646c72;
cprintf("H%x Wo%s", 57616, &i);
同样是不知道该怎么运行。。。但是可以直接推算内容,%x
是hexodecimal,57616对应为:0xe110
,然后后面的string,会按照存储顺序,注意是little endian,也就是0x72, 0x6c, 0x64输出,也就是r, l, d。所以输出就是He110 World
。注意这里用&i
是因为%s
需要输入是指针。
解释如下代码输出:
cprintf("x=%d y=%d", 3);
会输出一个随机数,因为访问到了var_list
里面没有使用的空间。
变长参数是来自GCC的stdarg.h
, 因为JOS中的stdarg
看不懂...,所以这里用了Minix里的古老的代码,对这份代码再进行简化,就只有如下几行
typedef char *va_list;
/* Amount of space required in an argument list for an arg of type TYPE.
* TYPE may alternatively be an expression whose type is used.
*/
// 这里进行了上取整,得到整数倍int size, 应该是因为gcc会进行align吧
#define __va_rounded_size(TYPE) \
(((sizeof (TYPE) + sizeof (int) - 1) / sizeof (int)) * sizeof (int))
#define va_start(AP, LASTARG) \ // LASTARG的地址加上LASTARG的大小就是var_list的地址
(AP = ((char *) &(LASTARG) + __va_rounded_size (LASTARG)))
void va_end (va_list); /* Defined in gnulib */
#define va_arg(AP, TYPE) \ // 每次往后取一个
(AP += __va_rounded_size (TYPE), \
*((TYPE *) (AP - __va_rounded_size (TYPE))))
所以如果改成反方向的话,应该就要从最后一个参数往前加就好了,只需要更改va_start
与va_arg
里面的加号变减号,减号变加号就好。
带颜色的print。我没什么兴趣,应该是要修改printfmt
里面的%c
。
这部分会详尽讨论C语言的栈。
kernel的栈是在哪里初始化的?以及栈的地址是什么?kernel是如何给栈保留空间的?
kernel的初始化用的是如下的命令:
# Set the stack pointer
movl $(bootstacktop),%esp
其把%esp
初始化为:
esp 0xf0110000 0xf0110000 <entry_pgtable>
而如何保留空间的在于kern/entry.S
里的:
.data
###################################################################
# boot stack
###################################################################
.p2align PGSHIFT # force page alignment
.globl bootstack
bootstack:
.space KSTKSIZE
.globl bootstacktop
bootstacktop:
这里.data
是会被自动初始化的数据,栈的大小就是KSTKSIZE
了。stack指针指向bootstacktop
也就是.space
后面,或者说是地址更高的地方。
在test_backtrace
函数处加断点,每次被调用的时候都发生了什么?test_backtrace
一次会push多少内容到栈中?他们都是什么?
这个函数的代码位于kern/init.c
:
void
test_backtrace(int x) {
cprintf("entering test_backtrace %d\n", x);
if (x > 0)
test_backtrace(x-1);
else
mon_backtrace(0, 0, 0);
cprintf("leaving test_backtrace %d\n", x);
}
void
i386_init(void) {
...
// Test the stack backtrace function (lab 1 only)
test_backtrace(5);
...
}
所以不加断点也可以知道,每次输出了一个字符串。
写一个mon_traceback
函数,
注意在C里面,指针+1实际上是+4。
主要利用的就是%ebp
的上一个就是%eip
,再往上就是参数,所以代码的结构是:
int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
cprintf("Stack backtrace:\n");
uint32_t ebp = read_ebp();
uint32_t eip;
uint32_t *args;
struct Eipdebuginfo info;
while(ebp) {
cprintf(" ebp %08x", ebp);
eip = *((uint32_t *)ebp + 1);
cprintf(" eip %08x", eip);
args = (uint32_t *)ebp + 2;
cprintf(" args");
for(int i=0; i<5; i++) {
cprintf(" %08x", args[i]);
}
cprintf("\n");
ebp = *((uint32_t *)ebp);
int flag = debuginfo_eip(eip, &info);
if (flag) {
cprintf("Error getting eip info!");
}
else {
cprintf(" %s:%d: ", info.eip_file, info.eip_line);
cprintf("%.*s", info.eip_fn_namelen, info.eip_fn_name);
cprintf("+%d\n", eip - info.eip_fn_addr);
}
}
return 0;
}
后面关于info
的部分利用了kern/kdebug.c
中的debuginfo_eip
函数,来提取文件名等内容,在这之中,需要加一下如何搜索到ep_line
:
// Search within [lline, rline] for the line number stab.
// If found, set info->eip_line to the right line number.
// If not found, return -1.
//
// Hint:
// There's a particular stabs type used for line numbers.
// Look at the STABS documentation and <inc/stab.h> to find
// which one.
// Your code here.
stab_binsearch(stabs, &lline, &rline, N_SLINE, addr);
if (lline <= rline) {
info->eip_line = stabs[lline].n_desc;
} else {
info->eip_line = -1;
}
注意,N_SLINE
和选用n_desc
都是参考的inc/stab.h
。
运行结果如下:
$ make qemu-nox
...
Stack backtrace:
ebp f010ff18 eip f0100078 args 00000000 00000000 00000000 f010004a f0111308
kern/init.c:18: test_backtrace+56
ebp f010ff38 eip f01000a1 args 00000000 00000001 f010ff78 f010004a f0111308
kern/init.c:16: test_backtrace+97
ebp f010ff58 eip f01000a1 args 00000001 00000002 f010ff98 f010004a f0111308
kern/init.c:16: test_backtrace+97
ebp f010ff78 eip f01000a1 args 00000002 00000003 f010ffb8 f010004a f0111308
kern/init.c:16: test_backtrace+97
ebp f010ff98 eip f01000a1 args 00000003 00000004 00000000 f010004a f0111308
kern/init.c:16: test_backtrace+97
ebp f010ffb8 eip f01000a1 args 00000004 00000005 00000000 f010004a f0111308
kern/init.c:16: test_backtrace+97
ebp f010ffd8 eip f01000f4 args 00000005 00001aac 00000640 00000000 00000000
kern/init.c:39: i386_init+78
ebp f010fff8 eip f010003e args 00000003 00001003 00002003 00003003 00004003
kern/entry.S:83: <unknown>+0
...
把指令加入monitor.c
使之能够作为命令运行:
// monitor.c
static struct Command commands[] = {
{ "help", "Display this list of commands", mon_help },
{ "kerninfo", "Display information about the kernel", mon_kerninfo },
{ "backtrace", "Display function stack", mon_backtrace },
};
之后的运行结果就是:
K> backtrace
Stack backtrace:
ebp f010ff58 eip f0100b17 args 00000001 f010ff80 00000000 f0100b7b f0100b2a
kern/monitor.c:148: monitor+332
ebp f010ffd8 eip f0100101 args 00000000 00001aac 00000640 00000000 00000000
kern/init.c:43: i386_init+91
ebp f010fff8 eip f010003e args 00000003 00001003 00002003 00003003 00004003
kern/entry.S:83: <unknown>+0
这样就全部完成了lab1了,判一下成绩:
running JOS: (1.3s)
printf: OK
backtrace count: OK
backtrace arguments: OK
backtrace symbols: OK
backtrace lines: OK
Score: 50/50