date: 2019-03-05
tags: OS 6.828
注意,在运行lab3之前,需要修改kern/kernel.ld
文件中的bss
部分为:
.bss : {
PROVIDE(edata = .);
*(.dynbss)
*(.bss .bss.*)
*(COMMON)
PROVIDE(end = .);
}
非常感谢解决了这个问题的同学,解决的原文在这里。
首先我们需要看一下新的inc/env.h
文件,其中包含了user environment的基本定义:
typedef int32_t envid_t;
// An environment ID 'envid_t' has three parts:
//
// +1+---------------21-----------------+--------10--------+
// |0| Uniqueifier | Environment |
// | | | Index |
// +------------------------------------+------------------+
// \--- ENVX(eid) --/
//
// The environment index ENVX(eid) equals the environment's index in the
// 'envs[]' array. The uniqueifier distinguishes environments that were
// created at different times, but share the same environment index.
//
// All real environments are greater than 0 (so the sign bit is zero).
// envid_ts less than 0 signify errors. The envid_t == 0 is special, and
// stands for the current environment.
#define LOG2NENV 10
#define NENV (1 << LOG2NENV)
#define ENVX(envid) ((envid) & (NENV - 1))
// Values of env_status in struct Env
enum {
ENV_FREE = 0,
ENV_DYING,
ENV_RUNNABLE,
ENV_RUNNING,
ENV_NOT_RUNNABLE
};
// Special environment types
enum EnvType {
ENV_TYPE_USER = 0,
};
struct Env {
struct Trapframe env_tf; // Saved registers
struct Env *env_link; // Next free Env
envid_t env_id; // Unique environment identifier
envid_t env_parent_id; // env_id of this env's parent
enum EnvType env_type; // Indicates special system environments
unsigned env_status; // Status of the environment
uint32_t env_runs; // Number of times environment has run
// Address space
pde_t *env_pgdir; // Kernel virtual address of page dir
};
虽然这个lab只会去创建一个user environment,但是为了之后的lab,需要能够支持多个environment。
kern/env.c
的前几行定义了kernel中的3个和环境相关的重要全局变量:
struct Env *envs = NULL; // All environments
struct Env *curenv = NULL; // The current env
static struct Env *env_free_list; // Free environment list
// (linked by Env->env_link)
当JOS启动的时候,envs
会指向一个struct Env
的数组表示系统中所有的环境。在JOS的设计中,最多有NENV
(1024)个环境(一般远远达不到这个值)。这个数组中会存在一个能够保存这NENV
个环境的数据结构。
就像page_free_list
一样,JOS有一个env_free_list
用来表示inactive Env,用来进行allocation, deallocation。curenv
是当前正在执行的环境的指针,初始化为NULL
。
回到inc/env.h
,我们来看一下Env
struct Env {
struct Trapframe env_tf; // Saved registers
struct Env *env_link; // Next free Env
envid_t env_id; // Unique environment identifier
envid_t env_parent_id; // env_id of this env's parent
enum EnvType env_type; // Indicates special system environments
unsigned env_status; // Status of the environment
uint32_t env_runs; // Number of times environment has run
// Address space
pde_t *env_pgdir; // Kernel virtual address of page dir
};
对于这些field的更详细的解释是:
env_tf
: 当该环境不运行的时候保存寄存器,比如从user mode到kernel mode的转换过程。env_link
: 指向env_free_list
里的下一个Env
。env_id
: 保存一个uniquely identifier。注意如果一个环境被释放了,之后又有环境用了这个struct Env
,他们的env_id
会不同。env_parent_id
: 保存创建了这个环境的环境的env_id
。从而可以建立一个树,从而方便一些security decision,也就是决定某个环境是否有某个权限。env_type
: 用来区分特殊环境的,普通的都是ENV_TYPE_USER
。env_status
: 状态,具体取值如下:
// Values of env_status in struct Env
enum {
ENV_FREE = 0, // inactive, Env在env_free上
ENV_DYING, // zombie, environment, 下次trap到kernel的时候会被释放
ENV_RUNNABLE, // waiting to run
ENV_RUNNING, // currently running
ENV_NOT_RUNNABLE // currently active, but not ready to run,如等待IPC
};
env_pgdir
: 该环境的page directory同Unix process一样,JOS环境结合了thread与address space。thread用保存的寄存器确定(env_tf
),address space用env_pgdir
确定。kernel必须要设置好这两者以成功运行某个环境。
修改mem_init
以分配envs
的地址。并把envs映射到kernel page directory的对应位置。
//////////////////////////////////////////////////////////////////////
// Make 'envs' point to an array of size 'NENV' of 'struct Env'.
// LAB 3: Your code here.
envs = (struct Env *)boot_alloc(NENV*sizeof(struct Env));
...
//////////////////////////////////////////////////////////////////////
// Map the 'envs' array read-only by the user at linear address UENVS
// (ie. perm = PTE_U | PTE_P).
// Permissions:
// - the new image at UENVS -- kernel R, user R
// - envs itself -- kernel RW, user NONE
// LAB 3: Your code here.
boot_map_region(kern_pgdir, UENVS, PTSIZE, PADDR(envs), PTE_U | PTE_P);
注意后面的这个映射位置以及大小是参照的JOS的虚拟内存分布。写完之后运行kernel应该会出现好几个succeeded:
check_page_free_list() succeeded!
check_page_alloc() succeeded!
check_page() succeeded!
check_kern_pgdir() succeeded!
check_page_free_list() succeeded!
check_page_installed_pgdir() succeeded!
因为现在还没有file system,所以要运行一个用户环境需要让kernel去加载一个静态的二进制image。这些影响都在obj/user/
中,这些在kern/Makefrag
中也有体现:
# Binary program images to embed within the kernel.
# Binary files for LAB3
KERN_BINFILES := user/hello \
user/buggyhello \
user/buggyhello2 \
user/evilhello \
user/testbss \
user/divzero \
user/breakpoint \
user/softint \
user/badsegment \
user/faultread \
user/faultreadkernel \
user/faultwrite \
user/faultwritekernel
...
# How to build the kernel itself
$(OBJDIR)/kern/kernel: $(KERN_OBJFILES) $(KERN_BINFILES) kern/kernel.ld \
$(OBJDIR)/.vars.KERN_LDFLAGS
@echo + ld $@
$(V)$(LD) -o $@ $(KERN_LDFLAGS) $(KERN_OBJFILES) $(GCC_LIB) -b binary $(KERN_BINFILES)
$(V)$(OBJDUMP) -S $@ > $@.asm
$(V)$(NM) -n $@ > $@.sym
这里的-b binary
表示把文件当成raw unterpreted binary而不是编译器生成的.o
文件。如果查看obj/kern/kernel.sym
,可以看到一系列神奇的symbol
00008acc A _binary_obj_user_hello_size
00008ad0 A _binary_obj_user_badsegment_size
00008ad0 A _binary_obj_user_breakpoint_size
00008ad0 A _binary_obj_user_buggyhello_size
00008ad0 A _binary_obj_user_evilhello_size
00008ad0 A _binary_obj_user_faultread_size
00008ad0 A _binary_obj_user_faultwrite_size
00008ad0 A _binary_obj_user_softint_size
00008ad8 A _binary_obj_user_faultreadkernel_size
00008ad8 A _binary_obj_user_faultwritekernel_size
00008ae4 A _binary_obj_user_divzero_size
00008ae8 A _binary_obj_user_testbss_size
00008aec A _binary_obj_user_buggyhello2_size
linker生成了这些symbol以让kernel可以调用这些二进制文件。
在kern/init.c
中,i386_init()
函数会调用这些二进制文件中的一个(默认是user_hello
)。但是这个函数里面和环境相关的部分都还没有完成,下面就是要填充上这些函数了。
完成kern/env.c
中的如下函数:
env_init()
初始化envs
与env_free_list
// Mark all environments in 'envs' as free, set their env_ids to 0,
// and insert them into the env_free_list.
// Make sure the environments are in the free list in the same order
// they are in the envs array (i.e., so that the first call to
// env_alloc() returns envs[0]).
//
void
env_init(void)
{
// Set up envs array
// LAB 3: Your code here.
memset(envs, 0, sizeof(envs));
env_free_list = NULL;
for(int i=NENV-1; i>=0; i--) {
envs[i].env_link = env_free_list;
env_free_list = envs + i;
}
// Per-CPU part of the initialization
env_init_percpu();
}
注意这里注释要求env_free_list
顺序和envs
的一样所以这里和page_init
的顺序相反。(不明白为啥...)
env_setup_vm()
把初始化env_pgdir
,并把其中的kernel部分的内存分配好。
// Initialize the kernel virtual memory layout for environment e.
// Allocate a page directory, set e->env_pgdir accordingly,
// and initialize the kernel portion of the new environment's address space.
// Do NOT (yet) map anything into the user portion
// of the environment's virtual address space.
//
// Returns 0 on success, < 0 on error. Errors include:
// -E_NO_MEM if page directory or table could not be allocated.
//
static int
env_setup_vm(struct Env *e)
{
int i;
struct PageInfo *p = NULL;
// Allocate a page for the page directory
if (!(p = page_alloc(ALLOC_ZERO)))
return -E_NO_MEM;
// Now, set e->env_pgdir and initialize the page directory.
//
// Hint:
// - The VA space of all envs is identical above UTOP
// (except at UVPT, which we've set below).
// See inc/memlayout.h for permissions and layout.
// Can you use kern_pgdir as a template? Hint: Yes.
// (Make sure you got the permissions right in Lab 2.)
// - The initial VA below UTOP is empty.
// - You do not need to make any more calls to page_alloc.
// - Note: In general, pp_ref is not maintained for
// physical pages mapped only above UTOP, but env_pgdir
// is an exception -- you need to increment env_pgdir's
// pp_ref for env_free to work correctly.
// - The functions in kern/pmap.h are handy.
// LAB 3: Your code here.
e->env_pgdir = page2kva(p);
p->pp_ref++;
memcpy(e->env_pgdir, kern_pgdir, PGSIZE);
// UVPT maps the env's own page table read-only.
// Permissions: kernel R, user R
e->env_pgdir[PDX(UVPT)] = PADDR(e->env_pgdir) | PTE_P | PTE_U;
return 0;
}
注意这里用内存复制是因为boot_region_map
是一个静态函数,不能调用。
region_alloc
在当前环境下在虚拟地址va
处分配长为len
的内存。
// Allocate len bytes of physical memory for environment env,
// and map it at virtual address va in the environment's address space.
// Does not zero or otherwise initialize the mapped pages in any way.
// Pages should be writable by user and kernel.
// Panic if any allocation attempt fails.
//
static void
region_alloc(struct Env *e, void *va, size_t len)
{
// LAB 3: Your code here.
// (But only if you need it for load_icode.)
//
// Hint: It is easier to use region_alloc if the caller can pass
// 'va' and 'len' values that are not page-aligned.
// You should round va down, and round (va + len) up.
// (Watch out for corner-cases!)
int r;
void *v = ROUNDDOWN(va, PGSIZE);
void* end = ROUNDUP(va + len, PGSIZE);
struct PageInfo *p = NULL;
for(; v < end; v += PGSIZE) {
if((p = page_alloc(ALLOC_ZERO)) == NULL)
panic("region_alloc: %e", -E_NO_MEM);
if((r = page_insert(e->env_pgdir, p, v, PTE_U | PTE_W | PTE_P)) < 0)
panic("region_alloc: %e", r);
}
}
load_icode()
设置initial program binary, stack与processor flags。如注释所说,主要是模仿boot/main.c
中的函数。注意需要切换pgdir
,因为进入这个函数的时候是kernel mode,但是分配内存要在用户的地址空间分配。
// Set up the initial program binary, stack, and processor flags
// for a user process.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
//
// This function loads all loadable segments from the ELF binary image
// into the environment's user memory, starting at the appropriate
// virtual addresses indicated in the ELF program header.
// At the same time it clears to zero any portions of these segments
// that are marked in the program header as being mapped
// but not actually present in the ELF file - i.e., the program's bss section.
//
// All this is very similar to what our boot loader does, except the boot
// loader also needs to read the code from disk. Take a look at
// boot/main.c to get ideas.
//
// Finally, this function maps one page for the program's initial stack.
//
// load_icode panics if it encounters problems.
// - How might load_icode fail? What might be wrong with the given input?
//
static void
load_icode(struct Env *e, uint8_t *binary)
{
// Hints:
// Load each program segment into virtual memory
// at the address specified in the ELF segment header.
// You should only load segments with ph->p_type == ELF_PROG_LOAD.
// Each segment's virtual address can be found in ph->p_va
// and its size in memory can be found in ph->p_memsz.
// The ph->p_filesz bytes from the ELF binary, starting at
// 'binary + ph->p_offset', should be copied to virtual address
// ph->p_va. Any remaining memory bytes should be cleared to zero.
// (The ELF header should have ph->p_filesz <= ph->p_memsz.)
// Use functions from the previous lab to allocate and map pages.
//
// All page protection bits should be user read/write for now.
// ELF segments are not necessarily page-aligned, but you can
// assume for this function that no two segments will touch
// the same virtual page.
//
// You may find a function like region_alloc useful.
//
// Loading the segments is much simpler if you can move data
// directly into the virtual addresses stored in the ELF binary.
// So which page directory should be in force during
// this function?
//
// You must also do something with the program's entry point,
// to make sure that the environment starts executing there.
// What? (See env_run() and env_pop_tf() below.)
// LAB 3: Your code here.
struct Elf *elfhdr = (struct Elf *)binary;
if (elfhdr->e_magic != ELF_MAGIC)
panic("load_icode: not valid elf file");
struct Proghdr *ph, *eph;
ph = (struct Proghdr *) (binary + elfhdr->e_phoff);
eph = ph + elfhdr->e_phnum;
lcr3(PADDR(e->env_pgdir));
for (; ph < eph; ph++) {
if(ph->p_type == ELF_PROG_LOAD) {
region_alloc(e, (void *)ph->p_va, ph->p_memsz);
memcpy((void *)(ph->p_va), (void *)(binary + ph->p_offset), ph->p_filesz);
}
}
e->env_tf.tf_eip = elfhdr->e_entry;
lcr3(PADDR(kern_pgdir));
// Now map one page for the program's initial stack
// at virtual address USTACKTOP - PGSIZE.
// LAB 3: Your code here.
region_alloc(e, (void *)(USTACKTOP - PGSIZE), PGSIZE);
}
env_create
就是结合上面的两个函数,先env_alloc
再load_icode
// Allocates a new env with env_alloc, loads the named elf
// binary into it with load_icode, and sets its env_type.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
// The new env's parent ID is set to 0.
//
void
env_create(uint8_t *binary, enum EnvType type)
{
// LAB 3: Your code here.
int r;
struct Env *e = NULL;
if((r = env_alloc(&e, 0)) < 0)
panic("env_create: %e", r);
load_icode(e, binary);
e->env_type = type;
}
env_run()
按照注释的要求一步一步写就好了。注意别忘了最开始curenv
可能是NULL
。
// Context switch from curenv to env e.
// Note: if this is the first call to env_run, curenv is NULL.
//
// This function does not return.
//
void
env_run(struct Env *e)
{
// Step 1: If this is a context switch (a new environment is running):
// 1. Set the current environment (if any) back to
// ENV_RUNNABLE if it is ENV_RUNNING (think about
// what other states it can be in),
// 2. Set 'curenv' to the new environment,
// 3. Set its status to ENV_RUNNING,
// 4. Update its 'env_runs' counter,
// 5. Use lcr3() to switch to its address space.
// Step 2: Use env_pop_tf() to restore the environment's
// registers and drop into user mode in the
// environment.
// Hint: This function loads the new environment's state from
// e->env_tf. Go back through the code you wrote above
// and make sure you have set the relevant parts of
// e->env_tf to sensible values.
// LAB 3: Your code here.
if(curenv && curenv->env_status == ENV_RUNNING)
curenv->env_status = ENV_RUNNABLE;
curenv = e;
curenv->env_status = ENV_RUNNING;
curenv->env_runs++;
lcr3(PADDR(curenv->env_pgdir));
env_pop_tf(&(curenv->env_tf));
// panic("env_run not yet implemented");
}
完成了这几步之后运行当启动kernel的时候会进行如下操作:
kern/entry.S
):kernel的entry,也就是boot loader加载kernel的entryi386_init
(kern/init.c
):上面的entry调用了这个函数,对kernel进行初始化
cons_init
:初始化consolemem_init
:初始化kernel address spaceenv_init
:初始化所有的环境trap_init
(still incomplete at this point):初始化中断env_create
:创建一个用户环境env_run
:运行用户环境env_pop_tf
:从trapframe中还原这个用户环境所需要的寄存器状态。完成了exercise 2之后因为并没有初始化中断,所以会在user_hello
第一次进行system call的时候报triple fault的错。这是因为:When the CPU discovers that it is not set up to handle this system call interrupt, it will generate a general protection exception, find that it can't handle that, generate a double fault exception, find that it can't handle that either, and finally give up with what's known as a "triple fault".
我们可以使用gdb来检测是否进入了用户环境,在env_pop_tf
中加断点之后逐步运行可以发现其会运行至地址为0x800020
(可能会有出入)的指令,也就是进入了user mode。然后在int $0x30
处加断点,之后再运行1步就会进入triple fault了。
我们来完成中断部分。
读书,在这里就不记录了。
Exeception和interrupt都是protected control transfer,其在用户代码不能干涉kernel的情况下,让处理器进入kernel mode。在intel的术语中,interrupt是由处理器外部的异步事件,如IO引起的,而exception是同步运行的代码引起的,如除0或者page fault。
之前提到过,为了能够做到protected,处理器的中断机制让用户只能进入几个固定的kernel位置。在x86
中,由2种机制可以确保这种protection。
The Interrupt Descriptor Table (IDT)
一个在kernel private memory中的表,记录了0~255这256种不同的中断的EIP
和CS
,前者是中断进入的kernel code的位置,后者是中断的privilege level(在JOS中都是0,也就是kernel mode)。
The Task State Segment (TSS)
用于存放中断前的old processor state,用于中断之后还原状态。注意这部分也是存储在kernel stack中的。
尽管TSS很大,可以有很多功用,JOS仅仅记录中断转移到的kernel stack。处理器用ESP0
和SS0
来定义kernel mode,且JOS中不使用TSS的其他field。
大于31的中断为software interrupt或hardware interrupt,前者是可以用int
指令进入,后者是外部硬件发出的。
在这节里,我们会拓展JOS使其可以处理它自己产生的0~31中断。之后一节我们会处理48(0x30
),也就是system call,注意这个48是随机选的。lab4里面我们会处理硬件中断。
例如,代码中出现了除0,那么:
ESP0
和SS0
来切换到kernel stack。在JOS中,这两个值分别是GD_KD
与KSTACKTOP
。处理器会把exception parameter推进kernel stack,其地址始于KSTACKTOP
。
+--------------------+ KSTACKTOP
| 0x00000 | old SS | " - 4
| old ESP | " - 8
| old EFLAGS | " - 12
| 0x00000 | old CS | " - 16
| old EIP | " - 20 <---- ESP
+--------------------+
x86
中对应的是vector 0,处理器会读取IDT中entry 0,并设置对应的CS:IP
。对于一些特殊的exception,除了会推入上述的5个words,处理器还会退入error code。可以阅读80386的manual来查看不同的error code意味着什么。
+--------------------+ KSTACKTOP
| 0x00000 | old SS | " - 4
| old ESP | " - 8
| old EFLAGS | " - 12
| 0x00000 | old CS | " - 16
| old EIP | " - 20
| error code | " - 24 <---- ESP
+--------------------+
中断即可以在user mode中产生,也可以从kernel mode中产生。但是x86
处理器只会在从user到kernel的过程中自动保存old register state。如果发生中断时已经在kernel里了,CPU只会继续向同样的kernel stack中推入值,从而使kernel可以处理嵌套的中断。
具体来说,因为不需要换栈,所以就不需要保存SS
与ESP
,所以handler眼中的第二个中断对应的stack就会是这样:
+--------------------+ <---- old ESP
| old EFLAGS | " - 4
| 0x00000 | old CS | " - 8
| old EIP | " - 12
+--------------------+
我们来设置0~31的IDT。我们需要用到的一些定义在inc/trap.h
与kern/trap.h
中。
注意,0~31中的有一些中断已经被intel保留了,所以处理器永远都不会产生这些中断,怎么处理都行。
整个的控制方式应该如下:
IDT trapentry.S trap.c
+----------------+
| &handler1 |---------> handler1: trap (struct Trapframe *tf)
| | // do stuff {
| | call trap // handle the exception/interrupt
| | // ... }
+----------------+
| &handler2 |--------> handler2:
| | // do stuff
| | call trap
| | // ...
+----------------+
.
.
.
+----------------+
| &handlerX |--------> handlerX:
| | // do stuff
| | call trap
| | // ...
+----------------+
每个中断都应该在trapentry.S
和trap_init()
中有其对应的地址。
这部分我主要是通过和xv6的对应部分对照着写的。
首先写trapentry.S
,这个文件分为两部分,第一是写handler:
/*
* Lab 3: Your code here for generating entry points for the different traps.
*/
TRAPHANDLER_NOEC(T_DIVIDE_handler, T_DIVIDE)
TRAPHANDLER_NOEC(T_DEBUG_handler, T_DEBUG)
TRAPHANDLER_NOEC(T_NMI_handler, T_NMI)
TRAPHANDLER_NOEC(T_BRKPT_handler, T_BRKPT)
TRAPHANDLER_NOEC(T_OFLOW_handler, T_OFLOW)
TRAPHANDLER_NOEC(T_BOUND_handler, T_BOUND)
TRAPHANDLER_NOEC(T_ILLOP_handler, T_ILLOP)
TRAPHANDLER_NOEC(T_DEVICE_handler, T_DEVICE)
TRAPHANDLER(T_DBLFLT_handler, T_DBLFLT)
TRAPHANDLER(T_TSS_handler, T_TSS)
TRAPHANDLER(T_SEGNP_handler, T_SEGNP)
TRAPHANDLER(T_STACK_handler, T_STACK)
TRAPHANDLER(T_GPFLT_handler, T_GPFLT)
TRAPHANDLER(T_PGFLT_handler, T_PGFLT)
TRAPHANDLER_NOEC(T_FPERR_handler, T_FPERR)
TRAPHANDLER(T_ALIGN_handler, T_ALIGN)
TRAPHANDLER_NOEC(T_MCHK_handler, T_MCHK)
TRAPHANDLER_NOEC(T_SIMDERR_handler, T_SIMDERR)
TRAPHANDLER_NOEC(T_SYSCALL_handler, T_SYSCALL)
具体是使用TRAPHANDLER
还是TRAPHANDLER_NOEC
可以对照xv6的vector.S
文件。
然后是写_alltrap
:
/*
* Lab 3: Your code here for _alltraps
*/
# vectors.S sends all traps here.
_alltraps:
# Build trap frame.
pushl %ds
pushl %es
pushal
# Set up data segments.
movw $GD_KD, %ax
movw %ax, %ds
movw %ax, %es
# Call trap(tf), where tf=%esp
pushl %esp
call trap
addl $4, %esp
popal
popl %es
popl %ds
addl $0x8, %esp # trapno and errcode
iret
注意要对照着inc/trap.h
中的Trapframe
的定义来写,同时要参照xv6中的trapasm.S
和x86.h
(有trapframe的定义)来写。最后是trap_init()
。因为在trapentry.S
中只有函数名是全局变量,所以只能重复性的写很多...
void
trap_init(void)
{
extern struct Segdesc gdt[];
// LAB 3: Your code here.
void T_DIVIDE_handler();
void T_DEBUG_handler();
void T_NMI_handler();
void T_BRKPT_handler();
void T_OFLOW_handler();
void T_BOUND_handler();
void T_ILLOP_handler();
void T_DEVICE_handler();
void T_DBLFLT_handler();
void T_TSS_handler();
void T_SEGNP_handler();
void T_STACK_handler();
void T_GPFLT_handler();
void T_PGFLT_handler();
void T_FPERR_handler();
void T_ALIGN_handler();
void T_MCHK_handler();
void T_SIMDERR_handler();
void T_SYSCALL_handler();
SETGATE(idt[T_DIVIDE], 1, GD_KT, T_DIVIDE_handler, 0);
SETGATE(idt[T_DEBUG], 1, GD_KT, T_DEBUG_handler, 0);
SETGATE(idt[T_NMI], 1, GD_KT, T_NMI_handler, 0);
SETGATE(idt[T_BRKPT], 1, GD_KT, T_BRKPT_handler, 0);
SETGATE(idt[T_OFLOW], 1, GD_KT, T_OFLOW_handler, 0);
SETGATE(idt[T_BOUND], 1, GD_KT, T_BOUND_handler, 0);
SETGATE(idt[T_ILLOP], 1, GD_KT, T_ILLOP_handler, 0);
SETGATE(idt[T_DEVICE], 1, GD_KT, T_DEVICE_handler, 0);
SETGATE(idt[T_DBLFLT], 1, GD_KT, T_DBLFLT_handler, 0);
SETGATE(idt[T_TSS], 1, GD_KT, T_TSS_handler, 0);
SETGATE(idt[T_SEGNP], 1, GD_KT, T_SEGNP_handler, 0);
SETGATE(idt[T_STACK], 1, GD_KT, T_STACK_handler, 0);
SETGATE(idt[T_GPFLT], 1, GD_KT, T_GPFLT_handler, 0);
SETGATE(idt[T_PGFLT], 1, GD_KT, T_PGFLT_handler, 0);
SETGATE(idt[T_FPERR], 1, GD_KT, T_FPERR_handler, 0);
SETGATE(idt[T_ALIGN], 1, GD_KT, T_ALIGN_handler, 0);
SETGATE(idt[T_MCHK], 1, GD_KT, T_MCHK_handler, 0);
SETGATE(idt[T_SIMDERR], 1, GD_KT, T_SIMDERR_handler, 0);
SETGATE(idt[T_SYSCALL], 0, GD_KT, T_SYSCALL_handler, 3);
// Per-CPU setup
trap_init_percpu();
}
然后运行make grade
,就通过了Part A。注意这里的代码虽然可以通过lab3,但是到lab4会出问题...因为其istrap
参数的问题,详情请见lab4。
回答两个问题:
SETGATE
中的trapit
了,也就是不能区分exception和interruption了,同时也不能给不同的中断设置不同的中断等级了。user/softint
中的int $14
会进入vector 13?14的privilege level是0,也就是user不能调用,在上面的代码中不是$30
都会被识别为general protection fault,也就是中断13。处理其他的中断。
处理page fault,也就是14。当发生page fault的时候,处理器会把产生错误的地址存在CR2
寄存器中。
在trap_dispatch()
里面加入page_fault_handler()
static void
trap_dispatch(struct Trapframe *tf)
{
// Handle processor exceptions.
// LAB 3: Your code here.
switch(tf->tf_trapno) {
case T_PGFLT:
page_fault_handler(tf);
return;
default:
break;
}
// Unexpected trap: The user process or the kernel has a bug.
print_trapframe(tf);
if (tf->tf_cs == GD_KT)
panic("unhandled trap in kernel");
else {
env_destroy(curenv);
return;
}
}
对于断点中断,需要调用的是kern/monitor.c
中的monitor
函数,不过注意,因为breakpoint.c
中是通过直接触法来进行测试的,所以需要把断点的等级调为3
SETGATE(idt[T_BRKPT], 1, GD_KT, T_BRKPT_handler, 3);
然后trap_dispatch
为:
switch(tf->tf_trapno) {
case T_PGFLT:
page_fault_handler(tf);
return;
case T_BRKPT:
monitor(tf);
return;
default:
break;
}
在JOS中,我们使用int $0x30
来进行system call。应用会自己把system call需要的参数以及其编号川籍来,所以kernel就不需要去操作用户环境或者instruction stream了。system call number会在%eax
, 参数(前5个)会在 %edx
, %ecx
, %ebx
, %edi
, 和 %esi
。同样,kernel会把返回值存在%eax
中。syscall
函数在lb/syscall.c
中。
static inline int32_t
syscall(int num, int check, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
int32_t ret;
// Generic system call: pass system call number in AX,
// up to five parameters in DX, CX, BX, DI, SI.
// Interrupt kernel with T_SYSCALL.
//
// The "volatile" tells the assembler not to optimize
// this instruction away just because we don't use the
// return value.
//
// The last clause tells the assembler that this can
// potentially change the condition codes and arbitrary
// memory locations.
asm volatile("int %1\n"
: "=a" (ret)
: "i" (T_SYSCALL),
"a" (num),
"d" (a1),
"c" (a2),
"b" (a3),
"D" (a4),
"S" (a5)
: "cc", "memory");
if(check && ret > 0)
panic("syscall %d returned %d (> 0)", num, ret);
return ret;
}
上面的这种写法叫gcc内联汇编,感兴趣的同学可以取查一下。
注意这里的和xv6的对比,明显JOS比xv6要简单很多,并没有通过用户的stack(esp
)来掏出来参数,而且JOS也没有myproc
这样一个全局状态。
加入system call的handler。由于我们已经加过了基本设置,所以只需要修改trap_dispatch()
与kern/syscall.c
中的syscall()
了。
首先是trap_dispatch()
:
switch(tf->tf_trapno) {
case T_PGFLT:
page_fault_handler(tf);
return;
case T_BRKPT:
monitor(tf);
return;
case T_SYSCALL:
tf->tf_regs.reg_eax = syscall(
tf->tf_regs.reg_eax, tf->tf_regs.reg_edx,
tf->tf_regs.reg_ecx, tf->tf_regs.reg_ebx,
tf->tf_regs.reg_edi, tf->tf_regs.reg_esi
);
return;
default:
break;
}
注意别忘了用返回值更新eax
。
其次是syscall()
:
// Dispatches to the correct kernel function, passing the arguments.
int32_t
syscall(uint32_t syscallno, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
// Call the function corresponding to the 'syscallno' parameter.
// Return any appropriate return value.
// LAB 3: Your code here.
// panic("syscall not implemented");
switch (syscallno) {
case SYS_cputs:
sys_cputs((char *)a1, (size_t)a2);
return;
case SYS_cgetc:
return sys_cgetc();
case SYS_getenvid:
return sys_getenvid();
case SYS_env_destroy:
return sys_env_destroy((envid_t)a1);
default:
return -E_INVAL;
}
}
用户应用会从lib/entry
进入,然后调用lib/libmain.c
中的libmain()
,之后libmain
会调用umain
也就是进入了比如hello
这样的函数中。我们希望能够在用户应用中使用thisenv
也就是当前的环境状态。由于我们已经有了sys_getenvid()
这样的函数,这个函数在lib/syscall.c
中被声明,用来掉system call中的SYS_getenvid
。有了envid
之后,因为从inc/env.h
中得知:
// An environment ID 'envid_t' has three parts:
//
// +1+---------------21-----------------+--------10--------+
// |0| Uniqueifier | Environment |
// | | | Index |
// +------------------------------------+------------------+
// \--- ENVX(eid) --/
//
// The environment index ENVX(eid) equals the environment's index in the
// 'envs[]' array. The uniqueifier distinguishes environments that were
// created at different times, but share the same environment index.
//
// All real environments are greater than 0 (so the sign bit is zero).
// envid_ts less than 0 signify errors. The envid_t == 0 is special, and
// stands for the current environment.
#define LOG2NENV 10
#define NENV (1 << LOG2NENV)
#define ENVX(envid) ((envid) & (NENV - 1))
我们只需要取后10位就可以得到当前环境在envs
中的序号了,所以有:
thisenv = &envs[ENVX(sys_getenvid())];
内存保护是操作系统非常重要的一部分,也是保证bug不能破坏其他程序或者kernel的一个重要手段。
操作系统通常通过硬件来实现内存保护。OS让硬件知道哪些虚拟地址是可以访问的,哪些不行。当一个程序试图访问非法地址的之后,处理器会trap。如果问题可以结局,那么kernel就会解决这个问题并让程序继续运行,如果不行,那么程序就不会继续运行。
一个常见的解决方法是自动扩充stack。一般默认就分配一个page作为用户的stack,如果触发了page fault,就自动再进行分配。
system call会导致一个很有趣的问题。很多system call允许用户传指针进kernel,这些指针会指向读写的buffer。这种做法有两个问题:
基于这两个原因,我们需要很谨慎的处理传进kernel的指针。
我们讲用一个机制来解决这两个问题。当程序向kernel传递指针的时候,kernel会检查该指针是不是在用户地址内,以及对应的page table允许内存操作。这样,kernel就不会因为dereference用户指针导致page fault了。
首先给trap
中加上page fault在kernel mode,直接panic
:
switch(tf->tf_trapno) {
case T_PGFLT:
if ((tf->tf_cs & 0x3) == 0)
panic("page fault in kernel");
page_fault_handler(tf);
return;
之后补全kern/pmap.c
中的user_mem_check
:
// Check that an environment is allowed to access the range of memory
// [va, va+len) with permissions 'perm | PTE_P'.
// Normally 'perm' will contain PTE_U at least, but this is not required.
// 'va' and 'len' need not be page-aligned; you must test every page that
// contains any of that range. You will test either 'len/PGSIZE',
// 'len/PGSIZE + 1', or 'len/PGSIZE + 2' pages.
//
// A user program can access a virtual address if (1) the address is below
// ULIM, and (2) the page table gives it permission. These are exactly
// the tests you should implement here.
//
// If there is an error, set the 'user_mem_check_addr' variable to the first
// erroneous virtual address.
//
// Returns 0 if the user program can access this range of addresses,
// and -E_FAULT otherwise.
//
int
user_mem_check(struct Env *env, const void *va, size_t len, int perm)
{
// LAB 3: Your code here.
uintptr_t v = ROUNDDOWN((uintptr_t)va, PGSIZE);
uintptr_t end = ROUNDUP((uintptr_t)va + len, PGSIZE);
for(;v < end; v += PGSIZE) {
pte_t *pte = pgdir_walk(env->env_pgdir, (void *)v, 0);
if (!pte || (*pte & perm) != perm) {
if(v < (uintptr_t)va)
user_mem_check_addr = (uintptr_t)va;
else
user_mem_check_addr = v;
return -E_FAULT;
}
}
return 0;
}
注意需要返回的是这区间里的第一个地址,所以如果v
比va
小,返回的应该是va
。
然后修改syscall.c
中的sys_cputs
以检查指针。
static void
sys_cputs(const char *s, size_t len)
{
// Check that the user has permission to read memory [s, s+len).
// Destroy the environment if not.
// LAB 3: Your code here.
user_mem_assert(curenv, s, len, PTE_U);
// Print the string supplied by the user.
cprintf("%.*s", len, s);
}
之后,为了在breakpoint
中实现backtrace
功能,在kern/kdebug.c
的debuginfo_eip()
中加入如下代码:
// Find the relevant set of stabs
if (addr >= ULIM) {
stabs = __STAB_BEGIN__;
stab_end = __STAB_END__;
stabstr = __STABSTR_BEGIN__;
stabstr_end = __STABSTR_END__;
} else {
// The user-application linker script, user/user.ld,
// puts information about the application's stabs (equivalent
// to __STAB_BEGIN__, __STAB_END__, __STABSTR_BEGIN__, and
// __STABSTR_END__) in a structure located at virtual address
// USTABDATA.
const struct UserStabData *usd = (const struct UserStabData *) USTABDATA;
// Make sure this memory is valid.
// Return -1 if it is not. Hint: Call user_mem_check.
// LAB 3: Your code here.
if(user_mem_check(curenv, (void *)usd, sizeof(struct UserStabData), PTE_U))
return -1;
stabs = usd->stabs;
stab_end = usd->stab_end;
stabstr = usd->stabstr;
stabstr_end = usd->stabstr_end;
// Make sure the STABS and string table memory is valid.
// LAB 3: Your code here.
if(user_mem_check(curenv, (void *)stabs, stab_end - stabs, PTE_U))
return -1;
if(user_mem_check(curenv, (void *)stabstr, stabstr_end - stabstr, PTE_U))
return -1;
}
之后运行make run-breakpoint-nox
进入中断之后,如果运行bracktrack
就会有如下结果:
K> backtrace
Stack backtrace:
ebp efffff00 eip f0100ad7 args 00000001 efffff28 f01d2000 f0106781 f011af48
kern/monitor.c:151: monitor+353
ebp efffff80 eip f010429b args f01d2000 efffffbc f0150508 00000092 f011afd8
kern/trap.c:191: trap+282
ebp efffffb0 eip f0104389 args efffffbc 00000000 00000000 eebfdfc0 efffffdc
kern/trapentry.S:87: <unknown>+0
ebp eebfdfc0 eip 00800087 args 00000000 00000000 eebfdff0 00800058 00000000
lib/libmain.c:25: libmain+78
ebp eebfdff0 eip 00800031 args 00000000 00000000Incoming TRAP frame at 0xeffffe64
kernel panic at kern/trap.c:187: page fault in kernel
这里为什么没有搞懂。。。
当完成exercise 9的时候,exercise 10自动完成了。9和10的唯一区别就是传入的指针,一个是未分配的,另外一个是传入了对应kernel部分的地址,这两者都可以用上面的检查方法搞定。
最后来运行一下make grade
:
divzero: OK (1.0s)
softint: OK (0.9s)
badsegment: OK (1.0s)
Part A score: 30/30
faultread: OK (1.0s)
faultreadkernel: OK (2.0s)
faultwrite: OK (1.1s)
faultwritekernel: OK (1.8s)
breakpoint: OK (1.1s)
testbss: OK (1.9s)
hello: OK (2.1s)
buggyhello: OK (2.0s)
buggyhello2: OK (2.2s)
evilhello: OK (1.8s)
Part B score: 50/50
Score: 80/80