date: 2019-03-05 
tags: OS  6.828  
注意,在运行lab3之前,需要修改kern/kernel.ld文件中的bss部分为:
	.bss : {
		PROVIDE(edata = .);
		*(.dynbss)
		*(.bss .bss.*)
		*(COMMON)
		PROVIDE(end = .);
	}非常感谢解决了这个问题的同学,解决的原文在这里。
首先我们需要看一下新的inc/env.h文件,其中包含了user environment的基本定义:
typedef int32_t envid_t;
// An environment ID 'envid_t' has three parts:
//
// +1+---------------21-----------------+--------10--------+
// |0|          Uniqueifier             |   Environment    |
// | |                                  |      Index       |
// +------------------------------------+------------------+
//                                       \--- ENVX(eid) --/
//
// The environment index ENVX(eid) equals the environment's index in the
// 'envs[]' array.  The uniqueifier distinguishes environments that were
// created at different times, but share the same environment index.
//
// All real environments are greater than 0 (so the sign bit is zero).
// envid_ts less than 0 signify errors.  The envid_t == 0 is special, and
// stands for the current environment.
#define LOG2NENV		10
#define NENV			(1 << LOG2NENV)
#define ENVX(envid)		((envid) & (NENV - 1))
// Values of env_status in struct Env
enum {
	ENV_FREE = 0,
	ENV_DYING,
	ENV_RUNNABLE,
	ENV_RUNNING,
	ENV_NOT_RUNNABLE
};
// Special environment types
enum EnvType {
	ENV_TYPE_USER = 0,
};
struct Env {
	struct Trapframe env_tf;	// Saved registers
	struct Env *env_link;		// Next free Env
	envid_t env_id;			// Unique environment identifier
	envid_t env_parent_id;		// env_id of this env's parent
	enum EnvType env_type;		// Indicates special system environments
	unsigned env_status;		// Status of the environment
	uint32_t env_runs;		// Number of times environment has run
	// Address space
	pde_t *env_pgdir;		// Kernel virtual address of page dir
};虽然这个lab只会去创建一个user environment,但是为了之后的lab,需要能够支持多个environment。
kern/env.c的前几行定义了kernel中的3个和环境相关的重要全局变量:
struct Env *envs = NULL;		// All environments
struct Env *curenv = NULL;		// The current env
static struct Env *env_free_list;	// Free environment list
					// (linked by Env->env_link)当JOS启动的时候,envs会指向一个struct Env的数组表示系统中所有的环境。在JOS的设计中,最多有NENV(1024)个环境(一般远远达不到这个值)。这个数组中会存在一个能够保存这NENV个环境的数据结构。
就像page_free_list一样,JOS有一个env_free_list用来表示inactive Env,用来进行allocation, deallocation。curenv是当前正在执行的环境的指针,初始化为NULL。
回到inc/env.h,我们来看一下Env
struct Env {
	struct Trapframe env_tf;	// Saved registers
	struct Env *env_link;		// Next free Env
	envid_t env_id;			// Unique environment identifier
	envid_t env_parent_id;		// env_id of this env's parent
	enum EnvType env_type;		// Indicates special system environments
	unsigned env_status;		// Status of the environment
	uint32_t env_runs;		// Number of times environment has run
	// Address space
	pde_t *env_pgdir;		// Kernel virtual address of page dir
};对于这些field的更详细的解释是:
env_tf: 当该环境不运行的时候保存寄存器,比如从user mode到kernel mode的转换过程。
env_link: 指向env_free_list里的下一个Env。
env_id: 保存一个uniquely identifier。注意如果一个环境被释放了,之后又有环境用了这个struct Env,他们的env_id会不同。
env_parent_id:  保存创建了这个环境的环境的env_id。从而可以建立一个树,从而方便一些security decision,也就是决定某个环境是否有某个权限。
env_type: 用来区分特殊环境的,普通的都是ENV_TYPE_USER。
env_status: 状态,具体取值如下:
// Values of env_status in struct Env
enum {
	ENV_FREE = 0,  // inactive, Env在env_free上
	ENV_DYING,  // zombie, environment, 下次trap到kernel的时候会被释放
	ENV_RUNNABLE,  // waiting to run
	ENV_RUNNING,  // currently running
	ENV_NOT_RUNNABLE  // currently active, but not ready to run,如等待IPC
};env_pgdir: 该环境的page directory
同Unix process一样,JOS环境结合了thread与address space。thread用保存的寄存器确定(env_tf),address space用env_pgdir确定。kernel必须要设置好这两者以成功运行某个环境。
修改mem_init以分配envs的地址。并把envs映射到kernel page directory的对应位置。
	//////////////////////////////////////////////////////////////////////
	// Make 'envs' point to an array of size 'NENV' of 'struct Env'.
	// LAB 3: Your code here.
	envs = (struct Env *)boot_alloc(NENV*sizeof(struct Env));
...
    //////////////////////////////////////////////////////////////////////
	// Map the 'envs' array read-only by the user at linear address UENVS
	// (ie. perm = PTE_U | PTE_P).
	// Permissions:
	//    - the new image at UENVS  -- kernel R, user R
	//    - envs itself -- kernel RW, user NONE
	// LAB 3: Your code here.
	boot_map_region(kern_pgdir, UENVS, PTSIZE, PADDR(envs), PTE_U | PTE_P);注意后面的这个映射位置以及大小是参照的JOS的虚拟内存分布。写完之后运行kernel应该会出现好几个succeeded:
check_page_free_list() succeeded!
check_page_alloc() succeeded!
check_page() succeeded!
check_kern_pgdir() succeeded!
check_page_free_list() succeeded!
check_page_installed_pgdir() succeeded!因为现在还没有file system,所以要运行一个用户环境需要让kernel去加载一个静态的二进制image。这些影响都在obj/user/中,这些在kern/Makefrag中也有体现:
# Binary program images to embed within the kernel.
# Binary files for LAB3
KERN_BINFILES :=	user/hello \
			user/buggyhello \
			user/buggyhello2 \
			user/evilhello \
			user/testbss \
			user/divzero \
			user/breakpoint \
			user/softint \
			user/badsegment \
			user/faultread \
			user/faultreadkernel \
			user/faultwrite \
			user/faultwritekernel
...
# How to build the kernel itself
$(OBJDIR)/kern/kernel: $(KERN_OBJFILES) $(KERN_BINFILES) kern/kernel.ld \
	  $(OBJDIR)/.vars.KERN_LDFLAGS
	@echo + ld $@
	$(V)$(LD) -o $@ $(KERN_LDFLAGS) $(KERN_OBJFILES) $(GCC_LIB) -b binary $(KERN_BINFILES)
	$(V)$(OBJDUMP) -S $@ > $@.asm
	$(V)$(NM) -n $@ > $@.sym这里的-b binary表示把文件当成raw unterpreted binary而不是编译器生成的.o文件。如果查看obj/kern/kernel.sym,可以看到一系列神奇的symbol
00008acc A _binary_obj_user_hello_size
00008ad0 A _binary_obj_user_badsegment_size
00008ad0 A _binary_obj_user_breakpoint_size
00008ad0 A _binary_obj_user_buggyhello_size
00008ad0 A _binary_obj_user_evilhello_size
00008ad0 A _binary_obj_user_faultread_size
00008ad0 A _binary_obj_user_faultwrite_size
00008ad0 A _binary_obj_user_softint_size
00008ad8 A _binary_obj_user_faultreadkernel_size
00008ad8 A _binary_obj_user_faultwritekernel_size
00008ae4 A _binary_obj_user_divzero_size
00008ae8 A _binary_obj_user_testbss_size
00008aec A _binary_obj_user_buggyhello2_sizelinker生成了这些symbol以让kernel可以调用这些二进制文件。
在kern/init.c中,i386_init()函数会调用这些二进制文件中的一个(默认是user_hello)。但是这个函数里面和环境相关的部分都还没有完成,下面就是要填充上这些函数了。
完成kern/env.c中的如下函数:
env_init()
初始化envs与env_free_list
// Mark all environments in 'envs' as free, set their env_ids to 0,
// and insert them into the env_free_list.
// Make sure the environments are in the free list in the same order
// they are in the envs array (i.e., so that the first call to
// env_alloc() returns envs[0]).
//
void
env_init(void)
{
	// Set up envs array
	// LAB 3: Your code here.
	memset(envs, 0, sizeof(envs));
	env_free_list = NULL;
	for(int i=NENV-1; i>=0; i--) {
		envs[i].env_link = env_free_list;
		env_free_list = envs + i;
	}
	// Per-CPU part of the initialization
	env_init_percpu();
}注意这里注释要求env_free_list顺序和envs的一样所以这里和page_init的顺序相反。(不明白为啥...)
env_setup_vm()
把初始化env_pgdir,并把其中的kernel部分的内存分配好。
// Initialize the kernel virtual memory layout for environment e.
// Allocate a page directory, set e->env_pgdir accordingly,
// and initialize the kernel portion of the new environment's address space.
// Do NOT (yet) map anything into the user portion
// of the environment's virtual address space.
//
// Returns 0 on success, < 0 on error.  Errors include:
//	-E_NO_MEM if page directory or table could not be allocated.
//
static int
env_setup_vm(struct Env *e)
{
	int i;
	struct PageInfo *p = NULL;
	// Allocate a page for the page directory
	if (!(p = page_alloc(ALLOC_ZERO)))
		return -E_NO_MEM;
	// Now, set e->env_pgdir and initialize the page directory.
	//
	// Hint:
	//    - The VA space of all envs is identical above UTOP
	//	(except at UVPT, which we've set below).
	//	See inc/memlayout.h for permissions and layout.
	//	Can you use kern_pgdir as a template?  Hint: Yes.
	//	(Make sure you got the permissions right in Lab 2.)
	//    - The initial VA below UTOP is empty.
	//    - You do not need to make any more calls to page_alloc.
	//    - Note: In general, pp_ref is not maintained for
	//	physical pages mapped only above UTOP, but env_pgdir
	//	is an exception -- you need to increment env_pgdir's
	//	pp_ref for env_free to work correctly.
	//    - The functions in kern/pmap.h are handy.
	// LAB 3: Your code here.
	e->env_pgdir = page2kva(p);
	p->pp_ref++;
	memcpy(e->env_pgdir, kern_pgdir, PGSIZE);
	// UVPT maps the env's own page table read-only.
	// Permissions: kernel R, user R
	e->env_pgdir[PDX(UVPT)] = PADDR(e->env_pgdir) | PTE_P | PTE_U;
	return 0;
}注意这里用内存复制是因为boot_region_map是一个静态函数,不能调用。
region_alloc
在当前环境下在虚拟地址va处分配长为len的内存。
// Allocate len bytes of physical memory for environment env,
// and map it at virtual address va in the environment's address space.
// Does not zero or otherwise initialize the mapped pages in any way.
// Pages should be writable by user and kernel.
// Panic if any allocation attempt fails.
//
static void
region_alloc(struct Env *e, void *va, size_t len)
{
	// LAB 3: Your code here.
	// (But only if you need it for load_icode.)
	//
	// Hint: It is easier to use region_alloc if the caller can pass
	//   'va' and 'len' values that are not page-aligned.
	//   You should round va down, and round (va + len) up.
	//   (Watch out for corner-cases!)
	int r;
	void *v = ROUNDDOWN(va, PGSIZE);
	void* end = ROUNDUP(va + len, PGSIZE);
	struct PageInfo *p = NULL;
	for(; v < end; v += PGSIZE) {
		if((p = page_alloc(ALLOC_ZERO)) == NULL)
			panic("region_alloc: %e", -E_NO_MEM);
		if((r = page_insert(e->env_pgdir, p, v, PTE_U | PTE_W | PTE_P)) < 0)
			panic("region_alloc: %e", r);
	}
}load_icode()
设置initial program binary, stack与processor flags。如注释所说,主要是模仿boot/main.c中的函数。注意需要切换pgdir,因为进入这个函数的时候是kernel mode,但是分配内存要在用户的地址空间分配。
// Set up the initial program binary, stack, and processor flags
// for a user process.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
//
// This function loads all loadable segments from the ELF binary image
// into the environment's user memory, starting at the appropriate
// virtual addresses indicated in the ELF program header.
// At the same time it clears to zero any portions of these segments
// that are marked in the program header as being mapped
// but not actually present in the ELF file - i.e., the program's bss section.
//
// All this is very similar to what our boot loader does, except the boot
// loader also needs to read the code from disk.  Take a look at
// boot/main.c to get ideas.
//
// Finally, this function maps one page for the program's initial stack.
//
// load_icode panics if it encounters problems.
//  - How might load_icode fail?  What might be wrong with the given input?
//
static void
load_icode(struct Env *e, uint8_t *binary)
{
	// Hints:
	//  Load each program segment into virtual memory
	//  at the address specified in the ELF segment header.
	//  You should only load segments with ph->p_type == ELF_PROG_LOAD.
	//  Each segment's virtual address can be found in ph->p_va
	//  and its size in memory can be found in ph->p_memsz.
	//  The ph->p_filesz bytes from the ELF binary, starting at
	//  'binary + ph->p_offset', should be copied to virtual address
	//  ph->p_va.  Any remaining memory bytes should be cleared to zero.
	//  (The ELF header should have ph->p_filesz <= ph->p_memsz.)
	//  Use functions from the previous lab to allocate and map pages.
	//
	//  All page protection bits should be user read/write for now.
	//  ELF segments are not necessarily page-aligned, but you can
	//  assume for this function that no two segments will touch
	//  the same virtual page.
	//
	//  You may find a function like region_alloc useful.
	//
	//  Loading the segments is much simpler if you can move data
	//  directly into the virtual addresses stored in the ELF binary.
	//  So which page directory should be in force during
	//  this function?
	//
	//  You must also do something with the program's entry point,
	//  to make sure that the environment starts executing there.
	//  What?  (See env_run() and env_pop_tf() below.)
	// LAB 3: Your code here.
	struct Elf *elfhdr = (struct Elf *)binary;
	if (elfhdr->e_magic != ELF_MAGIC)
		panic("load_icode: not valid elf file");
	struct Proghdr *ph, *eph;
	ph = (struct Proghdr *) (binary + elfhdr->e_phoff);
	eph = ph + elfhdr->e_phnum;
	lcr3(PADDR(e->env_pgdir));
	for (; ph < eph; ph++) {
		if(ph->p_type == ELF_PROG_LOAD) {
			region_alloc(e, (void *)ph->p_va, ph->p_memsz);
			memcpy((void *)(ph->p_va), (void *)(binary + ph->p_offset), ph->p_filesz);
		}
	}
	e->env_tf.tf_eip = elfhdr->e_entry;
	lcr3(PADDR(kern_pgdir));
	// Now map one page for the program's initial stack
	// at virtual address USTACKTOP - PGSIZE.
	// LAB 3: Your code here.
	region_alloc(e, (void *)(USTACKTOP - PGSIZE), PGSIZE);
}env_create
就是结合上面的两个函数,先env_alloc再load_icode
// Allocates a new env with env_alloc, loads the named elf
// binary into it with load_icode, and sets its env_type.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
// The new env's parent ID is set to 0.
//
void
env_create(uint8_t *binary, enum EnvType type)
{
	// LAB 3: Your code here.
	int r;
	struct Env *e = NULL;
	if((r = env_alloc(&e, 0)) < 0)
		panic("env_create: %e", r);
	load_icode(e, binary);
	e->env_type = type;
}env_run()
按照注释的要求一步一步写就好了。注意别忘了最开始curenv可能是NULL。
// Context switch from curenv to env e.
// Note: if this is the first call to env_run, curenv is NULL.
//
// This function does not return.
//
void
env_run(struct Env *e)
{
	// Step 1: If this is a context switch (a new environment is running):
	//	   1. Set the current environment (if any) back to
	//	      ENV_RUNNABLE if it is ENV_RUNNING (think about
	//	      what other states it can be in),
	//	   2. Set 'curenv' to the new environment,
	//	   3. Set its status to ENV_RUNNING,
	//	   4. Update its 'env_runs' counter,
	//	   5. Use lcr3() to switch to its address space.
	// Step 2: Use env_pop_tf() to restore the environment's
	//	   registers and drop into user mode in the
	//	   environment.
	// Hint: This function loads the new environment's state from
	//	e->env_tf.  Go back through the code you wrote above
	//	and make sure you have set the relevant parts of
	//	e->env_tf to sensible values.
	// LAB 3: Your code here.
	if(curenv && curenv->env_status == ENV_RUNNING)
		curenv->env_status = ENV_RUNNABLE;
	curenv = e;
	curenv->env_status = ENV_RUNNING;
	curenv->env_runs++;
	lcr3(PADDR(curenv->env_pgdir));
	env_pop_tf(&(curenv->env_tf));
	// panic("env_run not yet implemented");
}完成了这几步之后运行当启动kernel的时候会进行如下操作:
kern/entry.S):kernel的entry,也就是boot loader加载kernel的entryi386_init(kern/init.c):上面的entry调用了这个函数,对kernel进行初始化
cons_init:初始化consolemem_init:初始化kernel address spaceenv_init:初始化所有的环境trap_init (still incomplete at this point):初始化中断env_create:创建一个用户环境env_run:运行用户环境
env_pop_tf:从trapframe中还原这个用户环境所需要的寄存器状态。完成了exercise 2之后因为并没有初始化中断,所以会在user_hello第一次进行system call的时候报triple fault的错。这是因为:When the CPU discovers that it is not set up to handle this system call interrupt, it will generate a general protection exception, find that it can't handle that, generate a double fault exception, find that it can't handle that either, and finally give up with what's known as a "triple fault".
我们可以使用gdb来检测是否进入了用户环境,在env_pop_tf中加断点之后逐步运行可以发现其会运行至地址为0x800020(可能会有出入)的指令,也就是进入了user mode。然后在int $0x30处加断点,之后再运行1步就会进入triple fault了。
我们来完成中断部分。
读书,在这里就不记录了。
Exeception和interrupt都是protected control transfer,其在用户代码不能干涉kernel的情况下,让处理器进入kernel mode。在intel的术语中,interrupt是由处理器外部的异步事件,如IO引起的,而exception是同步运行的代码引起的,如除0或者page fault。
之前提到过,为了能够做到protected,处理器的中断机制让用户只能进入几个固定的kernel位置。在x86中,由2种机制可以确保这种protection。
The Interrupt Descriptor Table (IDT)
一个在kernel private memory中的表,记录了0~255这256种不同的中断的EIP和CS,前者是中断进入的kernel code的位置,后者是中断的privilege level(在JOS中都是0,也就是kernel mode)。
The Task State Segment (TSS)
用于存放中断前的old processor state,用于中断之后还原状态。注意这部分也是存储在kernel stack中的。
尽管TSS很大,可以有很多功用,JOS仅仅记录中断转移到的kernel stack。处理器用ESP0和SS0来定义kernel mode,且JOS中不使用TSS的其他field。
大于31的中断为software interrupt或hardware interrupt,前者是可以用int指令进入,后者是外部硬件发出的。
在这节里,我们会拓展JOS使其可以处理它自己产生的0~31中断。之后一节我们会处理48(0x30),也就是system call,注意这个48是随机选的。lab4里面我们会处理硬件中断。
例如,代码中出现了除0,那么:
processor会通过TSS中的ESP0和SS0来切换到kernel stack。在JOS中,这两个值分别是GD_KD与KSTACKTOP。
处理器会把exception parameter推进kernel stack,其地址始于KSTACKTOP。
+--------------------+ KSTACKTOP             
| 0x00000 | old SS   |     " - 4
|      old ESP       |     " - 8
|     old EFLAGS     |     " - 12
| 0x00000 | old CS   |     " - 16
|      old EIP       |     " - 20 <---- ESP 
+--------------------+             对于除0这种情况,在x86中对应的是vector 0,处理器会读取IDT中entry 0,并设置对应的CS:IP。
最后会运行这个exception对应的handler,例如结束程序。
对于一些特殊的exception,除了会推入上述的5个words,处理器还会退入error code。可以阅读80386的manual来查看不同的error code意味着什么。
+--------------------+ KSTACKTOP             
| 0x00000 | old SS   |     " - 4
|      old ESP       |     " - 8
|     old EFLAGS     |     " - 12
| 0x00000 | old CS   |     " - 16
|      old EIP       |     " - 20
|     error code     |     " - 24 <---- ESP
+--------------------+             中断即可以在user mode中产生,也可以从kernel mode中产生。但是x86处理器只会在从user到kernel的过程中自动保存old register state。如果发生中断时已经在kernel里了,CPU只会继续向同样的kernel stack中推入值,从而使kernel可以处理嵌套的中断。
具体来说,因为不需要换栈,所以就不需要保存SS与ESP,所以handler眼中的第二个中断对应的stack就会是这样:
+--------------------+ <---- old ESP
|     old EFLAGS     |     " - 4
| 0x00000 | old CS   |     " - 8
|      old EIP       |     " - 12
+--------------------+ 我们来设置0~31的IDT。我们需要用到的一些定义在inc/trap.h与kern/trap.h中。
注意,0~31中的有一些中断已经被intel保留了,所以处理器永远都不会产生这些中断,怎么处理都行。
整个的控制方式应该如下:
      IDT                   trapentry.S         trap.c
   
+----------------+                        
|   &handler1    |---------> handler1:          trap (struct Trapframe *tf)
|                |             // do stuff      {
|                |             call trap          // handle the exception/interrupt
|                |             // ...           }
+----------------+
|   &handler2    |--------> handler2:
|                |            // do stuff
|                |            call trap
|                |            // ...
+----------------+
       .
       .
       .
+----------------+
|   &handlerX    |--------> handlerX:
|                |             // do stuff
|                |             call trap
|                |             // ...
+----------------+每个中断都应该在trapentry.S和trap_init()中有其对应的地址。
这部分我主要是通过和xv6的对应部分对照着写的。
首先写trapentry.S,这个文件分为两部分,第一是写handler:
/*
 * Lab 3: Your code here for generating entry points for the different traps.
 */
TRAPHANDLER_NOEC(T_DIVIDE_handler, T_DIVIDE)
TRAPHANDLER_NOEC(T_DEBUG_handler, T_DEBUG)
TRAPHANDLER_NOEC(T_NMI_handler, T_NMI)
TRAPHANDLER_NOEC(T_BRKPT_handler, T_BRKPT)
TRAPHANDLER_NOEC(T_OFLOW_handler, T_OFLOW)
TRAPHANDLER_NOEC(T_BOUND_handler, T_BOUND)
TRAPHANDLER_NOEC(T_ILLOP_handler, T_ILLOP)
TRAPHANDLER_NOEC(T_DEVICE_handler, T_DEVICE)
TRAPHANDLER(T_DBLFLT_handler, T_DBLFLT)
TRAPHANDLER(T_TSS_handler, T_TSS)
TRAPHANDLER(T_SEGNP_handler, T_SEGNP)
TRAPHANDLER(T_STACK_handler, T_STACK)
TRAPHANDLER(T_GPFLT_handler, T_GPFLT)
TRAPHANDLER(T_PGFLT_handler, T_PGFLT)
TRAPHANDLER_NOEC(T_FPERR_handler, T_FPERR)
TRAPHANDLER(T_ALIGN_handler, T_ALIGN)
TRAPHANDLER_NOEC(T_MCHK_handler, T_MCHK)
TRAPHANDLER_NOEC(T_SIMDERR_handler, T_SIMDERR)
TRAPHANDLER_NOEC(T_SYSCALL_handler, T_SYSCALL)具体是使用TRAPHANDLER还是TRAPHANDLER_NOEC可以对照xv6的vector.S文件。
然后是写_alltrap:
/*
 * Lab 3: Your code here for _alltraps
 */
  # vectors.S sends all traps here.
_alltraps:
  # Build trap frame.
  pushl %ds
  pushl %es
  pushal
  
  # Set up data segments.
  movw $GD_KD, %ax
  movw %ax, %ds
  movw %ax, %es
  # Call trap(tf), where tf=%esp
  pushl %esp
  call trap
  addl $4, %esp
  popal
  popl %es
  popl %ds
  addl $0x8, %esp  # trapno and errcode
  iret注意要对照着inc/trap.h中的Trapframe的定义来写,同时要参照xv6中的trapasm.S和x86.h(有trapframe的定义)来写。最后是trap_init()。因为在trapentry.S中只有函数名是全局变量,所以只能重复性的写很多...
void
trap_init(void)
{
	extern struct Segdesc gdt[];
	// LAB 3: Your code here.
	void T_DIVIDE_handler();
	void T_DEBUG_handler();
	void T_NMI_handler();
	void T_BRKPT_handler();
	void T_OFLOW_handler();
	void T_BOUND_handler();
	void T_ILLOP_handler();
	void T_DEVICE_handler();
	void T_DBLFLT_handler();
	void T_TSS_handler();
	void T_SEGNP_handler();
	void T_STACK_handler();
	void T_GPFLT_handler();
	void T_PGFLT_handler();
	void T_FPERR_handler();
	void T_ALIGN_handler();
	void T_MCHK_handler();
	void T_SIMDERR_handler();
	void T_SYSCALL_handler();
	SETGATE(idt[T_DIVIDE], 1, GD_KT, T_DIVIDE_handler, 0);
	SETGATE(idt[T_DEBUG], 1, GD_KT, T_DEBUG_handler, 0);
	SETGATE(idt[T_NMI], 1, GD_KT, T_NMI_handler, 0);
	SETGATE(idt[T_BRKPT], 1, GD_KT, T_BRKPT_handler, 0);
	SETGATE(idt[T_OFLOW], 1, GD_KT, T_OFLOW_handler, 0);
	SETGATE(idt[T_BOUND], 1, GD_KT, T_BOUND_handler, 0);
	SETGATE(idt[T_ILLOP], 1, GD_KT, T_ILLOP_handler, 0);
	SETGATE(idt[T_DEVICE], 1, GD_KT, T_DEVICE_handler, 0);
	SETGATE(idt[T_DBLFLT], 1, GD_KT, T_DBLFLT_handler, 0);
	SETGATE(idt[T_TSS], 1, GD_KT, T_TSS_handler, 0);
	SETGATE(idt[T_SEGNP], 1, GD_KT, T_SEGNP_handler, 0);
	SETGATE(idt[T_STACK], 1, GD_KT, T_STACK_handler, 0);
	SETGATE(idt[T_GPFLT], 1, GD_KT, T_GPFLT_handler, 0);
	SETGATE(idt[T_PGFLT], 1, GD_KT, T_PGFLT_handler, 0);
	SETGATE(idt[T_FPERR], 1, GD_KT, T_FPERR_handler, 0);
	SETGATE(idt[T_ALIGN], 1, GD_KT, T_ALIGN_handler, 0);
	SETGATE(idt[T_MCHK], 1, GD_KT, T_MCHK_handler, 0);
	SETGATE(idt[T_SIMDERR], 1, GD_KT, T_SIMDERR_handler, 0);
	SETGATE(idt[T_SYSCALL], 0, GD_KT, T_SYSCALL_handler, 3);
	// Per-CPU setup 
	trap_init_percpu();
}然后运行make grade,就通过了Part A。注意这里的代码虽然可以通过lab3,但是到lab4会出问题...因为其istrap参数的问题,详情请见lab4。
回答两个问题:
SETGATE中的trapit了,也就是不能区分exception和interruption了,同时也不能给不同的中断设置不同的中断等级了。user/softint中的int $14会进入vector 13?14的privilege level是0,也就是user不能调用,在上面的代码中不是$30都会被识别为general protection fault,也就是中断13。处理其他的中断。
处理page fault,也就是14。当发生page fault的时候,处理器会把产生错误的地址存在CR2寄存器中。
在trap_dispatch()里面加入page_fault_handler()
static void
trap_dispatch(struct Trapframe *tf)
{
	// Handle processor exceptions.
	// LAB 3: Your code here.
	switch(tf->tf_trapno) {
		case T_PGFLT:
			page_fault_handler(tf);
			return;
		default:
			break;
	}
	// Unexpected trap: The user process or the kernel has a bug.
	print_trapframe(tf);
	if (tf->tf_cs == GD_KT)
		panic("unhandled trap in kernel");
	else {
		env_destroy(curenv);
		return;
	}
}对于断点中断,需要调用的是kern/monitor.c中的monitor函数,不过注意,因为breakpoint.c中是通过直接触法来进行测试的,所以需要把断点的等级调为3
SETGATE(idt[T_BRKPT], 1, GD_KT, T_BRKPT_handler, 3);然后trap_dispatch为:
	switch(tf->tf_trapno) {
		case T_PGFLT:
			page_fault_handler(tf);
			return;
		case T_BRKPT:
			monitor(tf);
			return;
		default:
			break;
	}在JOS中,我们使用int $0x30来进行system call。应用会自己把system call需要的参数以及其编号川籍来,所以kernel就不需要去操作用户环境或者instruction stream了。system call number会在%eax, 参数(前5个)会在 %edx, %ecx, %ebx, %edi, 和 %esi。同样,kernel会把返回值存在%eax中。syscall函数在lb/syscall.c中。
static inline int32_t
syscall(int num, int check, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
	int32_t ret;
	// Generic system call: pass system call number in AX,
	// up to five parameters in DX, CX, BX, DI, SI.
	// Interrupt kernel with T_SYSCALL.
	//
	// The "volatile" tells the assembler not to optimize
	// this instruction away just because we don't use the
	// return value.
	//
	// The last clause tells the assembler that this can
	// potentially change the condition codes and arbitrary
	// memory locations.
	asm volatile("int %1\n"
		     : "=a" (ret)
		     : "i" (T_SYSCALL),
		       "a" (num),
		       "d" (a1),
		       "c" (a2),
		       "b" (a3),
		       "D" (a4),
		       "S" (a5)
		     : "cc", "memory");
	if(check && ret > 0)
		panic("syscall %d returned %d (> 0)", num, ret);
	return ret;
}上面的这种写法叫gcc内联汇编,感兴趣的同学可以取查一下。
注意这里的和xv6的对比,明显JOS比xv6要简单很多,并没有通过用户的stack(esp)来掏出来参数,而且JOS也没有myproc这样一个全局状态。
加入system call的handler。由于我们已经加过了基本设置,所以只需要修改trap_dispatch()与kern/syscall.c中的syscall()了。
首先是trap_dispatch():
	switch(tf->tf_trapno) {
		case T_PGFLT:
			page_fault_handler(tf);
			return;
		case T_BRKPT:
			monitor(tf);
			return;
		case T_SYSCALL:
			tf->tf_regs.reg_eax = syscall(
				tf->tf_regs.reg_eax, tf->tf_regs.reg_edx,
				tf->tf_regs.reg_ecx, tf->tf_regs.reg_ebx,
				tf->tf_regs.reg_edi, tf->tf_regs.reg_esi
			);
			return;
		default:
			break;
	}注意别忘了用返回值更新eax。
其次是syscall():
// Dispatches to the correct kernel function, passing the arguments.
int32_t
syscall(uint32_t syscallno, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
	// Call the function corresponding to the 'syscallno' parameter.
	// Return any appropriate return value.
	// LAB 3: Your code here.
	// panic("syscall not implemented");
	switch (syscallno) {
		case SYS_cputs:
			sys_cputs((char *)a1, (size_t)a2);
			return;
		case SYS_cgetc:
			return sys_cgetc();
		case SYS_getenvid:
			return sys_getenvid();
		case SYS_env_destroy:
			return sys_env_destroy((envid_t)a1);
		default:
			return -E_INVAL;
	}
}用户应用会从lib/entry进入,然后调用lib/libmain.c中的libmain(),之后libmain会调用umain也就是进入了比如hello这样的函数中。我们希望能够在用户应用中使用thisenv也就是当前的环境状态。由于我们已经有了sys_getenvid()这样的函数,这个函数在lib/syscall.c中被声明,用来掉system call中的SYS_getenvid。有了envid之后,因为从inc/env.h中得知:
// An environment ID 'envid_t' has three parts:
//
// +1+---------------21-----------------+--------10--------+
// |0|          Uniqueifier             |   Environment    |
// | |                                  |      Index       |
// +------------------------------------+------------------+
//                                       \--- ENVX(eid) --/
//
// The environment index ENVX(eid) equals the environment's index in the
// 'envs[]' array.  The uniqueifier distinguishes environments that were
// created at different times, but share the same environment index.
//
// All real environments are greater than 0 (so the sign bit is zero).
// envid_ts less than 0 signify errors.  The envid_t == 0 is special, and
// stands for the current environment.
#define LOG2NENV		10
#define NENV			(1 << LOG2NENV)
#define ENVX(envid)		((envid) & (NENV - 1))我们只需要取后10位就可以得到当前环境在envs中的序号了,所以有:
thisenv = &envs[ENVX(sys_getenvid())];内存保护是操作系统非常重要的一部分,也是保证bug不能破坏其他程序或者kernel的一个重要手段。
操作系统通常通过硬件来实现内存保护。OS让硬件知道哪些虚拟地址是可以访问的,哪些不行。当一个程序试图访问非法地址的之后,处理器会trap。如果问题可以结局,那么kernel就会解决这个问题并让程序继续运行,如果不行,那么程序就不会继续运行。
一个常见的解决方法是自动扩充stack。一般默认就分配一个page作为用户的stack,如果触发了page fault,就自动再进行分配。
system call会导致一个很有趣的问题。很多system call允许用户传指针进kernel,这些指针会指向读写的buffer。这种做法有两个问题:
基于这两个原因,我们需要很谨慎的处理传进kernel的指针。
我们讲用一个机制来解决这两个问题。当程序向kernel传递指针的时候,kernel会检查该指针是不是在用户地址内,以及对应的page table允许内存操作。这样,kernel就不会因为dereference用户指针导致page fault了。
首先给trap中加上page fault在kernel mode,直接panic:
	switch(tf->tf_trapno) {
		case T_PGFLT:
			if ((tf->tf_cs & 0x3) == 0)
				panic("page fault in kernel");
			page_fault_handler(tf);
			return;之后补全kern/pmap.c中的user_mem_check:
// Check that an environment is allowed to access the range of memory
// [va, va+len) with permissions 'perm | PTE_P'.
// Normally 'perm' will contain PTE_U at least, but this is not required.
// 'va' and 'len' need not be page-aligned; you must test every page that
// contains any of that range.  You will test either 'len/PGSIZE',
// 'len/PGSIZE + 1', or 'len/PGSIZE + 2' pages.
//
// A user program can access a virtual address if (1) the address is below
// ULIM, and (2) the page table gives it permission.  These are exactly
// the tests you should implement here.
//
// If there is an error, set the 'user_mem_check_addr' variable to the first
// erroneous virtual address.
//
// Returns 0 if the user program can access this range of addresses,
// and -E_FAULT otherwise.
//
int
user_mem_check(struct Env *env, const void *va, size_t len, int perm)
{
	// LAB 3: Your code here.
	uintptr_t v = ROUNDDOWN((uintptr_t)va, PGSIZE);
	uintptr_t end = ROUNDUP((uintptr_t)va + len, PGSIZE);
	for(;v < end; v += PGSIZE) {
		pte_t *pte = pgdir_walk(env->env_pgdir, (void *)v, 0);
		if (!pte || (*pte & perm) != perm) {
			if(v < (uintptr_t)va)
				user_mem_check_addr = (uintptr_t)va;
			else
				user_mem_check_addr = v;
			return -E_FAULT;
		}
	}
	return 0;
}注意需要返回的是这区间里的第一个地址,所以如果v比va小,返回的应该是va。
然后修改syscall.c中的sys_cputs以检查指针。
static void
sys_cputs(const char *s, size_t len)
{
	// Check that the user has permission to read memory [s, s+len).
	// Destroy the environment if not.
	// LAB 3: Your code here.
	user_mem_assert(curenv, s, len, PTE_U);
	// Print the string supplied by the user.
	cprintf("%.*s", len, s);
}之后,为了在breakpoint中实现backtrace功能,在kern/kdebug.c的debuginfo_eip()中加入如下代码:
	// Find the relevant set of stabs
	if (addr >= ULIM) {
		stabs = __STAB_BEGIN__;
		stab_end = __STAB_END__;
		stabstr = __STABSTR_BEGIN__;
		stabstr_end = __STABSTR_END__;
	} else {
		// The user-application linker script, user/user.ld,
		// puts information about the application's stabs (equivalent
		// to __STAB_BEGIN__, __STAB_END__, __STABSTR_BEGIN__, and
		// __STABSTR_END__) in a structure located at virtual address
		// USTABDATA.
		const struct UserStabData *usd = (const struct UserStabData *) USTABDATA;
		// Make sure this memory is valid.
		// Return -1 if it is not.  Hint: Call user_mem_check.
		// LAB 3: Your code here.
		if(user_mem_check(curenv, (void *)usd, sizeof(struct UserStabData), PTE_U))
			return -1;
		stabs = usd->stabs;
		stab_end = usd->stab_end;
		stabstr = usd->stabstr;
		stabstr_end = usd->stabstr_end;
		// Make sure the STABS and string table memory is valid.
		// LAB 3: Your code here.
		if(user_mem_check(curenv, (void *)stabs, stab_end - stabs, PTE_U))
			return -1;
		if(user_mem_check(curenv, (void *)stabstr, stabstr_end - stabstr, PTE_U))
			return -1;
	}之后运行make run-breakpoint-nox进入中断之后,如果运行bracktrack就会有如下结果:
K> backtrace
Stack backtrace:
  ebp efffff00  eip f0100ad7  args 00000001 efffff28 f01d2000 f0106781 f011af48
      kern/monitor.c:151: monitor+353
  ebp efffff80  eip f010429b  args f01d2000 efffffbc f0150508 00000092 f011afd8
      kern/trap.c:191: trap+282
  ebp efffffb0  eip f0104389  args efffffbc 00000000 00000000 eebfdfc0 efffffdc
      kern/trapentry.S:87: <unknown>+0
  ebp eebfdfc0  eip 00800087  args 00000000 00000000 eebfdff0 00800058 00000000
      lib/libmain.c:25: libmain+78
  ebp eebfdff0  eip 00800031  args 00000000 00000000Incoming TRAP frame at 0xeffffe64
kernel panic at kern/trap.c:187: page fault in kernel这里为什么没有搞懂。。。
当完成exercise 9的时候,exercise 10自动完成了。9和10的唯一区别就是传入的指针,一个是未分配的,另外一个是传入了对应kernel部分的地址,这两者都可以用上面的检查方法搞定。
最后来运行一下make grade:
divzero: OK (1.0s)
softint: OK (0.9s)
badsegment: OK (1.0s)
Part A score: 30/30
faultread: OK (1.0s)
faultreadkernel: OK (2.0s)
faultwrite: OK (1.1s)
faultwritekernel: OK (1.8s)
breakpoint: OK (1.1s)
testbss: OK (1.9s)
hello: OK (2.1s)
buggyhello: OK (2.0s)
buggyhello2: OK (2.2s)
evilhello: OK (1.8s)
Part B score: 50/50
Score: 80/80