6.828 lab3 User Environments

date: 2019-03-05
tags: OS 6.828

Note: before running lab3, the .bss section in kern/kernel.ld needs to be changed to:

	.bss : {
		PROVIDE(edata = .);
		*(.dynbss)
		*(.bss .bss.*)
		*(COMMON)
		PROVIDE(end = .);
	}

Many thanks to the classmate who solved this problem; the original write-up of the fix is here.

Part A: User Environments and Exception Handling

First, take a look at the new inc/env.h file, which contains the basic definitions for user environments:

typedef int32_t envid_t;

// An environment ID 'envid_t' has three parts:
//
// +1+---------------21-----------------+--------10--------+
// |0|          Uniqueifier             |   Environment    |
// | |                                  |      Index       |
// +------------------------------------+------------------+
//                                       \--- ENVX(eid) --/
//
// The environment index ENVX(eid) equals the environment's index in the
// 'envs[]' array.  The uniqueifier distinguishes environments that were
// created at different times, but share the same environment index.
//
// All real environments are greater than 0 (so the sign bit is zero).
// envid_ts less than 0 signify errors.  The envid_t == 0 is special, and
// stands for the current environment.

#define LOG2NENV		10
#define NENV			(1 << LOG2NENV)
#define ENVX(envid)		((envid) & (NENV - 1))

// Values of env_status in struct Env
enum {
	ENV_FREE = 0,
	ENV_DYING,
	ENV_RUNNABLE,
	ENV_RUNNING,
	ENV_NOT_RUNNABLE
};

// Special environment types
enum EnvType {
	ENV_TYPE_USER = 0,
};

struct Env {
	struct Trapframe env_tf;	// Saved registers
	struct Env *env_link;		// Next free Env
	envid_t env_id;			// Unique environment identifier
	envid_t env_parent_id;		// env_id of this env's parent
	enum EnvType env_type;		// Indicates special system environments
	unsigned env_status;		// Status of the environment
	uint32_t env_runs;		// Number of times environment has run

	// Address space
	pde_t *env_pgdir;		// Kernel virtual address of page dir
};

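Although this lab only creates a single user environment, the design needs to support multiple environments for the later labs.

The first few lines of kern/env.c define three important environment-related global variables in the kernel: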

struct Env *envs = NULL;		// All environments
struct Env *curenv = NULL;		// The current env
static struct Env *env_free_list;	// Free environment list
					// (linked by Env->env_link)

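When JOS boots, envs points to an array of struct Env that represents every environment in the system. By design, JOS supports at most NENV (1024) environments, although far fewer are usually active. Once allocated, the array holds one struct Env for each of the NENV possible environments.

Like page_free_list, JOS keeps the inactive Env structures on env_free_list, which makes allocating and deallocating environments easy. curenv points to the currently executing environment and is initialized to NULL.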

Environment State

Back in inc/env.h, let's look at struct Env:

struct Env {
	struct Trapframe env_tf;	// Saved registers
	struct Env *env_link;		// Next free Env
	envid_t env_id;			// Unique environment identifier
	envid_t env_parent_id;		// env_id of this env's parent
	enum EnvType env_type;		// Indicates special system environments
	unsigned env_status;		// Status of the environment
	uint32_t env_runs;		// Number of times environment has run

	// Address space
	pde_t *env_pgdir;		// Kernel virtual address of page dir
};

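The fields in more detail:

env_tf: holds the saved registers while the environment is not running, for example across a switch from user mode to kernel mode.
env_link: points to the next Env on env_free_list.
env_id: a value that uniquely identifies the environment. If an environment is freed and a later environment reuses the same struct Env, the two will have different env_ids.
env_parent_id: the env_id of the environment that created this one. This links environments into a tree, which helps with security decisions, that is, deciding whether one environment is allowed to do something to another.
env_type: distinguishes special environments; ordinary ones are ENV_TYPE_USER.

env_status: the state, which takes one of the following values: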

// Values of env_status in struct Env
enum {
	ENV_FREE = 0,  // inactive; the Env is on env_free_list
	ENV_DYING,  // zombie environment; freed the next time it traps to the kernel
	ENV_RUNNABLE,  // waiting to run
	ENV_RUNNING,  // currently running
	ENV_NOT_RUNNABLE  // currently active, but not ready to run, e.g. waiting for IPC
};

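env_pgdir: the kernel virtual address of this environment's page directory.

Like a Unix process, a JOS environment couples a thread with an address space. The thread is defined by the saved registers (env_tf), and the address space by env_pgdir. The kernel must set up both correctly to run an environment.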
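To make the env_id rule concrete (a reused struct Env gets a different id), here is a minimal sketch of how such an id could be derived. The helper name next_env_id and the ENVGENSHIFT value are assumptions made for illustration; they are not taken from the code in this post.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t envid_t;

#define LOG2NENV    10
#define NENV        (1 << LOG2NENV)
#define ENVX(envid) ((envid) & (NENV - 1))
#define ENVGENSHIFT 12	/* assumed; must be >= LOG2NENV */

/* Hypothetical helper: derive a new id for slot `index` from the id
 * that slot held previously (0 if the slot was never used). */
static envid_t next_env_id(envid_t prev_id, int index)
{
	assert(index >= 0 && index < NENV);

	/* Bump the uniqueifier and clear the index bits. Unsigned
	 * arithmetic avoids signed overflow when the counter wraps. */
	int32_t generation = (int32_t)(((uint32_t)prev_id + (1u << ENVGENSHIFT))
				       & ~(uint32_t)(NENV - 1));
	if (generation <= 0)	/* never hand out a non-positive id */
		generation = 1 << ENVGENSHIFT;
	return generation | index;
}
```

Under these assumptions, slot 5 would first get id 0x1005 and, after being freed and reused, 0x2005; ENVX of both is still 5.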

Allocating the Environments Array

Modify mem_init to allocate the envs array, and map envs at the corresponding location in the kernel page directory.

Exercise 1

	//////////////////////////////////////////////////////////////////////
	// Make 'envs' point to an array of size 'NENV' of 'struct Env'.
	// LAB 3: Your code here.
	envs = (struct Env *)boot_alloc(NENV*sizeof(struct Env));
...
	//////////////////////////////////////////////////////////////////////
	// Map the 'envs' array read-only by the user at linear address UENVS
	// (ie. perm = PTE_U | PTE_P).
	// Permissions:
	//    - the new image at UENVS  -- kernel R, user R
	//    - envs itself -- kernel RW, user NONE
	// LAB 3: Your code here.
	boot_map_region(kern_pgdir, UENVS, PTSIZE, PADDR(envs), PTE_U | PTE_P);

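Note that the address and size of this mapping follow JOS's virtual memory layout. After this change, running the kernel should print several "succeeded" lines: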

check_page_free_list() succeeded!
check_page_alloc() succeeded!
check_page() succeeded!
check_kern_pgdir() succeeded!
check_page_free_list() succeeded!
check_page_installed_pgdir() succeeded!

Creating and Running Environments

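Because there is no file system yet, running a user environment requires the kernel to load a static binary image that is embedded in the kernel itself. These images live in obj/user/, which is also reflected in kern/Makefrag: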

# Binary program images to embed within the kernel.
# Binary files for LAB3
KERN_BINFILES :=	user/hello \
			user/buggyhello \
			user/buggyhello2 \
			user/evilhello \
			user/testbss \
			user/divzero \
			user/breakpoint \
			user/softint \
			user/badsegment \
			user/faultread \
			user/faultreadkernel \
			user/faultwrite \
			user/faultwritekernel
...
# How to build the kernel itself
$(OBJDIR)/kern/kernel: $(KERN_OBJFILES) $(KERN_BINFILES) kern/kernel.ld \
	  $(OBJDIR)/.vars.KERN_LDFLAGS
	@echo + ld $@
	$(V)$(LD) -o $@ $(KERN_LDFLAGS) $(KERN_OBJFILES) $(GCC_LIB) -b binary $(KERN_BINFILES)
	$(V)$(OBJDUMP) -S $@ > $@.asm
	$(V)$(NM) -n $@ > $@.sym

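Here -b binary tells the linker to treat these files as raw uninterpreted binaries rather than as regular .o files produced by the compiler. Looking at obj/kern/kernel.sym shows a series of curious symbols: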

00008acc A _binary_obj_user_hello_size
00008ad0 A _binary_obj_user_badsegment_size
00008ad0 A _binary_obj_user_breakpoint_size
00008ad0 A _binary_obj_user_buggyhello_size
00008ad0 A _binary_obj_user_evilhello_size
00008ad0 A _binary_obj_user_faultread_size
00008ad0 A _binary_obj_user_faultwrite_size
00008ad0 A _binary_obj_user_softint_size
00008ad8 A _binary_obj_user_faultreadkernel_size
00008ad8 A _binary_obj_user_faultwritekernel_size
00008ae4 A _binary_obj_user_divzero_size
00008ae8 A _binary_obj_user_testbss_size
00008aec A _binary_obj_user_buggyhello2_size

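The linker generates these symbols so that the kernel can reference the embedded binaries.

In kern/init.c, i386_init() runs one of these binary images (user_hello by default). The environment-related functions it relies on are not implemented yet, so the next step is to fill them in.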

Exercise 2

Complete the following functions in kern/env.c:

env_init()

Initializes envs and env_free_list.

// Mark all environments in 'envs' as free, set their env_ids to 0,
// and insert them into the env_free_list.
// Make sure the environments are in the free list in the same order
// they are in the envs array (i.e., so that the first call to
// env_alloc() returns envs[0]).
//
void
env_init(void)
{
	// Set up envs array
	// LAB 3: Your code here.
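	// envs is a pointer, so sizeof(envs) would only cover 4 bytes.
	// Zeroing the whole array also sets env_id = 0 and env_status = ENV_FREE.
	memset(envs, 0, NENV * sizeof(struct Env));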
	env_free_list = NULL;
	for(int i=NENV-1; i>=0; i--) {
		envs[i].env_link = env_free_list;
		env_free_list = envs + i;
	}
	// Per-CPU part of the initialization
	env_init_percpu();
}

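Note that the comment requires env_free_list to be in the same order as the envs array, so the loop runs in the opposite direction from the one in page_init. (I don't see why the order matters...)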
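The ordering is easy to check in isolation. Below is a small sketch with a four-entry stand-in for the envs array; all the demo_ names are made up for this example. Pushing entries onto the head of the list from the last index down to 0 leaves index 0 at the head, so allocations come out in array order.

```c
#include <assert.h>
#include <stddef.h>

#define NENV_DEMO 4	/* tiny stand-in for NENV */

struct DemoEnv {
	struct DemoEnv *env_link;
};

static struct DemoEnv demo_envs[NENV_DEMO];
static struct DemoEnv *demo_free_list;

/* Push entries from last to first so the head ends up at index 0. */
static void demo_env_init(void)
{
	demo_free_list = NULL;
	for (int i = NENV_DEMO - 1; i >= 0; i--) {
		demo_envs[i].env_link = demo_free_list;
		demo_free_list = &demo_envs[i];
	}
	assert(demo_free_list == &demo_envs[0]);
}

/* Pop the head of the free list; returns its index, or -1 if empty. */
static int demo_env_alloc(void)
{
	struct DemoEnv *e = demo_free_list;

	if (e == NULL)
		return -1;
	demo_free_list = e->env_link;
	e->env_link = NULL;
	return (int)(e - demo_envs);
}
```

After demo_env_init(), successive calls to demo_env_alloc() return 0, 1, 2, 3 and then -1.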

env_setup_vm()

Initializes env_pgdir and sets up the kernel portion of the environment's address space.

// Initialize the kernel virtual memory layout for environment e.
// Allocate a page directory, set e->env_pgdir accordingly,
// and initialize the kernel portion of the new environment's address space.
// Do NOT (yet) map anything into the user portion
// of the environment's virtual address space.
//
// Returns 0 on success, < 0 on error.  Errors include:
//	-E_NO_MEM if page directory or table could not be allocated.
//
static int
env_setup_vm(struct Env *e)
{
	int i;
	struct PageInfo *p = NULL;

	// Allocate a page for the page directory
	if (!(p = page_alloc(ALLOC_ZERO)))
		return -E_NO_MEM;

	// Now, set e->env_pgdir and initialize the page directory.
	//
	// Hint:
	//    - The VA space of all envs is identical above UTOP
	//	(except at UVPT, which we've set below).
	//	See inc/memlayout.h for permissions and layout.
	//	Can you use kern_pgdir as a template?  Hint: Yes.
	//	(Make sure you got the permissions right in Lab 2.)
	//    - The initial VA below UTOP is empty.
	//    - You do not need to make any more calls to page_alloc.
	//    - Note: In general, pp_ref is not maintained for
	//	physical pages mapped only above UTOP, but env_pgdir
	//	is an exception -- you need to increment env_pgdir's
	//	pp_ref for env_free to work correctly.
	//    - The functions in kern/pmap.h are handy.

	// LAB 3: Your code here.
	e->env_pgdir = page2kva(p);
	p->pp_ref++;
	memcpy(e->env_pgdir, kern_pgdir, PGSIZE);
	// UVPT maps the env's own page table read-only.
	// Permissions: kernel R, user R
	e->env_pgdir[PDX(UVPT)] = PADDR(e->env_pgdir) | PTE_P | PTE_U;

	return 0;
}

Note that a memory copy is used here because boot_map_region is a static function in kern/pmap.c and cannot be called from kern/env.c.

region_alloc

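Allocates len bytes of physical memory for environment e and maps it at virtual address va in that environment's address space.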

// Allocate len bytes of physical memory for environment env,
// and map it at virtual address va in the environment's address space.
// Does not zero or otherwise initialize the mapped pages in any way.
// Pages should be writable by user and kernel.
// Panic if any allocation attempt fails.
//
static void
region_alloc(struct Env *e, void *va, size_t len)
{
	// LAB 3: Your code here.
	// (But only if you need it for load_icode.)
	//
	// Hint: It is easier to use region_alloc if the caller can pass
	//   'va' and 'len' values that are not page-aligned.
	//   You should round va down, and round (va + len) up.
	//   (Watch out for corner-cases!)
	int r;
	void *v = ROUNDDOWN(va, PGSIZE);
	void* end = ROUNDUP(va + len, PGSIZE);
	struct PageInfo *p = NULL;
	for(; v < end; v += PGSIZE) {
		if((p = page_alloc(ALLOC_ZERO)) == NULL)
			panic("region_alloc: %e", -E_NO_MEM);
		if((r = page_insert(e->env_pgdir, p, v, PTE_U | PTE_W | PTE_P)) < 0)
			panic("region_alloc: %e", r);
	}
}
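The rounding in region_alloc is where the corner cases hide, so here is a standalone sketch of just that arithmetic. round_down, round_up and pages_spanned are hypothetical helpers written for this example (simplified stand-ins for the ROUNDDOWN/ROUNDUP macros), not JOS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u

/* Simplified stand-ins for ROUNDDOWN/ROUNDUP; PGSIZE is a power of two. */
static uintptr_t round_down(uintptr_t a) { return a & ~(uintptr_t)(PGSIZE - 1); }
static uintptr_t round_up(uintptr_t a) { return round_down(a + PGSIZE - 1); }

/* Number of pages touched by [va, va + len) once both ends are rounded. */
static size_t pages_spanned(uintptr_t va, size_t len)
{
	assert(va + len >= va);	/* corner case: the range must not wrap around */
	uintptr_t start = round_down(va);
	uintptr_t end = round_up(va + len);
	return (end - start) / PGSIZE;
}
```

A two-byte range that straddles a page boundary, such as va = 0x1fff with len = 2, needs two pages, while an aligned va with len = 0 needs none.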

load_icode()

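Sets up the initial program binary, stack, and processor flags. As the comment says, this mostly mirrors the loader in boot/main.c. Note that the page directory has to be switched: on entry the kernel's page directory (kern_pgdir) is in force, but the segments must be copied to virtual addresses in the user environment's address space.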

// Set up the initial program binary, stack, and processor flags
// for a user process.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
//
// This function loads all loadable segments from the ELF binary image
// into the environment's user memory, starting at the appropriate
// virtual addresses indicated in the ELF program header.
// At the same time it clears to zero any portions of these segments
// that are marked in the program header as being mapped
// but not actually present in the ELF file - i.e., the program's bss section.
//
// All this is very similar to what our boot loader does, except the boot
// loader also needs to read the code from disk.  Take a look at
// boot/main.c to get ideas.
//
// Finally, this function maps one page for the program's initial stack.
//
// load_icode panics if it encounters problems.
//  - How might load_icode fail?  What might be wrong with the given input?
//
static void
load_icode(struct Env *e, uint8_t *binary)
{
	// Hints:
	//  Load each program segment into virtual memory
	//  at the address specified in the ELF segment header.
	//  You should only load segments with ph->p_type == ELF_PROG_LOAD.
	//  Each segment's virtual address can be found in ph->p_va
	//  and its size in memory can be found in ph->p_memsz.
	//  The ph->p_filesz bytes from the ELF binary, starting at
	//  'binary + ph->p_offset', should be copied to virtual address
	//  ph->p_va.  Any remaining memory bytes should be cleared to zero.
	//  (The ELF header should have ph->p_filesz <= ph->p_memsz.)
	//  Use functions from the previous lab to allocate and map pages.
	//
	//  All page protection bits should be user read/write for now.
	//  ELF segments are not necessarily page-aligned, but you can
	//  assume for this function that no two segments will touch
	//  the same virtual page.
	//
	//  You may find a function like region_alloc useful.
	//
	//  Loading the segments is much simpler if you can move data
	//  directly into the virtual addresses stored in the ELF binary.
	//  So which page directory should be in force during
	//  this function?
	//
	//  You must also do something with the program's entry point,
	//  to make sure that the environment starts executing there.
	//  What?  (See env_run() and env_pop_tf() below.)

	// LAB 3: Your code here.
	struct Elf *elfhdr = (struct Elf *)binary;
	if (elfhdr->e_magic != ELF_MAGIC)
		panic("load_icode: not valid elf file");
	struct Proghdr *ph, *eph;
	ph = (struct Proghdr *) (binary + elfhdr->e_phoff);
	eph = ph + elfhdr->e_phnum;
	lcr3(PADDR(e->env_pgdir));
	for (; ph < eph; ph++) {
		if(ph->p_type == ELF_PROG_LOAD) {
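			// region_alloc uses page_alloc(ALLOC_ZERO), so the bss tail
			// (p_memsz - p_filesz bytes) is already zero after the copy.
			region_alloc(e, (void *)ph->p_va, ph->p_memsz);
			memcpy((void *)(ph->p_va), (void *)(binary + ph->p_offset), ph->p_filesz);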
		}
	}
	e->env_tf.tf_eip = elfhdr->e_entry;
	lcr3(PADDR(kern_pgdir));
	// Now map one page for the program's initial stack
	// at virtual address USTACKTOP - PGSIZE.
	// LAB 3: Your code here.
	region_alloc(e, (void *)(USTACKTOP - PGSIZE), PGSIZE);
}
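The per-segment rule (copy p_filesz bytes, zero the rest up to p_memsz) can be tried out on a flat buffer. This is only a sketch: struct DemoSeg is a hypothetical, trimmed-down stand-in for struct Proghdr, and va is just an index into the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down program header. */
struct DemoSeg {
	uint32_t offset;	/* where the segment's bytes start in the image */
	uint32_t va;		/* destination address (an index into `mem` here) */
	uint32_t filesz;	/* bytes present in the image */
	uint32_t memsz;		/* bytes the segment occupies in memory */
};

/* Copy filesz bytes, then zero the remainder of the segment (the bss part). */
static void demo_load_segment(uint8_t *mem, const uint8_t *image,
			      const struct DemoSeg *seg)
{
	assert(seg->filesz <= seg->memsz);
	memcpy(mem + seg->va, image + seg->offset, seg->filesz);
	memset(mem + seg->va + seg->filesz, 0, seg->memsz - seg->filesz);
}
```

The sketch zeroes the tail explicitly. The load_icode above gets the same effect without a memset only because its pages come from page_alloc(ALLOC_ZERO).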

env_create

This simply combines the functions above: env_alloc first, then load_icode.

// Allocates a new env with env_alloc, loads the named elf
// binary into it with load_icode, and sets its env_type.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
// The new env's parent ID is set to 0.
//
void
env_create(uint8_t *binary, enum EnvType type)
{
	// LAB 3: Your code here.
	int r;
	struct Env *e = NULL;
	if((r = env_alloc(&e, 0)) < 0)
		panic("env_create: %e", r);
	load_icode(e, binary);
	e->env_type = type;
}

env_run()

Just follow the comment step by step. Don't forget that curenv may be NULL on the first call.

// Context switch from curenv to env e.
// Note: if this is the first call to env_run, curenv is NULL.
//
// This function does not return.
//
void
env_run(struct Env *e)
{
	// Step 1: If this is a context switch (a new environment is running):
	//	   1. Set the current environment (if any) back to
	//	      ENV_RUNNABLE if it is ENV_RUNNING (think about
	//	      what other states it can be in),
	//	   2. Set 'curenv' to the new environment,
	//	   3. Set its status to ENV_RUNNING,
	//	   4. Update its 'env_runs' counter,
	//	   5. Use lcr3() to switch to its address space.
	// Step 2: Use env_pop_tf() to restore the environment's
	//	   registers and drop into user mode in the
	//	   environment.

	// Hint: This function loads the new environment's state from
	//	e->env_tf.  Go back through the code you wrote above
	//	and make sure you have set the relevant parts of
	//	e->env_tf to sensible values.

	// LAB 3: Your code here.
	if(curenv && curenv->env_status == ENV_RUNNING)
		curenv->env_status = ENV_RUNNABLE;
	curenv = e;
	curenv->env_status = ENV_RUNNING;
	curenv->env_runs++;
	lcr3(PADDR(curenv->env_pgdir));
	env_pop_tf(&(curenv->env_tf));
	// panic("env_run not yet implemented");
}

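With these steps done, booting the kernel goes through the following sequence:

start (kern/entry.S): the kernel's entry point, which the boot loader jumps to after loading the kernel
i386_init (kern/init.c): called from the entry code to initialize the kernel
- cons_init: initializes the console
- mem_init: initializes the kernel address space
- env_init: initializes all environments
- trap_init (still incomplete at this point): initializes interrupt handling
- env_create: creates a user environment
- env_run: runs the user environment
  - env_pop_tf: restores the register state the user environment needs from its trapframe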

After Exercise 2, interrupt handling is still not set up, so the kernel triple-faults the first time user_hello makes a system call. The reason: When the CPU discovers that it is not set up to handle this system call interrupt, it will generate a general protection exception, find that it can't handle that, generate a double fault exception, find that it can't handle that either, and finally give up with what's known as a "triple fault".

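We can use gdb to check whether the user environment was entered. Set a breakpoint at env_pop_tf and single-step; execution should reach an instruction at address 0x800020 (the exact address may differ), which means the CPU is now in user mode. Then set a breakpoint at the int $0x30 instruction; stepping one more instruction from there triggers the triple fault.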

Handling Interrupts and Exceptions

Now let's implement interrupt handling.

Exercise 3

This is a reading exercise, so there is nothing to record here.

Basics of Protected Control Transfer

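Exceptions and interrupts are both protected control transfers: they switch the processor into kernel mode without giving user code any chance to interfere with the kernel. In Intel's terminology, an interrupt is caused by an asynchronous event external to the processor, such as device I/O, while an exception is caused synchronously by the running code, such as a divide by zero or a page fault.

As mentioned earlier, to keep the transfer protected, the processor's interrupt mechanism only lets user code enter the kernel at a few fixed entry points. On x86, two mechanisms work together to provide this protection.

The Interrupt Descriptor Table (IDT)

A table in kernel-private memory with up to 256 entries, one per interrupt vector (0 to 255). From the entry for a vector the processor loads EIP, the address of the kernel code that handles it, and CS, the code segment selector, whose low two bits give the privilege level the handler runs at (always 0, kernel mode, in JOS).

The Task State Segment (TSS)

Before invoking the handler, the processor has to save the old processor state somewhere so that it can be restored afterwards, and that place must be protected from user code. The state is therefore pushed onto a kernel stack, and the TSS specifies the segment selector and address of that stack.

Although the TSS is large and can serve many purposes, JOS only uses it to locate the kernel stack to switch to on an interrupt. Only the ESP0 and SS0 fields, which define the stack for privilege level 0, are used; JOS ignores the other TSS fields.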

Types of Exceptions and Interrupts

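Vectors 0 to 31 are used for the synchronous exceptions the processor generates itself. Vectors above 31 are used only by software interrupts, which are raised with the int instruction, and hardware interrupts, which are raised by external devices.

In this section we extend JOS to handle the processor-generated vectors 0 to 31. The next section handles vector 48 (0x30), the system call vector; note that 48 is an arbitrary choice. Hardware interrupts are handled in lab4.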

An Example

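For example, suppose the code divides by zero. Then:

The processor switches to the kernel stack defined by the SS0 and ESP0 fields of the TSS. In JOS these hold GD_KD and KSTACKTOP, respectively.

The processor pushes the exception parameters onto the kernel stack, starting at address KSTACKTOP.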

+--------------------+ KSTACKTOP             
| 0x00000 | old SS   |     " - 4
|      old ESP       |     " - 8
|     old EFLAGS     |     " - 12
| 0x00000 | old CS   |     " - 16
|      old EIP       |     " - 20 <---- ESP 
+--------------------+

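Divide by zero is vector 0 on x86, so the processor reads IDT entry 0 and sets CS:EIP to the handler it describes.
Finally, the handler for this exception runs and deals with it, for example by terminating the program.

For certain exceptions, in addition to the five words above, the processor also pushes an error code. See the 80386 manual for what each error code means.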

+--------------------+ KSTACKTOP             
| 0x00000 | old SS   |     " - 4
|      old ESP       |     " - 8
|     old EFLAGS     |     " - 12
| 0x00000 | old CS   |     " - 16
|      old EIP       |     " - 20
|     error code     |     " - 24 <---- ESP
+--------------------+

Nested Exceptions and Interrupts

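Exceptions and interrupts can occur in both user mode and kernel mode. However, the x86 processor only switches stacks, and therefore only pushes the old SS and ESP, when entering the kernel from user mode. If the processor is already in kernel mode when the interrupt occurs, it just pushes more values onto the same kernel stack, which lets the kernel handle nested interrupts.

Concretely, since no stack switch is needed, SS and ESP are not saved, so the stack the handler sees for the nested interrupt looks like this: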

+--------------------+ <---- old ESP
|     old EFLAGS     |     " - 4
| 0x00000 | old CS   |     " - 8
|      old EIP       |     " - 12
+--------------------+
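The three stack layouts above differ only in how many 32-bit words get pushed. The sketch below models just that count; demo_frame_esp is a hypothetical helper written for this example, and the starting addresses used with it are illustrative values rather than anything read from JOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the ESP the handler sees, given the stack pointer the
 * processor starts pushing from. Only the frame size is modeled. */
static uint32_t demo_frame_esp(uint32_t start_esp, bool from_user,
			       bool has_error_code)
{
	uint32_t words = 3;	/* EFLAGS, CS, EIP are always pushed */

	if (from_user)
		words += 2;	/* old SS and old ESP, only on a stack switch */
	if (has_error_code)
		words += 1;
	assert(start_esp >= words * 4);
	return start_esp - words * 4;
}
```

Starting from a stack top T, a trap from user mode leaves ESP at T - 20, or T - 24 when an error code is pushed. A nested trap in kernel mode moves the current ESP down by only 12.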

Setting Up the IDT

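Now let's set up IDT entries 0 to 31. The definitions we need are in inc/trap.h and kern/trap.h.

Note that some vectors in 0 to 31 are reserved by Intel, so the processor never generates them and it does not matter how they are handled.

The overall flow of control should look like this: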

      IDT                   trapentry.S         trap.c
   
+----------------+                        
|   &handler1    |---------> handler1:          trap (struct Trapframe *tf)
|                |             // do stuff      {
|                |             call trap          // handle the exception/interrupt
|                |             // ...           }
+----------------+
|   &handler2    |--------> handler2:
|                |            // do stuff
|                |            call trap
|                |            // ...
+----------------+
       .
       .
       .
+----------------+
|   &handlerX    |--------> handlerX:
|                |             // do stuff
|                |             call trap
|                |             // ...
+----------------+

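Each exception or interrupt should have its own handler in trapentry.S, and trap_init() should fill the IDT with the addresses of these handlers.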
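The same shape can be written as a table of function pointers in plain C. This is only a user-space analogy with made-up demo_ names: the real handlers are assembly stubs and the real table is the hardware IDT. It does show why each vector needs its own stub: the stub is what records the vector number before the common code runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct DemoTrapframe {
	uint32_t trapno;
	uint32_t err;
};

static uint32_t demo_last_trapno = UINT32_MAX;	/* set by demo_trap for inspection */

/* Common C entry point, playing the role of trap(). */
static void demo_trap(struct DemoTrapframe *tf)
{
	demo_last_trapno = tf->trapno;
}

/* Per-vector stubs, playing the role of the handlers in trapentry.S:
 * each records its own vector number, then calls the common code. */
static void demo_handler0(void)  { struct DemoTrapframe tf = { 0, 0 };  demo_trap(&tf); }
static void demo_handler14(void) { struct DemoTrapframe tf = { 14, 0 }; demo_trap(&tf); }

typedef void (*demo_handler_t)(void);
static demo_handler_t demo_idt[256];

/* Plays the role of trap_init(): install one stub per vector. */
static void demo_trap_init(void)
{
	demo_idt[0] = demo_handler0;
	demo_idt[14] = demo_handler14;
}

/* What the processor does on vector n: look up the entry and jump to it. */
static void demo_raise(int n)
{
	assert(n >= 0 && n < 256 && demo_idt[n] != NULL);
	demo_idt[n]();
}
```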

Exercise 4

I wrote this part mainly by comparing against the corresponding code in xv6.

Start with trapentry.S, which has two parts. The first is the handlers:

/*
 * Lab 3: Your code here for generating entry points for the different traps.
 */
TRAPHANDLER_NOEC(T_DIVIDE_handler, T_DIVIDE)
TRAPHANDLER_NOEC(T_DEBUG_handler, T_DEBUG)
TRAPHANDLER_NOEC(T_NMI_handler, T_NMI)
TRAPHANDLER_NOEC(T_BRKPT_handler, T_BRKPT)
TRAPHANDLER_NOEC(T_OFLOW_handler, T_OFLOW)
TRAPHANDLER_NOEC(T_BOUND_handler, T_BOUND)
TRAPHANDLER_NOEC(T_ILLOP_handler, T_ILLOP)
TRAPHANDLER_NOEC(T_DEVICE_handler, T_DEVICE)
TRAPHANDLER(T_DBLFLT_handler, T_DBLFLT)
TRAPHANDLER(T_TSS_handler, T_TSS)
TRAPHANDLER(T_SEGNP_handler, T_SEGNP)
TRAPHANDLER(T_STACK_handler, T_STACK)
TRAPHANDLER(T_GPFLT_handler, T_GPFLT)
TRAPHANDLER(T_PGFLT_handler, T_PGFLT)
TRAPHANDLER_NOEC(T_FPERR_handler, T_FPERR)
TRAPHANDLER(T_ALIGN_handler, T_ALIGN)
TRAPHANDLER_NOEC(T_MCHK_handler, T_MCHK)
TRAPHANDLER_NOEC(T_SIMDERR_handler, T_SIMDERR)
TRAPHANDLER_NOEC(T_SYSCALL_handler, T_SYSCALL)

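Whether to use TRAPHANDLER or TRAPHANDLER_NOEC depends on whether the processor pushes an error code for that vector; xv6's vectors.S is a handy reference.

Next is _alltraps: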

/*
 * Lab 3: Your code here for _alltraps
 */
  # The TRAPHANDLER stubs above all jump here.
_alltraps:
  # Build trap frame.
  pushl %ds
  pushl %es
  pushal
  
  # Set up data segments.
  movw $GD_KD, %ax
  movw %ax, %ds
  movw %ax, %es

  # Call trap(tf), where tf=%esp
  pushl %esp
  call trap
  addl $4, %esp

  popal
  popl %es
  popl %ds
  addl $0x8, %esp  # trapno and errcode
  iret

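Write this against the Trapframe definition in inc/trap.h, with xv6's trapasm.S and x86.h (which defines its trapframe) as references. Last is trap_init(). The only things trapentry.S exports are the handler names as global symbols, so this ends up as a lot of repetitive code...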

void
trap_init(void)
{
	extern struct Segdesc gdt[];

	// LAB 3: Your code here.
	void T_DIVIDE_handler();
	void T_DEBUG_handler();
	void T_NMI_handler();
	void T_BRKPT_handler();
	void T_OFLOW_handler();
	void T_BOUND_handler();
	void T_ILLOP_handler();
	void T_DEVICE_handler();
	void T_DBLFLT_handler();
	void T_TSS_handler();
	void T_SEGNP_handler();
	void T_STACK_handler();
	void T_GPFLT_handler();
	void T_PGFLT_handler();
	void T_FPERR_handler();
	void T_ALIGN_handler();
	void T_MCHK_handler();
	void T_SIMDERR_handler();
	void T_SYSCALL_handler();
	SETGATE(idt[T_DIVIDE], 1, GD_KT, T_DIVIDE_handler, 0);
	SETGATE(idt[T_DEBUG], 1, GD_KT, T_DEBUG_handler, 0);
	SETGATE(idt[T_NMI], 1, GD_KT, T_NMI_handler, 0);
	SETGATE(idt[T_BRKPT], 1, GD_KT, T_BRKPT_handler, 0);
	SETGATE(idt[T_OFLOW], 1, GD_KT, T_OFLOW_handler, 0);
	SETGATE(idt[T_BOUND], 1, GD_KT, T_BOUND_handler, 0);
	SETGATE(idt[T_ILLOP], 1, GD_KT, T_ILLOP_handler, 0);
	SETGATE(idt[T_DEVICE], 1, GD_KT, T_DEVICE_handler, 0);
	SETGATE(idt[T_DBLFLT], 1, GD_KT, T_DBLFLT_handler, 0);
	SETGATE(idt[T_TSS], 1, GD_KT, T_TSS_handler, 0);
	SETGATE(idt[T_SEGNP], 1, GD_KT, T_SEGNP_handler, 0);
	SETGATE(idt[T_STACK], 1, GD_KT, T_STACK_handler, 0);
	SETGATE(idt[T_GPFLT], 1, GD_KT, T_GPFLT_handler, 0);
	SETGATE(idt[T_PGFLT], 1, GD_KT, T_PGFLT_handler, 0);
	SETGATE(idt[T_FPERR], 1, GD_KT, T_FPERR_handler, 0);
	SETGATE(idt[T_ALIGN], 1, GD_KT, T_ALIGN_handler, 0);
	SETGATE(idt[T_MCHK], 1, GD_KT, T_MCHK_handler, 0);
	SETGATE(idt[T_SIMDERR], 1, GD_KT, T_SIMDERR_handler, 0);
	SETGATE(idt[T_SYSCALL], 0, GD_KT, T_SYSCALL_handler, 3);
	// Per-CPU setup 
	trap_init_percpu();
}

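Then run make grade, and Part A passes. Note that although this code passes lab3, it causes problems in lab4 because of the istrap argument; see lab4 for details.

Answers to the two questions:

Why have a separate handler for each exception/interrupt? The processor does not push the vector number, and only some vectors push an error code. With a single shared handler the kernel could not tell which vector fired, and the stack layout would not be consistent. A separate entry point per vector can push its own trap number (plus a dummy error code where needed), and each IDT gate still gets its own istrap type and privilege level through SETGATE.
Why does int $14 in user/softint end up at vector 13? The gate for vector 14 has privilege level (DPL) 0, so user code is not allowed to invoke it with int. With the IDT set up as above, any int from user mode other than int $0x30 raises a general protection fault, which is vector 13.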
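The privilege check behind the second answer fits in a few lines. The sketch below is a simplified model with made-up demo_ names: it covers only the DPL check for software interrupts and ignores everything else the processor validates.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_T_GPFLT   13
#define DEMO_T_PGFLT   14
#define DEMO_T_SYSCALL 48

static uint8_t demo_gate_dpl[256];	/* all gates default to DPL 0 */

/* Vector actually delivered when code at privilege level `cpl` executes
 * `int n`: if cpl is numerically greater than the gate's DPL, the
 * processor raises a general protection fault instead. */
static int demo_software_int(int n, int cpl)
{
	assert(n >= 0 && n < 256);
	if (cpl > demo_gate_dpl[n])
		return DEMO_T_GPFLT;
	return n;
}
```

With only the system call gate set to DPL 3, int $14 from user mode (CPL 3) is delivered as vector 13, while int $0x30 goes through.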

Part B: Page Faults, Breakpoints Exceptions, and System Calls

Handle the remaining exceptions.

Handling Page Faults

The page fault is vector 14. When a page fault occurs, the processor stores the faulting linear address in the CR2 register.

Exercise 5

Call page_fault_handler() from trap_dispatch():

static void
trap_dispatch(struct Trapframe *tf)
{
	// Handle processor exceptions.
	// LAB 3: Your code here.
	switch(tf->tf_trapno) {
		case T_PGFLT:
			page_fault_handler(tf);
			return;
		default:
			break;
	}
	// Unexpected trap: The user process or the kernel has a bug.
	print_trapframe(tf);
	if (tf->tf_cs == GD_KT)
		panic("unhandled trap in kernel");
	else {
		env_destroy(curenv);
		return;
	}
}

The Breakpoint Exception

Exercise 6

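The breakpoint exception should invoke the monitor function in kern/monitor.c. Note that breakpoint.c triggers the exception directly from user mode (with int $3), so the privilege level (DPL) of the breakpoint gate has to be 3: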

SETGATE(idt[T_BRKPT], 1, GD_KT, T_BRKPT_handler, 3);

trap_dispatch then becomes:

	switch(tf->tf_trapno) {
		case T_PGFLT:
			page_fault_handler(tf);
			return;
		case T_BRKPT:
			monitor(tf);
			return;
		default:
			break;
	}

System calls

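In JOS, a system call is made with int $0x30. The application passes the system call number and its arguments in registers, so the kernel does not need to dig through the user environment's stack or instruction stream. The system call number goes in %eax, and the arguments (up to five) go in %edx, %ecx, %ebx, %edi, and %esi. The kernel likewise passes the return value back in %eax. The user-side syscall function is in lib/syscall.c.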

static inline int32_t
syscall(int num, int check, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
	int32_t ret;

	// Generic system call: pass system call number in AX,
	// up to five parameters in DX, CX, BX, DI, SI.
	// Interrupt kernel with T_SYSCALL.
	//
	// The "volatile" tells the assembler not to optimize
	// this instruction away just because we don't use the
	// return value.
	//
	// The last clause tells the assembler that this can
	// potentially change the condition codes and arbitrary
	// memory locations.

	asm volatile("int %1\n"
		     : "=a" (ret)
		     : "i" (T_SYSCALL),
		       "a" (num),
		       "d" (a1),
		       "c" (a2),
		       "b" (a3),
		       "D" (a4),
		       "S" (a5)
		     : "cc", "memory");

	if(check && ret > 0)
		panic("syscall %d returned %d (> 0)", num, ret);

	return ret;
}

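This style is called GCC inline assembly; look it up if you're interested.

Compared with xv6, JOS is noticeably simpler here: arguments are not fetched from the user stack (esp), and JOS has no global state like myproc.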

Exercise 7

Add the handler for system calls. The basic setup is already in place, so only trap_dispatch() and syscall() in kern/syscall.c need to change.

First, trap_dispatch():

	switch(tf->tf_trapno) {
		case T_PGFLT:
			page_fault_handler(tf);
			return;
		case T_BRKPT:
			monitor(tf);
			return;
		case T_SYSCALL:
			tf->tf_regs.reg_eax = syscall(
				tf->tf_regs.reg_eax, tf->tf_regs.reg_edx,
				tf->tf_regs.reg_ecx, tf->tf_regs.reg_ebx,
				tf->tf_regs.reg_edi, tf->tf_regs.reg_esi
			);
			return;
		default:
			break;
	}

Don't forget to store the return value back into eax.

Next, syscall():

// Dispatches to the correct kernel function, passing the arguments.
int32_t
syscall(uint32_t syscallno, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
	// Call the function corresponding to the 'syscallno' parameter.
	// Return any appropriate return value.
	// LAB 3: Your code here.

	// panic("syscall not implemented");

	switch (syscallno) {
		case SYS_cputs:
			sys_cputs((char *)a1, (size_t)a2);
			return 0;	// syscall() returns int32_t, so a bare return is not valid here
		case SYS_cgetc:
			return sys_cgetc();
		case SYS_getenvid:
			return sys_getenvid();
		case SYS_env_destroy:
			return sys_env_destroy((envid_t)a1);
		default:
			return -E_INVAL;
	}
}
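The register convention can be exercised without a kernel by treating the saved registers as a plain struct. Everything here (struct DemoRegs, the two call numbers, the error value) is invented for the example; only the register order follows the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the saved registers in a trapframe. */
struct DemoRegs {
	uint32_t eax, edx, ecx, ebx, edi, esi;
};

enum { DEMO_SYS_add = 0, DEMO_SYS_getenvid = 1 };	/* made-up call numbers */
#define DEMO_E_INVAL 3					/* made-up error code */

static int32_t demo_syscall(uint32_t no, uint32_t a1, uint32_t a2,
			    uint32_t a3, uint32_t a4, uint32_t a5)
{
	(void)a3; (void)a4; (void)a5;	/* unused by the demo calls */
	switch (no) {
	case DEMO_SYS_add:
		return (int32_t)(a1 + a2);
	case DEMO_SYS_getenvid:
		return 0x1000;
	default:
		return -DEMO_E_INVAL;
	}
}

/* Kernel side of the convention: number in eax, arguments in
 * edx, ecx, ebx, edi, esi, and the result written back to eax. */
static void demo_dispatch(struct DemoRegs *r)
{
	assert(r != NULL);
	r->eax = (uint32_t)demo_syscall(r->eax, r->edx, r->ecx,
					r->ebx, r->edi, r->esi);
}
```

Because the result is written into the saved eax, the caller sees it as the return value once its registers are restored. An unknown call number comes back as a negative error code through the same register.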

User-mode startup

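A user program starts in lib/entry.S and then calls libmain() in lib/libmain.c, which in turn calls umain, that is, the main function of a program such as hello. We want user programs to be able to use thisenv, a pointer to the current environment's struct Env. lib/syscall.c already provides sys_getenvid(), which invokes the SYS_getenvid system call. Once we have the envid, recall from inc/env.h: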

// An environment ID 'envid_t' has three parts:
//
// +1+---------------21-----------------+--------10--------+
// |0|          Uniqueifier             |   Environment    |
// | |                                  |      Index       |
// +------------------------------------+------------------+
//                                       \--- ENVX(eid) --/
//
// The environment index ENVX(eid) equals the environment's index in the
// 'envs[]' array.  The uniqueifier distinguishes environments that were
// created at different times, but share the same environment index.
//
// All real environments are greater than 0 (so the sign bit is zero).
// envid_ts less than 0 signify errors.  The envid_t == 0 is special, and
// stands for the current environment.

#define LOG2NENV		10
#define NENV			(1 << LOG2NENV)
#define ENVX(envid)		((envid) & (NENV - 1))

Taking the low 10 bits of the envid gives the current environment's index in envs, so:

thisenv = &envs[ENVX(sys_getenvid())];
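As a side note on why the full id is kept around even though the low 10 bits are enough to index the array, here is a sketch of a lookup that also rejects stale ids. demo_lookup and struct DemoEnv are hypothetical names for this example, not functions from the lab.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t envid_t;

#define LOG2NENV    10
#define NENV        (1 << LOG2NENV)
#define ENVX(envid) ((envid) & (NENV - 1))

struct DemoEnv {
	envid_t env_id;
};

static struct DemoEnv demo_envs[NENV];

/* Hypothetical lookup: index with the low 10 bits, then compare the
 * full id so that a stale id for a reused slot is rejected. */
static struct DemoEnv *demo_lookup(envid_t envid)
{
	struct DemoEnv *e;

	assert(envid > 0);
	e = &demo_envs[ENVX(envid)];
	return e->env_id == envid ? e : NULL;
}
```

If slot 5 currently holds id 0x2005, looking up 0x2005 finds it, while the older id 0x1005 maps to the same slot but is rejected.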

Page faults and memory protection

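Memory protection is a crucial part of an operating system, and an important way to ensure that bugs in one program cannot corrupt other programs or the kernel.

Operating systems usually rely on hardware support to implement memory protection. The OS tells the hardware which virtual addresses are valid and which are not. When a program tries to access an invalid address, the processor traps into the kernel. If the fault is fixable, the kernel fixes it and lets the program continue; if not, the program cannot continue.

A common example of a fixable fault is an automatically extended stack. Typically only one page is allocated for the user stack at first; if an access beyond it causes a page fault, the kernel allocates more pages automatically.

System calls raise an interesting problem. Many system calls let the user pass pointers into the kernel, which point to buffers to be read or written. This causes two problems:

A page fault in the kernel is potentially much more serious than one in a user program. If the kernel cannot resolve a page fault in its own code, the whole system panics. But a fault caused by dereferencing one of these user-supplied buffers really belongs to the user program, not to the kernel.
The kernel usually has more memory permissions than the user program, so a system call that is handed a pointer into kernel memory could leak the kernel's private memory.

For both reasons, pointers passed into the kernel must be handled very carefully.

We use a single mechanism to address both problems. When a program passes a pointer to the kernel, the kernel checks that the address lies in the user part of the address space and that the page table allows the requested memory operation. This way the kernel never takes a page fault from dereferencing a user pointer.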

Exercise 9

First, make trap handling panic when a page fault happens in kernel mode:

	switch(tf->tf_trapno) {
		case T_PGFLT:
			if ((tf->tf_cs & 0x3) == 0)
				panic("page fault in kernel");
			page_fault_handler(tf);
			return;

Then fill in user_mem_check in kern/pmap.c:

// Check that an environment is allowed to access the range of memory
// [va, va+len) with permissions 'perm | PTE_P'.
// Normally 'perm' will contain PTE_U at least, but this is not required.
// 'va' and 'len' need not be page-aligned; you must test every page that
// contains any of that range.  You will test either 'len/PGSIZE',
// 'len/PGSIZE + 1', or 'len/PGSIZE + 2' pages.
//
// A user program can access a virtual address if (1) the address is below
// ULIM, and (2) the page table gives it permission.  These are exactly
// the tests you should implement here.
//
// If there is an error, set the 'user_mem_check_addr' variable to the first
// erroneous virtual address.
//
// Returns 0 if the user program can access this range of addresses,
// and -E_FAULT otherwise.
//
int
user_mem_check(struct Env *env, const void *va, size_t len, int perm)
{
	// LAB 3: Your code here.
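	uintptr_t v = ROUNDDOWN((uintptr_t)va, PGSIZE);
	uintptr_t end = ROUNDUP((uintptr_t)va + len, PGSIZE);
	perm |= PTE_P;	// the range must be mapped with 'perm | PTE_P'
	for(;v < end; v += PGSIZE) {
		// (1) the address must be below ULIM; (2) the PTE must grant perm.
		pte_t *pte = v < ULIM ? pgdir_walk(env->env_pgdir, (void *)v, 0) : NULL;
		if (!pte || (*pte & perm) != perm) {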
			if(v < (uintptr_t)va)
				user_mem_check_addr = (uintptr_t)va;
			else
				user_mem_check_addr = v;
			return -E_FAULT;
		}
	}
	return 0;
}

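Note that the address reported must be the first faulting address inside the range, so if v (the rounded-down start of the first page) is below va, va is reported instead.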
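The "first faulting address" rule is easy to get wrong, so here is a toy model of the check that runs in user space. The eight-page address space, DEMO_ULIM and the demo_ names are all assumptions made for this sketch. A flat array of permission words stands in for the page table walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE      4096u
#define DEMO_NPAGES 8u			/* toy address space: 8 pages */
#define DEMO_ULIM   (6u * PGSIZE)	/* assumed user limit for the toy space */
#define DEMO_PTE_P  0x1u
#define DEMO_PTE_U  0x4u

static uint32_t demo_pte[DEMO_NPAGES];	/* one permission word per page */
static uintptr_t demo_fault_addr;	/* first bad address, like user_mem_check_addr */

/* Returns 0 if every page overlapping [va, va + len) is below DEMO_ULIM
 * and grants perm | DEMO_PTE_P; otherwise records the first bad address
 * and returns -1. */
static int demo_mem_check(uintptr_t va, size_t len, uint32_t perm)
{
	uintptr_t v = va - va % PGSIZE;	/* start of the first page */
	uintptr_t end = va + len;

	assert(end >= va && end <= DEMO_NPAGES * PGSIZE);
	perm |= DEMO_PTE_P;
	for (; v < end; v += PGSIZE) {
		if (v >= DEMO_ULIM || (demo_pte[v / PGSIZE] & perm) != perm) {
			/* Inside the first page the bad address is va itself. */
			demo_fault_addr = v < va ? va : v;
			return -1;
		}
	}
	return 0;
}
```

When the very first page fails, the reported address is va rather than the page start. For any later page it is the start of that page.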

Then modify sys_cputs in syscall.c to check the pointer.

static void
sys_cputs(const char *s, size_t len)
{
	// Check that the user has permission to read memory [s, s+len).
	// Destroy the environment if not.

	// LAB 3: Your code here.
	user_mem_assert(curenv, s, len, PTE_U);
	// Print the string supplied by the user.
	cprintf("%.*s", len, s);
}

After that, to make backtrace work from the breakpoint monitor, add the following to debuginfo_eip() in kern/kdebug.c:

	// Find the relevant set of stabs
	if (addr >= ULIM) {
		stabs = __STAB_BEGIN__;
		stab_end = __STAB_END__;
		stabstr = __STABSTR_BEGIN__;
		stabstr_end = __STABSTR_END__;
	} else {
		// The user-application linker script, user/user.ld,
		// puts information about the application's stabs (equivalent
		// to __STAB_BEGIN__, __STAB_END__, __STABSTR_BEGIN__, and
		// __STABSTR_END__) in a structure located at virtual address
		// USTABDATA.
		const struct UserStabData *usd = (const struct UserStabData *) USTABDATA;

		// Make sure this memory is valid.
		// Return -1 if it is not.  Hint: Call user_mem_check.
		// LAB 3: Your code here.
		if(user_mem_check(curenv, (void *)usd, sizeof(struct UserStabData), PTE_U))
			return -1;

		stabs = usd->stabs;
		stab_end = usd->stab_end;
		stabstr = usd->stabstr;
		stabstr_end = usd->stabstr_end;

		// Make sure the STABS and string table memory is valid.
		// LAB 3: Your code here.
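		// stabs points to struct Stab, so subtract as addresses to get a length in bytes.
		if(user_mem_check(curenv, (void *)stabs, (uintptr_t)stab_end - (uintptr_t)stabs, PTE_U))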
			return -1;
		if(user_mem_check(curenv, (void *)stabstr, stabstr_end - stabstr, PTE_U))
			return -1;
	}

Now run make run-breakpoint-nox. Once the breakpoint drops into the monitor, running backtrace gives the following:

K> backtrace
Stack backtrace:
  ebp efffff00  eip f0100ad7  args 00000001 efffff28 f01d2000 f0106781 f011af48
      kern/monitor.c:151: monitor+353
  ebp efffff80  eip f010429b  args f01d2000 efffffbc f0150508 00000092 f011afd8
      kern/trap.c:191: trap+282
  ebp efffffb0  eip f0104389  args efffffbc 00000000 00000000 eebfdfc0 efffffdc
      kern/trapentry.S:87: <unknown>+0
  ebp eebfdfc0  eip 00800087  args 00000000 00000000 eebfdff0 00800058 00000000
      lib/libmain.c:25: libmain+78
  ebp eebfdff0  eip 00800031  args 00000000 00000000Incoming TRAP frame at 0xeffffe64
kernel panic at kern/trap.c:187: page fault in kernel

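I couldn't work out why the backtrace ends in a kernel page fault. It is probably related to the last frame sitting right at the top of the user stack.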
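One plausible explanation can be checked with a little arithmetic. It rests on two assumptions: that the backtrace command always reads five argument words above each saved ebp, and that USTACKTOP is 0xeebfe000 (please verify against inc/memlayout.h). The last frame has ebp = eebfdff0, so its third argument would be read from eebfe000, which is past the single mapped stack page. demo_readable_args is a hypothetical helper for this check.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_USTACKTOP 0xeebfe000u	/* assumed value of USTACKTOP */

/* How many of the `nargs` argument words above a saved ebp can be read
 * before reaching DEMO_USTACKTOP. Argument i lives at ebp + 8 + 4*i. */
static int demo_readable_args(uint32_t ebp, int nargs)
{
	int n = 0;

	assert(ebp < DEMO_USTACKTOP && nargs >= 0);
	for (int i = 0; i < nargs; i++) {
		uint32_t addr = ebp + 8 + 4 * (uint32_t)i;

		if (addr >= DEMO_USTACKTOP)
			break;
		n++;
	}
	return n;
}
```

For ebp = eebfdff0 this gives 2, which matches the two arguments printed before the trap frame message. For the previous frame (ebp = eebfdfc0) all five reads stay below the assumed USTACKTOP.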

Exercise 10

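Once Exercise 9 is done, Exercise 10 is done automatically. The only difference between the two is the pointer that gets passed in: in one it points to unallocated memory, in the other to an address in the kernel's part of the address space. The check above catches both.

Finally, run make grade: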

divzero: OK (1.0s)
softint: OK (0.9s)
badsegment: OK (1.0s)
Part A score: 30/30

faultread: OK (1.0s)
faultreadkernel: OK (2.0s)
faultwrite: OK (1.1s)
faultwritekernel: OK (1.8s)
breakpoint: OK (1.1s)
testbss: OK (1.9s)
hello: OK (2.1s)
buggyhello: OK (2.0s)
buggyhello2: OK (2.2s)
evilhello: OK (1.8s)
Part B score: 50/50

Score: 80/80

zhuzilin's Blog
