zhuzilin's Blog

6.828 Notes 9

date: 2019-03-06
tags: OS  6.828  

These are my personal notes from reading the 6.828 lecture notes. They were written as a mix of Chinese and English and are fairly informal, so please bear with me.

Lecture 10: Processes, threads, and scheduling

The previous homework was mostly reading code and answering questions, so I won't write it up separately. For this lecture I strongly recommend first reading the corresponding chapter of the xv6 book, as the assignment asks.

Process scheduling

What is a process:

A process is an abstract virtual machine: it appears to have its own CPU and memory and is not affected by other processes. The main purpose is isolation.

The main process APIs are (a small usage sketch follows the list):

  fork
  exec
  exit
  wait
  kill
  sbrk
  getpid
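
A minimal, hypothetical xv6 user program (not from the lecture notes) sketching how these APIs fit together:

#include "types.h"
#include "user.h"

int
main(void)
{
  int pid = fork();              // create a child process
  if(pid == 0){
    char *argv[] = { "echo", "hello", 0 };
    exec("echo", argv);          // replace the child's image with echo
    printf(1, "exec echo failed\n");
    exit();                      // xv6's exit() takes no status argument
  }
  wait();                        // parent blocks until the child exits
  printf(1, "child %d is done, my pid is %d\n", pid, getpid());
  exit();
}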

The challenge is that there are often more processes than CPU cores. In that case we use time-sharing, together with scheduling and context switching.

Our main goals are:

  • transparent to user processes (the time-sharing is invisible to user code)
  • pre-emptive for user processes
  • pre-emptive for the kernel (keeps the system responsive)

xv6's solution: one user thread and one kernel thread per process, plus one scheduler thread per processor.

What is a thread (see the proc.h excerpt after this list for how xv6 represents these):

  • a CPU core executing (with register and stack)
  • a saved set of registers and a stack that could execute
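
These threads map onto concrete fields in xv6's proc.h; an abridged sketch of the relevant parts (the real structs have more fields):

// Per-process state (abridged) -- one user thread + one kernel thread.
struct proc {
  pde_t* pgdir;                // Page table (user address space)
  char *kstack;                // Bottom of kernel stack for this process
  enum procstate state;        // Process state
  struct trapframe *tf;        // Trap frame: saved user registers
  struct context *context;     // swtch() here to run the kernel thread
  // ...
};

// Per-CPU state (abridged) -- one scheduler thread per processor.
struct cpu {
  struct context *scheduler;   // swtch() here to enter the scheduler
  struct proc *proc;           // The process running on this cpu, or null
  // ...
};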

An overview of xv6's process switching:

context switch

  • user -> kernel thread (via system call or timer)
  • kernel thread yields, due to pre-emption or waiting for I/O
  • kernel thread -> scheduler thread
  • scheduler thread finds a RUNNABLE kernel thread
  • scheduler thread -> kernel thread
  • kernel thread -> user

Each xv6 process has a state, proc->state, whose possible values are (the full enum from proc.h follows the list):

  RUNNING
  RUNNABLE
  SLEEPING
  ZOMBIE
  UNUSED
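
For reference, the state enum as declared in proc.h (the list above omits EMBRYO, the state of a process that is still being created):

enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };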

Note:

  • xv6 has multiple kernel threads, and they all share the same kernel address space
  • each xv6 process has exactly one user thread
  • systems like Linux support multiple threads per process

Context switching is one of the hardest things in xv6 to get right.

xv6 code for context switch

Let's walk through the xv6 code to see how it performs a context switch.

Going through two context switches (process -> scheduler -> process) is done to simplify cleaning up after the old process.

We don't have the hog.c mentioned in the lecture notes, so we can't follow along in gdb, but we can still trace the same path through the code.

A context switch starts when the timer interrupt triggers this part of trap():

  // Force process to give up CPU on clock tick.
  // If interrupts were on while locks held, would need to check nlock.
  if(myproc() && myproc()->state == RUNNING &&
     tf->trapno == T_IRQ0+IRQ_TIMER)
    yield();

This enters yield() in proc.c:

// Give up the CPU for one scheduling round.
void
yield(void)
{
  acquire(&ptable.lock);  //DOC: yieldlock
  myproc()->state = RUNNABLE;
  sched();
  release(&ptable.lock);
}

After marking the current process RUNNABLE, it enters sched() in the same file:

// Enter scheduler.  Must hold only ptable.lock
// and have changed proc->state. Saves and restores
// intena because intena is a property of this
// kernel thread, not this CPU. It should
// be proc->intena and proc->ncli, but that would
// break in the few places where a lock is held but
// there's no process.
void
sched(void)
{
  int intena;
  struct proc *p = myproc();

  if(!holding(&ptable.lock))
    panic("sched ptable.lock");
  if(mycpu()->ncli != 1)
    panic("sched locks");
  if(p->state == RUNNING)
    panic("sched running");
  if(readeflags()&FL_IF)
    panic("sched interruptible");
  intena = mycpu()->intena;
  swtch(&p->context, mycpu()->scheduler);
  mycpu()->intena = intena;
}

sched() mostly just checks that the current state is what it should be. Note that because ptable.lock has been acquired, and because of how spinlocks work (see acquire() in spinlock.c), interrupts on this CPU should already be disabled; in other words the interrupt check (readeflags() & FL_IF) follows from the first two checks. Then we move on to swtch() (step 2). In swtch.S:

# Context switch
#
#   void swtch(struct context **old, struct context *new);
# 
# Save the current registers on the stack, creating
# a struct context, and save its address in *old.
# Switch stacks to new and pop previously-saved registers.

.globl swtch
swtch:
  movl 4(%esp), %eax  # &p->context
  movl 8(%esp), %edx  # mycpu()->scheduler

  # Save old callee-saved registers
  pushl %ebp
  pushl %ebx
  pushl %esi
  pushl %edi

  # Switch stacks
  movl %esp, (%eax)
  movl %edx, %esp

  # Load new callee-saved registers
  popl %edi
  popl %esi
  popl %ebx
  popl %ebp
  ret

swtch knows nothing about threads. It simply saves the callee-saved registers of the old context, switches to this processor's scheduler thread mycpu()->scheduler (by swapping %esp), restores the registers of the new context, and returns. These saved and restored registers are exactly the context.
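
This context is just the struct defined in proc.h; %eip is not listed among the pushes because the call to swtch pushes it and the final ret pops it:

// Saved registers for kernel context switches (from proc.h).
// Only callee-saved registers need to be kept; caller-saved
// registers are already saved (if needed) on the calling stack.
struct context {
  uint edi;
  uint esi;
  uint ebx;
  uint ebp;
  uint eip;
};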

Because the stack has been switched, the ret at the end of swtch returns into this CPU's scheduler(). That function is in proc.c:

// Per-CPU process scheduler.
// Each CPU calls scheduler() after setting itself up.
// Scheduler never returns.  It loops, doing:
//  - choose a process to run
//  - swtch to start running that process
//  - eventually that process transfers control
//      via swtch back to the scheduler.
void
scheduler(void)
{
  struct proc *p;
  struct cpu *c = mycpu();
  c->proc = 0;
  
  for(;;){
    // Enable interrupts on this processor.
    sti();

    // Loop over process table looking for process to run.
    acquire(&ptable.lock);
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->state != RUNNABLE)
        continue;

      // Switch to chosen process.  It is the process's job
      // to release ptable.lock and then reacquire it
      // before jumping back to us.
      c->proc = p;
      switchuvm(p);
      p->state = RUNNING;

      swtch(&(c->scheduler), p->context);
      switchkvm();

      // Process is done running for now.
      // It should have changed its p->state before coming back.
      c->proc = 0;
    }
    release(&ptable.lock);

  }
}

Note that execution resumes at the swtch line inside scheduler(), because that is where this CPU previously switched away. Also note that this ptable.lock is the same lock acquired in yield(). Next comes switchkvm():

// Switch h/w page table register to the kernel-only page table,
// for when no process is running.
void
switchkvm(void)
{
  lcr3(V2P(kpgdir));   // switch to the kernel page table
}

switchkvm switches the hardware page table register back to the kernel-only page table, so the CPU stops using the old process's page table (nothing is freed). Then scheduler() continues, looking for the next RUNNABLE process; if there is none, it releases ptable.lock so that other processors get a chance to use the process table.

If there is a RUNNABLE process to switch to, it runs switchuvm():

// Switch TSS and h/w page table to correspond to process p.
void
switchuvm(struct proc *p)
{
  if(p == 0)
    panic("switchuvm: no process");
  if(p->kstack == 0)
    panic("switchuvm: no kstack");
  if(p->pgdir == 0)
    panic("switchuvm: no pgdir");

  pushcli();
  mycpu()->gdt[SEG_TSS] = SEG16(STS_T32A, &mycpu()->ts,
                                sizeof(mycpu()->ts)-1, 0);
  mycpu()->gdt[SEG_TSS].s = 0;
  mycpu()->ts.ss0 = SEG_KDATA << 3;
  mycpu()->ts.esp0 = (uint)p->kstack + KSTACKSIZE;
  // setting IOPL=0 in eflags *and* iomb beyond the tss segment limit
  // forbids I/O instructions (e.g., inb and outb) from user space
  mycpu()->ts.iomb = (ushort) 0xFFFF;
  ltr(SEG_TSS << 3);
  lcr3(V2P(p->pgdir));  // switch to process's address space
  popcli();
}

This switches the TSS and the page table to the chosen process's, and then scheduler() calls swtch again to switch into that process. Note that when this swtch returns, it returns to the bottom of sched(), because that is where the process previously switched away; from there it returns through yield(), trap(), and so on, and the process resumes running.

Note that the acquire in one process's yield is released by the release in another process's yield.

Here are a few questions about this whole procedure:

  • What is the scheduling policy?

    Since scheduler() simply loops over the process table, the policy is round robin. A process that has just yielded will not be picked again right away unless there are very few runnable processes (e.g. it is the only one).

  • Why does scheduler() release the lock at the end of each pass over the table and re-acquire it before the next?

    So that other processors can use ptable. Otherwise, with two processors and only one process, you could get a deadlock:

    Suppose we have cpu A and cpu B, and process p runs on cpu A. If cpu B acquired the lock and never released it (because it never finds anything runnable), p would wait for the lock forever when it tries to yield.

  • Why re-enable interrupts inside scheduler()?

    Because there may be no RUNNABLE process at all; without re-enabling interrupts the CPU would spin in scheduler() forever. Enabling interrupts lets devices signal I/O completion, which can make a process waiting on I/O runnable again.

  • Why is ptable.lock acquired in yield() but released in scheduler()?

    Note that this is very unusual: the acquire and the release are not done by the same thread.

    Why must the lock be held across swtch?

    Otherwise two processors might both switch to the same process.

  • ptable.lock protects the following invariants:

    • while a process is RUNNING, its register values live in the CPU's registers (not in its saved context)
    • while a process is RUNNABLE, its registers are held in its saved context, and no CPU is using its stack

    Holding the lock in both yield() and scheduler() keeps interrupts disabled, so no timer interrupt can fire in the middle of swtch's save and restore, and no other CPU can switch onto this process's stack while it is in an intermediate state (interrupts are off because acquire() calls pushcli(); see the excerpt after this list).

  • Is the kernel thread itself scheduled pre-emptively (i.e. via the procedure described above)?

    Yes: from the condition in trap() you can see that the timer interrupt does not distinguish between a kernel thread and a user-level thread.
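
The interrupt guarantee mentioned above comes from acquire() in spinlock.c, which disables interrupts with pushcli() before spinning on the lock:

// Acquire the lock (from spinlock.c).
// Loops (spins) until the lock is acquired.
void
acquire(struct spinlock *lk)
{
  pushcli(); // disable interrupts to avoid deadlock.
  if(holding(lk))
    panic("acquire");

  // The xchg is atomic.
  while(xchg(&lk->locked, 1) != 0)
    ;

  // Tell the C compiler and the processor to not move loads or stores
  // past this point, to ensure that the critical section's memory
  // references happen after the lock is acquired.
  __sync_synchronize();

  // Record info about lock acquisition for debugging.
  lk->cpu = mycpu();
  getcallerpcs(&lk, lk->pcs);
}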

Thread clean up

Here we look at three process APIs: kill, exit, and wait.

First, kill:

// Kill the process with the given pid.
// Process won't exit until it returns
// to user space (see trap in trap.c).
int
kill(int pid)
{
  struct proc *p;

  acquire(&ptable.lock);
  for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
    if(p->pid == pid){
      p->killed = 1;
      // Wake process from sleep if necessary.
      if(p->state == SLEEPING)
        p->state = RUNNABLE;
      release(&ptable.lock);
      return 0;
    }
  }
  release(&ptable.lock);
  return -1;
}

Having kill() itself free the target's memory and so on would be too messy, so it only sets p->killed to 1. Then in trap():

  if(tf->trapno == T_SYSCALL){
    if(myproc()->killed)
      exit();
    myproc()->tf = tf;
    syscall();
    if(myproc()->killed)
      exit();
    return;
  }

This turns the kill into the process calling exit() itself and exiting on its own.
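
There is also a check at the end of trap() (still in trap.c) that catches a killed process returning to user space from any interrupt, not just a system call:

  // Force process exit if it has been killed and is in user space.
  // (If it is still executing in the kernel, let it keep running
  // until it gets to the regular system call return.)
  if(myproc() && myproc()->killed && (tf->cs&3) == DPL_USER)
    exit();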

Next is the exit() function:

// Exit the current process.  Does not return.
// An exited process remains in the zombie state
// until its parent calls wait() to find out it exited.
void
exit(void)
{
  struct proc *curproc = myproc();
  struct proc *p;
  int fd;

  if(curproc == initproc)
    panic("init exiting");

  // Close all open files.
  for(fd = 0; fd < NOFILE; fd++){
    if(curproc->ofile[fd]){
      fileclose(curproc->ofile[fd]);
      curproc->ofile[fd] = 0;
    }
  }

  begin_op();
  iput(curproc->cwd);
  end_op();
  curproc->cwd = 0;

  acquire(&ptable.lock);

  // Parent might be sleeping in wait().
  wakeup1(curproc->parent);

  // Pass abandoned children to init.
  for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
    if(p->parent == curproc){
      p->parent = initproc;
      if(p->state == ZOMBIE)
        wakeup1(initproc);
    }
  }

  // Jump into the scheduler, never to return.
  curproc->state = ZOMBIE;
  sched();
  panic("zombie exit");
}

A process cannot free its own kernel stack while it is still running on it. All it can do is mark itself ZOMBIE and switch away, leaving the parent process to do the cleanup.

wait is what performs this final cleanup:

// Wait for a child process to exit and return its pid.
// Return -1 if this process has no children.
int
wait(void)
{
  struct proc *p;
  int havekids, pid;
  struct proc *curproc = myproc();
  
  acquire(&ptable.lock);
  for(;;){
    // Scan through table looking for exited children.
    havekids = 0;
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->parent != curproc)
        continue;
      havekids = 1;
      if(p->state == ZOMBIE){
        // Found one.
        pid = p->pid;
        kfree(p->kstack);
        p->kstack = 0;
        freevm(p->pgdir);
        p->pid = 0;
        p->parent = 0;
        p->name[0] = 0;
        p->killed = 0;
        p->state = UNUSED;
        release(&ptable.lock);
        return pid;
      }
    }

    // No point waiting if we don't have any children.
    if(!havekids || curproc->killed){
      release(&ptable.lock);
      return -1;
    }

    // Wait for children to exit.  (See wakeup1 call in proc_exit.)
    sleep(curproc, &ptable.lock);  //DOC: wait-sleep
  }
}

wait is a loop: if some child has become a ZOMBIE, it frees the child's memory and returns that child's pid. If no child has exited yet, it goes to sleep; sleep is covered in the next lecture.
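
sleep and wakeup are the topic of the next lecture, but for completeness, the wakeup1 that exit() called above simply scans the process table for processes SLEEPING on the given channel (here the parent, which sleeps on its own struct proc inside wait()):

// Wake up all processes sleeping on chan.
// The ptable lock must be held.
static void
wakeup1(void *chan)
{
  struct proc *p;

  for(p = ptable.proc; p < &ptable.proc[NPROC]; p++)
    if(p->state == SLEEPING && p->chan == chan)
      p->state = RUNNABLE;
}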