6.828 笔记8

date: 2019-03-02
tags: OS 6.828

这里会记录阅读6.828课程lecture note的我的个人笔记。可能会中英混杂，不是很适合外人阅读，也请见谅。

Lecture 9: Locking

lock的简单抽象：

  lock l
  acquire(l)
    x = x + 1 -- "critical section"
  release(l)

lock就是一个object，如果多个线程调用acquire，只有一个thread能拿到，其他的就需要等待release。

程序经常有很多数据，也就对应很多锁，不同的数据对应不同的锁。不过注意，锁并不是和数据绑定的，而是在和数据相关的critical region使用的。

那我们什么时候需要锁呢？

当2个或更多的线程触及到内存时。
当至少一个线程写入时。

执行上述的规则时，太过保守或是太过自由都不好。有的时候故意有一些race没关系。

那我们能不能自动加锁呢？比如说每个数据都自动的和一个锁相连。

这样太死板，会出现问题，比如说rename("d1/x", "d2/y")，实行的步骤就会是

lock d1, erase x, unlock d1
lock d2 add y unlock d2

这样会导致有一段时间文件消失了，那么信息也就没了...我们需要的是：

lock d1; lock d2;
erase x, add y
unlock d2; unlock d1

也就是说程序员需要能够控制中间过程。

我们可以把锁想想成如下几点：

avoid lost update
create atomic multi-step operations -- hide intermediate states
maintain invariants on a data structure

deadlock

对于上面的那个rename，对于有2个锁的方案，如果同时运行rename(d1/x, d2/y)和rename(d2/a, d1/b)就会发生死锁。

解决方案就是让程序员给所有的锁制定一个顺序，并让代码遵循这个顺序。显然这是很复杂的。。。

lock vs modularity tradeoff

同时这个解决方案会出现一个tradeoff，因为为了避免死锁，我们需要知道函数里面是怎么上锁的。或者说locks are often not the provate business of individual modules。所以一些时候，我们就粗暴的在函数两端上锁来使其变为单线程运行的。

Locks and parallleism

锁实际上是在避免并行操作。而符合分割数据和锁，或者说设计"fine grained locks"是很难的。所以一般从一个单独的大锁开始，如果有需要再分为多个锁。

lock in xv6

比如ide.c，里面只有idelock这一个锁。

implement locks

下面的这种实现方法为什么不对呢？

  why not:
    struct lock { int locked; }
    acquire(l) {
      while(1){
        if(l->locked == 0){ // A
          l->locked = 1;    // B
          return;
        }
      }
    }

因为有两个程序先后运行到A，而从而都获取了锁。所以我们应该把A,B变得atomic

在x86中，有一个指令xchg就是用来做这个的，从而使得锁的实现变为：

  acquire(l){
    while(1){
      if(xchg(&l->locked, 1) == 0){
        break
      }
    }
  }

xv6的实际代码是（spinlock.c）：

void
acquire(struct spinlock *lk)
{
  pushcli(); // disable interrupts to avoid deadlock.
  if(holding(lk))
    panic("acquire");

  // The xchg is atomic.
  while(xchg(&lk->locked, 1) != 0)
    ;

  // Tell the C compiler and the processor to not move loads or stores
  // past this point, to ensure that the critical section's memory
  // references happen after the lock is acquired.
  __sync_synchronize();

  // Record info about lock acquisition for debugging.
  lk->cpu = mycpu();
  getcallerpcs(&lk, lk->pcs);
}

注意出了使用了xchg，还使用了pushcli，也就是关闭了中断机制，因为中断可能会导致死锁。

release的代码就简单的多：

void
release(struct spinlock *lk)
{
  if(!holding(lk))
    panic("release");

  lk->pcs[0] = 0;
  lk->cpu = 0;

  // Tell the C compiler and the processor to not move loads or stores
  // past this point, to ensure that all the stores in the critical
  // section are visible to other cores before the lock is released.
  // Both the C compiler and the hardware may re-order loads and
  // stores; __sync_synchronize() tells them both not to.
  __sync_synchronize();

  // Release the lock, equivalent to lk->locked = 0.
  // This code can't use a C assignment, since it might
  // not be atomic. A real OS would use C atomics here.
  asm volatile("movl $0, %0" : "+m" (lk->locked) : );

  popcli();
}

这里之所以要调用__sync_synchronize()，就是因为编译器可能会在优化的时候更改指令顺序。如，虽然我们写的是：

  Core A:          Core B:
    locked = 1
    x = x + 1      while(locked == 1)
    locked = 0       ...
                   locked = 1
                   x = x + 1
                   locked = 0

但是有可能会被优化为：

      locked = 1
      locked = 0
      x = x + 1

这样就让锁失效了。__sync_synchronize()就是有这样的作用，让编译器不会move a memory reference past it. 事实上xchg也会有同样的效果（intel保证的）。

因为上述的存在，在用锁的情况下，不需要考虑memory ordering rules，只有在写exotic "lock-free"的代码的时候才需要。

我们会发现spinlock会阻塞cpu，那么其只适合短时间的锁。我们之后会讨论和spinlock不太一样的一种锁，在等待的时候会让出CPU。

关于用锁的几点建议：

don't share if you don't have to
start with a few coarse-grained locks
instrument your code -- which locks are preventing parallelism?
use fine-grained locks only as needed for parallel performance
use an automated race detector