ULK2-Chapter5 Notes

jjww · posted on 2003-5-18 13:14:29

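After two days of warm-up I have started reading the Linux kernel source, with ULK2 as my main guide. Since I had consulted the Chinese edition of ULK1 when I was reading the FreeBSD code, I am starting from Chapter 5 of ULK2. I haven't finished it yet; here is what I have read so far, to share with everyone. :-)
Note1: Critical Regions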
A critical region is any section of code that must be completely executed by any kernel control path that enters it before another kernel control path can enter it.
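In other words: while one kernel control path is executing code inside a critical region, no other kernel control path may enter that region until the first one has finished executing it.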
Note2: Atomic Operations
The atomic_t type: typedef struct { volatile int counter; } atomic_t;
Key points in the implementation of atomic operations:
1. The volatile keyword
The ‘volatile’ keyword indicates that the instruction has important side-effects. GCC will not delete a volatile `asm' if it is reachable. (The instruction can still be deleted if GCC can prove that control-flow will never reach the location of the instruction.) In addition, GCC will not reschedule instructions across a volatile ‘asm’ instruction. [GCC]
2. lock
The ‘lock’ prefix forces an atomic operation to insure exclusive use of shared memory in a multiprocessor environment. [Intel]
3. __builtin_constant_p
You can use the built-in function ‘__builtin_constant_p’ to determine if a value is known to be constant at compile-time and hence that GCC can perform constant-folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is _not_ a constant, but merely that GCC cannot prove it is a constant with the specified value of the ‘-O’ option. [GCC]
[code:1]
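/* E.g. 1: the atomic_add(int i, atomic_t *v) function */
static __inline__ void atomic_add(int i, atomic_t *v)
{
	__asm__ __volatile__(
		LOCK "addl %1,%0"		/* LOCK supplies the lock prefix */
		:"=m" (v->counter)		/* output: the counter in memory */
		:"ir" (i), "m" (v->counter));	/* inputs: the addend and the old counter */
}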
[/code:1]
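volatile and LOCK together protect atomic_add in an MP environment: __volatile__ keeps GCC from deleting the instruction or rescheduling other instructions across it, and the LOCK prefix makes the read-modify-write of v->counter atomic with respect to the other CPUs.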
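For anyone who wants to try the same idea outside the kernel, here is a minimal user-space sketch that uses C11 <stdatomic.h> instead of inline asm. The names sketch_atomic_t, sketch_atomic_add and sketch_atomic_read are made up for illustration and are not kernel API. On x86 a compiler will typically turn atomic_fetch_add into a lock-prefixed instruction, but that is the compiler's choice, not something this code guarantees.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct { atomic_int counter; } sketch_atomic_t;

/* Atomically add i to v->counter, in the spirit of atomic_add(i, v). */
static inline void sketch_atomic_add(int i, sketch_atomic_t *v)
{
    atomic_fetch_add(&v->counter, i);
}

/* Read the current value of the counter. */
static inline int sketch_atomic_read(sketch_atomic_t *v)
{
    return atomic_load(&v->counter);
}
```

The point of the sketch is only that the add is one indivisible read-modify-write, so concurrent callers cannot lose an update.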
[code:1]
/* E.g. 2: test_bit(nr, addr) */
#define test_bit(nr, addr) \
	(__builtin_constant_p(nr) ? \
	 constant_test_bit((nr),(addr)) : \
	 variable_test_bit((nr),(addr)))
[/code:1]
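If nr is known to be a constant at compile time, constant_test_bit is used; otherwise variable_test_bit is used.
Question: I am not clear about synchronization in this case. How does the system protect it?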
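The compile-time dispatch can be sketched in plain C as below. This is only an illustration: the sketch_ names are invented, both helpers are written in portable C (the kernel's variable path uses inline asm instead), and __builtin_constant_p is a GCC extension that clang also accepts.

```c
#include <limits.h>

/* Number of bits in one bitmap word. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Path meant for a bit number the compiler can see is constant. */
static inline int sketch_constant_test_bit(unsigned long nr,
                                           const unsigned long *addr)
{
    return (int)((addr[nr / SKETCH_BITS_PER_LONG]
                  >> (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* Path meant for a bit number only known at run time. */
static inline int sketch_variable_test_bit(unsigned long nr,
                                           const unsigned long *addr)
{
    return (int)((addr[nr / SKETCH_BITS_PER_LONG]
                  >> (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* Pick a helper at compile time, the same shape as test_bit(). */
#define sketch_test_bit(nr, addr) \
    (__builtin_constant_p(nr) ? \
     sketch_constant_test_bit((nr), (addr)) : \
     sketch_variable_test_bit((nr), (addr)))
```

Both helpers only read the bitmap word; neither one modifies it.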
Note3: Memory Barrier
A memory barrier primitive ensures that the operations placed before the primitive are finished before starting the operations placed after the primitive. Thus, a memory barrier is like a firewall that cannot be passed by any assembly language instruction.
Let's pick out a few of the key control functions/macros and analyze them:
[code:1]
#define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
#define rmb() mb()
#ifdef CONFIG_X86_OOSTORE
#define wmb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
#else
#define wmb() __asm__ __volatile__ ("": : :"memory")
#endif
[/code:1]
These macros are equally effective on MP and UP.
"memory": If your assembler instruction modifies memory in an unpredictable fashion, add `memory' to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction. You will also want to add the `volatile' keyword if the memory affected is not listed in the inputs or outputs of the `asm', as the `memory' clobber does not count as a side-effect of the `asm'. [GCC]
wmb is simpler than mb because, in general, Intel processors never reorder writes to memory.
Question: which instructions does a memory barrier actually protect? In other words, how do we use one?
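One common pattern, as far as I understand it, is "write the data, barrier, then set a flag" on one side and "see the flag, barrier, then read the data" on the other. Below is a minimal user-space sketch of that pattern using C11 fences in place of wmb()/rmb(). The names payload, ready, publish and try_consume are hypothetical, and the C11 fences are only an analogy to the kernel macros, not the same thing.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* the data being handed over */
static atomic_bool ready;  /* flag that says the data is valid */

/* Writer side: store the data, then the flag, in that order. */
static void publish(int value)
{
    payload = value;
    /* Plays the role of wmb(): the data store is ordered before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: check the flag, then read the data, in that order. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Plays the role of rmb(): the flag load is ordered before the data load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

So the barrier does not protect a region of memory or a number of instructions; it only orders the accesses before it against the accesses after it at that one point in the code.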
Note4: Spin Locks
Spin lock: only meaningful on MP systems; see <asm-i386/spinlock.h>.
Definition of a spin lock:
[code:1]
typedef struct {
	volatile unsigned int lock;
#if SPINLOCK_DEBUG
	unsigned magic;
#endif
} spinlock_t;

/* E.g.: spin_lock(spinlock_t *lock) */
static inline void spin_lock(spinlock_t *lock)
{
#if SPINLOCK_DEBUG
	__label__ here;
here:
	/* Debug builds: catch locks that were never initialized */
	if (lock->magic != SPINLOCK_MAGIC) {
		printk("eip: %p\n", &&here);
		BUG();
	}
#endif
	/* spin_lock_string holds the asm that takes the lock, spinning while it is held */
	__asm__ __volatile__(
		spin_lock_string
		:"=m" (lock->lock) : : "memory");
}
[/code:1]
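To get a feel for the busy-wait idea without the i386 asm, here is a minimal user-space sketch built on C11 atomic_flag. It uses a plain test-and-set loop, which is a different technique from whatever spin_lock_string expands to, and every sketch_ name is invented for illustration.

```c
#include <stdatomic.h>

/* Hypothetical user-space spin lock: flag set means "held". */
typedef struct { atomic_flag locked; } sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns 1 if the lock was taken, 0 if it was already held. */
static inline int sketch_spin_trylock(sketch_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Busy-wait until the lock is taken. */
static inline void sketch_spin_lock(sketch_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ; /* spin */
}

/* Release the lock so another path can take it. */
static inline void sketch_spin_unlock(sketch_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A waiter burns CPU the whole time it spins, so the code between lock and unlock should be short.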
Note5: Read/Write(R/W) Spin Locks
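Obviously, a Read Lock can be shared by several control paths, while a Write Lock can be held by only one control path at a time. The lock is represented by the data structure rwlock_t, whose key field is lock.
Threshold: 0x01000000, the initial value of lock for this type of lock. Acquiring a Read Lock subtracts 1 from lock; releasing a Read Lock adds 1. Acquiring a Write Lock subtracts the threshold from lock; releasing a Write Lock adds the threshold back.
When a request cannot be granted, the caller spins until it obtains the lock. The helper function write_trylock requests a Write Lock and returns immediately if it cannot get one. Linux currently (2.4.20) does not implement read_trylock.
This is similar to FreeBSD's lockmgr locks. Linux uses the spin mechanism, which keeps the code concise; FreeBSD's implementation is somewhat more complex, but its control is more flexible.
The read/write semaphores discussed later seem quite similar to FreeBSD's s/x locks.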
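The counter arithmetic above can be sketched in user space with C11 atomics. This is only my illustration of the bookkeeping, under the assumption that "cannot be granted" means the counter would go negative for a reader or would not land exactly on 0 for a writer. All sketch_ names are invented, and sketch_read_trylock in particular is hypothetical, since 2.4.20 has no read_trylock.

```c
#include <stdatomic.h>

#define SKETCH_RW_LOCK_BIAS 0x01000000  /* the threshold / initial value */

typedef struct { atomic_int lock; } sketch_rwlock_t;

/* Reader: subtract 1; a negative result means a writer holds the lock. */
static inline int sketch_read_trylock(sketch_rwlock_t *rw)
{
    if (atomic_fetch_sub(&rw->lock, 1) - 1 >= 0)
        return 1;
    atomic_fetch_add(&rw->lock, 1);  /* undo the failed attempt */
    return 0;
}

static inline void sketch_read_unlock(sketch_rwlock_t *rw)
{
    atomic_fetch_add(&rw->lock, 1);
}

/* Writer: subtract the threshold; success only if the result is exactly 0. */
static inline int sketch_write_trylock(sketch_rwlock_t *rw)
{
    if (atomic_fetch_sub(&rw->lock, SKETCH_RW_LOCK_BIAS)
            == SKETCH_RW_LOCK_BIAS)
        return 1;
    atomic_fetch_add(&rw->lock, SKETCH_RW_LOCK_BIAS);  /* undo */
    return 0;
}

static inline void sketch_write_unlock(sketch_rwlock_t *rw)
{
    atomic_fetch_add(&rw->lock, SKETCH_RW_LOCK_BIAS);
}
```

With two readers inside, the counter sits at the threshold minus 2, so a writer's subtraction cannot reach 0 and the writer has to back off.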
[GCC] gcc info
[Intel] Intel OS volume

Dragonfly · posted on 2003-5-19 10:30:52

i always feel headache about these arch specific stuff. anybody can discuss this? hehe.

jjww · posted on 2003-5-19 19:31:53

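I only half understand this myself, so it needs everyone's discussion. I have an exam this Friday, so I can't read much source over the next few days. Frustrating!
Still, I took some time today to look at one use of Linux's rwlock_t: resource_lock, in the resource request functions on the driver side. Linux's view: allocating a resource takes the write lock, and reading the list takes the read lock. Given how rwlock_t is implemented, whichever case blocks, the caller spins. Suppose there are two CPUs, P1 (on CPU1) is requesting a resource, and P2 (on CPU2) wants to make the same kind of request. If P1 gets the lock first, P2 spins on CPU2. When FreeBSD hits this situation during resource allocation, the kernel scheduling entity (KSE) on CPU2 gives up CPU2 and lets another kernel control path use it; the KSE goes to sleep, waits for P1 to finish and send a wakeup, and is then put back on the runqueue. However, FreeBSD does not distinguish read from write here; any conflict is handled that way.
Seen this way, it is really hard to say which one is more efficient.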

jjww · posted on 2003-5-19 20:11:06

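One more thing: a memory barrier forces the CPU to read data from memory, but does anyone know how long the barrier stays in effect? That is, how many of the following instructions does it cover, or which region of memory addresses does it protect? Is that determined by esp at that moment? When is the barrier lifted? I have not seen any explanation of this. Confused!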

Dragonfly · posted on 2003-5-19 22:38:40

from bitops.h u can see that none of the __xx_xx() have LOCK while all the xx_xx() do. and these __xx_xx() are called in contexts that already hold the spinlock for the data, so i guess LOCK is unnecessary there. the only problem is test_bit(). i wonder if it is because no change happens there and it is a single asm statement.
i think the cost of busy waiting vs. a context switch varies. that is why we emphasize that the code inside a spinlock should be as short as possible. a context switch may take xk lines of code.
i remember that a memory barrier is to force cpu finish all code here before cross the barrier, since out-of-order execution may change the author's intention, especially in hardware access code.

jjww · posted on 2003-5-20 09:14:41

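About what you said, "force cpu finish all code here before cross the barrier", my understanding is:
Does it mean that for the rest of this time slice everything must be read from memory and nothing from the cache? If so, and the time slice happens to run out right after mb is executed, wouldn't the barrier be pointless after schedule?
Another possibility is that the memory keyword invalidates everything in the cache, so the code that follows must first read from memory, something like a reset of the cache?
I don't know how to understand it. :-(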
i c,
e.g.: __test_and_clear_bit is called under the protection of a spin lock.
3x, I checked some samples, and it is.
As for test_bit(nr,addr), I checked some instances. I couldn't find one where 'nr' is a variable; it is always a constant.

3x a lot!

Dragonfly · posted on 2003-5-20 11:11:39

i am not very clear about memory barriers either. i will check some docs and discuss with u later.

there is a lot of code where nr is a variable, for example in fs/jfs/jfs_txnmgr.c. i am not sure if u know lxr. it is useful. u may install one. http://lxr.linux.no/blurb.html

jjww · posted on 2003-5-20 11:23:30

Got it. I'll install lxr a bit later; right now I am reading the code with sourceinsight on Windows.

First I need to get the exam out of the way.

Dragonfly · posted on 2003-5-20 11:35:49

o, ic. good luck with ur test.
