 Posted on 2003-2-8 17:38:40
 Take a look at this: [code:1]System.map
 There seems to be a dearth of information about the System.map file. It's really nothing mysterious, and in the scheme of things, it's really not that important. But a lack of documentation makes it shady. It's like an earlobe; we all have one, but nobody really knows why. This is a little web page I cooked up that explains the why.
 
 Note, I'm not out to be 100% correct. For instance, it's possible for a system to not have /proc filesystem support, but most systems do. I'm going to assume you "go with the flow" and have a fairly typical system.
 
 Some of the stuff on oopses comes from Alessandro Rubini's "Linux Device Drivers" which is where I learned most of what I know about kernel programming.


 What Are Symbols?
 
 In the context of programming, a symbol is the building block of a program: it is a variable name or a function name. It should come as no surprise that the kernel has symbols, just like the programs you write. The difference is, of course, that the kernel is a very complicated piece of code and has many, many global symbols.
 
 
 What Is The Kernel Symbol Table?
 
 The kernel doesn't use symbol names. It's much happier knowing a variable or function by its address. Rather than using size_t BytesRead, the kernel prefers to refer to this variable as (for example) c0343f20.
 
 Humans, on the other hand, do not appreciate names like c0343f20. We prefer to use something like size_t BytesRead. Normally, this doesn't present much of a problem. The kernel is mainly written in C, so the compiler/linker allows us to use symbol names when we code and allows the kernel to use addresses when it runs. Everyone is happy.
 
 There are situations, however, where we need to know the address of a symbol (or the symbol for an address). This is done with a symbol table, and is very similar to how gdb can give you the function name from an address (or an address from a function name). A symbol table is a listing of all symbols along with their addresses. Here is an example of a symbol table:
 
 c03441a0 B dmi_broken
 c03441a4 B is_sony_vaio_laptop
 c03441c0 b dmi_ident
 c0344200 b pci_bios_present
 c0344204 b pirq_table
 c0344208 b pirq_router
 c034420c b pirq_router_dev
 c0344220 b ascii_buffer
 c0344224 b ascii_buf_bytes
 
 
 You can see that the variable named dmi_broken is at the kernel address c03441a0.
 
 
 What Is The System.map File?
 
 There are 2 files that are used as a symbol table:
 
 1. /proc/ksyms
 2. System.map
 
 There. You now know what the System.map file is.
 
 Every time you compile a new kernel, the addresses of various symbol names are bound to change.
 
 /proc/ksyms is a "proc file" and is created on the fly when a kernel boots up. Actually, it's not really a file; it's simply a representation of kernel data which is given the illusion of being a disk file. If you don't believe me, try finding the filesize of /proc/ksyms. Because it comes straight from the running kernel, it will always be correct for the kernel that is currently running.
 
 However, System.map is an actual file on your filesystem. When you compile a new kernel, your old System.map has wrong symbol information. A new System.map is generated with each kernel compile and you need to replace the old copy with your new copy.
 
 
 What Is An Oops?
 
 What is the most common bug in your homebrewed programs? The segfault. Good ol' signal 11.
 
 What is the most common bug in the Linux kernel? The segfault. Except here, the notion of a segfault is much more complicated and can be, as you can imagine, much more serious. When the kernel dereferences an invalid pointer, it's not called a segfault -- it's called an "oops". An oops indicates a kernel bug and should always be reported and fixed.
 
 Note that an oops is not the same thing as a segfault. Your program typically cannot recover from a segfault. The kernel doesn't necessarily have to be in an unstable state when an oops occurs. The Linux kernel is very robust; the oops may just kill the current process and leave the rest of the kernel in a good, solid state.
 
 An oops is not a kernel panic. In a panic, the kernel cannot continue; the system grinds to a halt and must be restarted. An oops may cause a panic if a vital part of the system is destroyed. An oops in a device driver, for example, will almost never cause a panic.
 
 When an oops occurs, the system will print out information that is relevant to debugging the problem, like the contents of all the CPU registers, and the location of page descriptor tables. In particular, the contents of the EIP (instruction pointer) are printed. Like this:
 
 EIP: 0010:[<00000000>]
 Call Trace: [<c010b860>]
 
 
 
 What Does An Oops Have To Do With System.map?
 
 You can agree that the information given in EIP and Call Trace is not very informative. But more importantly, it's really not informative to a kernel developer either. Since a symbol doesn't have a fixed address, c010b860 can point anywhere.
 
 To help us use this cryptic oops output, Linux uses a daemon called klogd, the kernel logging daemon. klogd intercepts kernel oopses and logs them with syslogd, replacing some of the useless information like c010b860 with information that humans can use. In other words, klogd is a kernel message logger which can perform name-address resolution. Once klogd transforms the kernel message, it uses whatever logger is in place to log system wide messages, usually syslogd.
 
 To perform name-address resolution, klogd uses System.map. Now you know what an oops has to do with System.map.
 
 Fine print: There are actually two types of address resolution performed by klogd.

 * Static translation, which uses the System.map file.
 * Dynamic translation, which is used with loadable modules, doesn't use System.map and is therefore not relevant to this discussion, but I'll describe it briefly anyhow.
 
 Klogd Dynamic Translation
 
 Suppose you load a kernel module which generates an oops. An oops message is generated, and klogd intercepts it. It is found that the oops occurred at d00cf810. Since this address belongs to a dynamically loaded module, it has no entry in the System.map file. klogd will search for it, find nothing, and conclude that a loadable module must have generated the oops. klogd then queries the kernel for symbols that were exported by loadable modules. Even if the module author didn't export his symbols, at the very least, klogd will know what module generated the oops, which is better than knowing nothing about the oops at all.
 
 There's other software that uses System.map, and I'll get into that shortly.
 
 
 Where Should System.map Be Located?
 
 System.map should be located wherever the software that uses it looks for it. That being said, let me talk about where klogd looks for it. Upon bootup, if klogd isn't given the location of System.map as an argument, it will look for System.map in 3 places, in the following order:
 
 1. /boot/System.map
 2. /System.map
 3. /usr/src/linux/System.map
 
 System.map also has versioning information, and klogd intelligently searches for the correct map file. For instance, suppose you're running kernel 2.4.18 and the associated map file is /boot/System.map. You now compile a new kernel 2.5.1 in the tree /usr/src/linux. During the compiling process, the file /usr/src/linux/System.map is created. When you boot your new kernel, klogd will first look at /boot/System.map, determine it's not the correct map file for the booting kernel, then look at /usr/src/linux/System.map, determine that it is the correct map file for the booting kernel and start reading the symbols.
 
 A few nota bene's:
 
 * Somewhere during the 2.5.x series, the Linux kernel started to untar into linux-version, rather than just linux (show of hands -- how many people have been waiting for this to happen?). I don't know if klogd has been modified to search in /usr/src/linux-version/System.map yet. TODO: Look at the klogd source. If someone beats me to it, please email me and let me know if klogd has been modified to look in the new directory name for the linux source code.
 * The man page doesn't tell the whole story. Look at this:
 
 # strace -f /sbin/klogd 2>&1 | grep 'System.map'
 31208 open("/boot/System.map-2.4.18", O_RDONLY|O_LARGEFILE) = 2
 
 
 Apparently, not only does klogd look for the correct version of the map in the 3 klogd search directories, but klogd also knows to look for the name "System.map" followed by "-kernelversion", like System.map-2.4.18. This is an undocumented feature of klogd.
 
 A few drivers will need System.map to resolve symbols (since they're linked against the kernel headers instead of, say, glibc). They will not work correctly without the System.map created for the particular kernel you're currently running. This is NOT the same thing as a module not loading because of a kernel version mismatch. That has to do with the kernel version, not the kernel symbol table which changes between kernels of the same version!
 
 What Else Uses System.map?
 
 Don't think that System.map is only useful for kernel oopses. Although the kernel itself doesn't really use System.map, other programs such as klogd, lsof,
 
 satan# strace lsof 2>&1 1> /dev/null | grep System
 readlink("/proc/22711/fd/4", "/boot/System.map-2.4.18", 4095) = 23
 
 
 ps,
 
 satan# strace ps 2>&1 1> /dev/null | grep System
 open("/boot/System.map-2.4.18", O_RDONLY|O_NONBLOCK|O_NOCTTY) = 6
 
 
 and many other pieces of software like dosemu require a correct System.map.
 
 What Happens If I Don't Have A Healthy System.map?
 
 Suppose you have multiple kernels on the same machine. You need a separate System.map file for each kernel! If you boot a kernel that doesn't have a matching System.map file, you'll periodically see a message like:
 
 System.map does not match actual kernel
 
 This is not a fatal error, but it can be annoying to see every time you do a ps ax. Some software, like dosemu, may not work correctly (although I don't know of anything off the top of my head). Lastly, your klogd or ksymoops output will not be reliable in case of a kernel oops.
 
 How Do I Remedy The Above Situation?
 
 The solution is to keep all your System.map files in /boot and rename them with the kernel version. Suppose you have multiple kernels like:
 
 * /boot/vmlinuz-2.2.14
 * /boot/vmlinuz-2.2.13
 
 Then just rename your map files according to the kernel version and put them in /boot, like:
 
 /boot/System.map-2.2.14
 /boot/System.map-2.2.13
 
 Now what if you have two copies of the same kernel? Like:
 
 * /boot/vmlinuz-2.2.14
 * /boot/vmlinuz-2.2.14.nosound
 
 The best answer would be if all software looked for the following files:
 
 /boot/System.map-2.2.14
 /boot/System.map-2.2.14.nosound
 
 But to be honest, I don't know if this is the best situation. Everything I've seen searches for "System.map-kernelversion" but what about "System.map-kernelversion.othertext"? I have no idea. What I would do is make use of the fact that /usr/src/linux is in the standard map file search path, so your map files would be:
 
 * /boot/System.map-2.2.14
 * /usr/src/linux/System.map (for the nosound version)
 
 You can also use symlinks:
 
 System.map-2.2.14
 System.map-2.2.14.sound
 System.map -> System.map-2.2.14.sound
 
 
 [/code:1]
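
To make the static translation described in the article concrete, here is a minimal Python sketch of resolving an address against a System.map-style table. It is not klogd's actual code: the function names `parse_symbol_table` and `resolve` are made up for illustration, the sample data is the symbol table excerpt quoted above, and the "nearest symbol at or below the address" rule is an assumption about how such a lookup is usually done.

```python
import bisect

# Sample lines in System.map format ("address type name"), copied from the
# symbol table excerpt in the article above.
SAMPLE_MAP = """\
c03441a0 B dmi_broken
c03441a4 B is_sony_vaio_laptop
c03441c0 b dmi_ident
c0344200 b pci_bios_present
c0344204 b pirq_table
c0344208 b pirq_router
c034420c b pirq_router_dev
c0344220 b ascii_buffer
c0344224 b ascii_buf_bytes
"""


def parse_symbol_table(text):
    """Return a list of (address, type, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        address, sym_type, name = fields
        symbols.append((int(address, 16), sym_type, name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Map an address to (name, offset) using the nearest symbol at or below it.

    Hypothetical helper: returns None when the address lies below the first
    symbol in the table.
    """
    addresses = [entry[0] for entry in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    base, _sym_type, name = symbols[index]
    return name, address - base


symbols = parse_symbol_table(SAMPLE_MAP)
print(resolve(symbols, 0xC03441A0))  # exact match on dmi_broken
print(resolve(symbols, 0xC0344206))  # two bytes into pirq_table
```

Note that this sketch has no upper bound: any address above the last entry resolves to the last symbol. That is also why a stale System.map is worse than none at all, since the lookup still succeeds but returns the wrong names.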
 
 For initrd, see this: [code:1]initrd provides the capability to load a RAM disk by the boot loader.
 This RAM disk can then be mounted as the root file system and programs
 can be run from it. Afterwards, a new root file system can be mounted
 from a different device. The previous root (from initrd) is then moved
 to a directory and can be subsequently unmounted.
 
 initrd is mainly designed to allow system startup to occur in two phases,
 where the kernel comes up with a minimum set of compiled-in drivers, and
 where additional modules are loaded from initrd.
 
 This document gives a brief overview of the use of initrd. A more detailed
 discussion of the boot process can be found in [1].
 
 
 Operation
 ---------
 
 When using initrd, the system typically boots as follows:
 
 1) the boot loader loads the kernel and the initial RAM disk
 2) the kernel converts initrd into a "normal" RAM disk and
 frees the memory used by initrd
 3) initrd is mounted read-write as root
 4) /linuxrc is executed (this can be any valid executable, including
 shell scripts; it is run with uid 0 and can do basically everything
 init can do)
 5) linuxrc mounts the "real" root file system
 6) linuxrc places the root file system at the root directory using the
 pivot_root system call
 7) the usual boot sequence (e.g. invocation of /sbin/init) is performed
 on the root file system
 8) the initrd file system is removed
 
 Note that changing the root directory does not involve unmounting it.
 It is therefore possible to leave processes running on initrd during that
 procedure. Also note that file systems mounted under initrd continue to
 be accessible.
 
 
 Boot command-line options
 -------------------------
 
 initrd adds the following new options:
 
 initrd=<path>    (e.g. LOADLIN)
 
 Loads the specified file as the initial RAM disk. When using LILO, you
 have to specify the RAM disk image file in /etc/lilo.conf, using the
 INITRD configuration variable.
 
 noinitrd
 
 initrd data is preserved but it is not converted to a RAM disk and
 the "normal" root file system is mounted. initrd data can be read
 from /dev/initrd. Note that the data in initrd can have any structure
 in this case and doesn't necessarily have to be a file system image.
 This option is used mainly for debugging.
 
 Note: /dev/initrd is read-only and it can only be used once. As soon
 as the last process has closed it, all data is freed and /dev/initrd
 can't be opened anymore.
 
 root=/dev/ram0   (without devfs)
 root=/dev/rd/0   (with devfs)
 
 initrd is mounted as root, and the normal boot procedure is followed,
 with the RAM disk still mounted as root.
 
 
 Installation
 ------------
 
 First, a directory for the initrd file system has to be created on the
 "normal" root file system, e.g.
 
 # mkdir /initrd
 
 The name is not relevant. More details can be found on the pivot_root(2)
 man page.
 
 If the root file system is created during the boot procedure (i.e. if
 you're building an install floppy), the root file system creation
 procedure should create the /initrd directory.
 
 If initrd will not be mounted in some cases, its content is still
 accessible if the following device has been created (note that this
 does not work if using devfs):
 
 # mknod /dev/initrd b 1 250
 # chmod 400 /dev/initrd
 
 Second, the kernel has to be compiled with RAM disk support and with
 support for the initial RAM disk enabled. Also, at least all components
 needed to execute programs from initrd (e.g. executable format and file
 system) must be compiled into the kernel.
 
 Third, you have to create the RAM disk image. This is done by creating a
 file system on a block device, copying files to it as needed, and then
 copying the content of the block device to the initrd file. With recent
 kernels, at least three types of devices are suitable for that:
 
 - a floppy disk (works everywhere but it's painfully slow)
 - a RAM disk (fast, but allocates physical memory)
 - a loopback device (the most elegant solution)
 
 We'll describe the loopback device method:
 
 1) make sure loopback block devices are configured into the kernel
 2) create an empty file system of the appropriate size, e.g.
 # dd if=/dev/zero of=initrd bs=300k count=1
 # mke2fs -F -m0 -b 1024 initrd
 (if space is critical, you may want to use the Minix FS instead of Ext2)
 (Note that due to a problem elsewhere in the kernel, you _must_ use a
 1024-byte blocksize when creating your file system.  If any other
 value is used, the kernel will be unable to mount the initrd at boot
 time, causing a kernel panic.)
 3) mount the file system, e.g.
 # mount -t ext2 -o loop initrd /mnt
 4) create the console device (not necessary if using devfs, but it can't
 hurt to do it anyway):
 # mkdir /mnt/dev
 # mknod /mnt/dev/console c 5 1
 5) copy all the files that are needed to properly use the initrd
 environment. Don't forget the most important file, /linuxrc
 Note that /linuxrc's permissions must include "x" (execute).
6) correct operation of the initrd environment can frequently be tested
even without rebooting with the command
# chroot /mnt /linuxrc
This is of course limited to initrds that do not interfere with the
general system state (e.g. by reconfiguring network interfaces,
overwriting mounted devices, trying to start already running daemons,
etc.). Note however that it is usually possible to use pivot_root in
such a chroot'ed initrd environment.
 7) unmount the file system
 # umount /mnt
 8) the initrd is now in the file "initrd". Optionally, it can now be
 compressed
 # gzip -9 initrd
 
 For experimenting with initrd, you may want to take a rescue floppy and
 only add a symbolic link from /linuxrc to /bin/sh. Alternatively, you
 can try the experimental newlib environment [2] to create a small
 initrd.
 
 Finally, you have to boot the kernel and load initrd. Almost all Linux
 boot loaders support initrd. Since the boot process is still compatible
 with an older mechanism, the following boot command line parameters
 have to be given:
 
 root=/dev/ram0 init=/linuxrc rw
 
 if not using devfs, or
 
 root=/dev/rd/0 init=/linuxrc rw
 
 if using devfs. (rw is only necessary if writing to the initrd file
 system.)
 
 With LOADLIN, you simply execute
 
 LOADLIN <kernel> initrd=<disk_image>
 e.g. LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0
 init=/linuxrc rw
 
 With LILO, you add the option INITRD=<path> to either the global section
 or to the section of the respective kernel in /etc/lilo.conf, and pass
 the options using APPEND, e.g.
 
 image = /bzImage
 initrd = /boot/initrd.gz
 append = "root=/dev/ram0 init=/linuxrc rw"
 
 and run /sbin/lilo
 
 For other boot loaders, please refer to the respective documentation.
 
 Now you can boot and enjoy using initrd.
 
 
 Changing the root device
 ------------------------
 
 When finished with its duties, linuxrc typically changes the root device
 and proceeds with starting the Linux system on the "real" root device.
 
 The procedure involves the following steps:
 - mounting the new root file system
 - turning it into the root file system
 - removing all accesses to the old (initrd) root file system
 - unmounting the initrd file system and de-allocating the RAM disk
 
 Mounting the new root file system is easy: it just needs to be mounted on
 a directory under the current root. Example:
 
 # mkdir /new-root
 # mount -o ro /dev/hda1 /new-root
 
 The root change is accomplished with the pivot_root system call, which
 is also available via the pivot_root utility (see pivot_root(8) man
 page; pivot_root is distributed with util-linux version 2.10h or higher
 [3]). pivot_root moves the current root to a directory under the new
 root, and puts the new root at its place. The directory for the old root
 must exist before calling pivot_root. Example:
 
 # cd /new-root
 # mkdir initrd
 # pivot_root . initrd
 
 Now, the linuxrc process may still access the old root via its
 executable, shared libraries, standard input/output/error, and its
 current root directory. All these references are dropped by the
 following command:
 
 # exec chroot . what-follows <dev/console >dev/console 2>&1
 
 Where what-follows is a program under the new root, e.g. /sbin/init
 If the new root file system will be used with devfs and has no valid
 /dev directory, devfs must be mounted before invoking chroot in order to
 provide /dev/console.
 
 Note: implementation details of pivot_root may change with time. In order
 to ensure compatibility, the following points should be observed:
 
 - before calling pivot_root, the current directory of the invoking
 process should point to the new root directory
 - use . as the first argument, and the _relative_ path of the directory
 for the old root as the second argument
 - a chroot program must be available under the old and the new root
 - chroot to the new root afterwards
 - use relative paths for dev/console in the exec command
 
 Now, the initrd can be unmounted and the memory allocated by the RAM
 disk can be freed:
 
 # umount /initrd
 # blockdev --flushbufs /dev/ram0    # /dev/rd/0 if using devfs
 
 It is also possible to use initrd with an NFS-mounted root, see the
 pivot_root(8) man page for details.
 
 Note: if linuxrc or any program exec'ed from it terminates for some
 reason, the old change_root mechanism is invoked (see section "Obsolete
 root change mechanism").
 
 
 Usage scenarios
 ---------------
 
 The main motivation for implementing initrd was to allow for modular
 kernel configuration at system installation. The procedure would work
 as follows:
 
 1) system boots from floppy or other media with a minimal kernel
 (e.g. support for RAM disks, initrd, a.out, and the Ext2 FS) and
 loads initrd
 2) /linuxrc determines what is needed to (1) mount the "real" root FS
 (i.e. device type, device drivers, file system) and (2) the
 distribution media (e.g. CD-ROM, network, tape, ...). This can be
 done by asking the user, by auto-probing, or by using a hybrid
 approach.
 3) /linuxrc loads the necessary kernel modules
 4) /linuxrc creates and populates the root file system (this doesn't
 have to be a very usable system yet)
 5) /linuxrc invokes pivot_root to change the root file system and
 execs - via chroot - a program that continues the installation
 6) the boot loader is installed
 7) the boot loader is configured to load an initrd with the set of
 modules that was used to bring up the system (e.g. /initrd can be
 modified, then unmounted, and finally, the image is written from
 /dev/ram0 or /dev/rd/0 to a file)
 8) now the system is bootable and additional installation tasks can be
 performed
 
 The key role of initrd here is to re-use the configuration data during
 normal system operation without requiring the use of a bloated "generic"
 kernel or re-compiling or re-linking the kernel.
 
 A second scenario is for installations where Linux runs on systems with
 different hardware configurations in a single administrative domain. In
 such cases, it is desirable to generate only a small set of kernels
 (ideally only one) and to keep the system-specific part of configuration
 information as small as possible. In this case, a common initrd could be
 generated with all the necessary modules. Then, only /linuxrc or a file
 read by it would have to be different.
 
A third scenario is more convenient recovery disks, because information
 like the location of the root FS partition doesn't have to be provided at
 boot time, but the system loaded from initrd can invoke a user-friendly
 dialog and it can also perform some sanity checks (or even some form of
 auto-detection).
 
Last but not least, CD-ROM distributors may use it for better installation
 from CD, e.g. by using a boot floppy and bootstrapping a bigger RAM disk
 via initrd from CD; or by booting via a loader like LOADLIN or directly
 from the CD-ROM, and loading the RAM disk from CD without need of
 floppies.
 
 
 Obsolete root change mechanism
 ------------------------------
 
 The following mechanism was used before the introduction of pivot_root.
 Current kernels still support it, but you should _not_ rely on its
 continued availability.
 
 It works by mounting the "real" root device (i.e. the one set with rdev
 in the kernel image or with root=... at the boot command line) as the
 root file system when linuxrc exits. The initrd file system is then
 unmounted, or, if it is still busy, moved to a directory /initrd, if
 such a directory exists on the new root file system.
 
 In order to use this mechanism, you do not have to specify the boot
 command options root, init, or rw. (If specified, they will affect
 the real root file system, not the initrd environment.)
 
 If /proc is mounted, the "real" root device can be changed from within
 linuxrc by writing the number of the new root FS device to the special
 file /proc/sys/kernel/real-root-dev, e.g.
 
 # echo 0x301 >/proc/sys/kernel/real-root-dev
 
 Note that the mechanism is incompatible with NFS and similar file
 systems.
 
 This old, deprecated mechanism is commonly called "change_root", while
 the new, supported mechanism is called "pivot_root".[/code:1]
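
The value 0x301 written to /proc/sys/kernel/real-root-dev in the obsolete mechanism is a device number. As a small Python sketch, assuming the traditional 16-bit encoding (major number in the high byte, minor number in the low byte), it can be split like this. The helper names are hypothetical and only serve the illustration.

```python
def split_device_number(dev):
    """Split a device number into (major, minor).

    Assumes the traditional 16-bit encoding: major in the high byte,
    minor in the low byte.
    """
    return (dev >> 8) & 0xFF, dev & 0xFF


def make_device_number(major, minor):
    """Build a device number from major and minor under the same assumption."""
    return ((major & 0xFF) << 8) | (minor & 0xFF)


major, minor = split_device_number(0x301)
print(major, minor)                    # major and minor of the example value
print(hex(make_device_number(1, 0)))   # block device 1,0
```

Under that assumption 0x301 is block device 3,1, which on a traditional IDE setup is /dev/hda1, the partition mounted in the pivot_root example. Block major 1 is the RAM disk driver (compare the mknod /dev/initrd b 1 250 line above), so 1,0 corresponds to /dev/ram0.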