This article describes the steps that are performed to boot the Linux kernel. While this kind of information is not relevant to the system's functionality, it's interesting to see how the different architectures bring the system up.
The firmware usually checks that the hardware is correctly working, and retrieves part (or all) of the kernel from a storage medium and executes it. This first part of the kernel must load the rest of itself and initialize the whole system. I won't deal with firmware issues here, but only with kernel code, whose source is distributed along with Linux.
To make things difficult, the PC firmware only loads half a kilobyte of code, and establishes its own memory layout before loading this first sector. Whatever the boot medium, the first sector of the boot partition is loaded into memory at address 0x7c00, where execution begins. What happens at 0x7c00 depends on the boot-loader being used; I'm going to examine three situations here: no boot-loader, Lilo, and Loadlin.
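The address 0x7c00 is a linear address, while real-mode code on the PC names memory through segment:offset pairs. As a minimal sketch, assuming the standard 8086 rule (segment times 16 plus offset), the conversion can be written as follows; the helper name real_mode_linear is only illustrative and does not come from the kernel sources.

```c
#include <stdint.h>

/* Size of the single sector the PC firmware loads: half a kilobyte. */
enum { BOOT_SECTOR_SIZE = 512 };

/* Hypothetical helper: linear address of a real-mode segment:offset pair.
 * Assumes the standard 8086 rule: linear = segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Under that rule both 07C0:0000 and 0000:7C00 name the same byte, the first one of the loaded boot sector.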
The file called zImage is the compressed kernel image that lives in arch/i386/boot after you issue ``make zImage'' or ``make boot'' -- the latter invocation is the one I prefer, as it works unchanged on other platforms. If you build a ``big zImage'' instead, the file is called bzImage, and lives in the same directory.
Booting an x86 kernel is a tricky task because of the limited amount of available memory. The Linux kernel tries to maximize usage of the low 640 kilobytes by moving itself around several times. But let's see in detail the steps performed by a zImage kernel; all the following pathnames are relative to arch/i386/boot.
The various data movements that are performed at system boot are depicted in figure 1.
Figure 1: Data Movements performed at boot
The image is also available as PostScript (boot-lj.eps).
The boot steps shown up to now rely on the assumption that the compressed kernel can fit in half a meg of space. While the assumption holds most of the time, a kernel stuffed with device drivers might not fit any more. This oversizing may happen, for example, to kernels used on installation disks: these kernels can easily get bigger than the available space, and some new machinery is needed to fix the problem. This machinery is called bzImage, and was introduced in kernel version 1.3.73.
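The size constraint can be sketched as a simple check. This is only an illustration, assuming the limit is exactly the ``half a meg'' quoted above; the real build may apply a slightly different bound, and the names ZIMAGE_MAX_BYTES and needs_bzimage are hypothetical.

```c
#include <stdbool.h>

/* Assumed limit: the "half a meg" a compressed zImage must fit in. */
#define ZIMAGE_MAX_BYTES (512UL * 1024UL)

/* Hypothetical helper: true when the compressed kernel is too large
 * to be booted as a plain zImage. */
bool needs_bzimage(unsigned long compressed_bytes)
{
    return compressed_bytes > ZIMAGE_MAX_BYTES;
}
```

A 400-kilobyte compressed kernel passes the check, while a 600-kilobyte installation kernel would call for a bzImage.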
A bzImage is generated by issuing ``make bzImage'' from the toplevel Linux source directory. This kind of kernel image boots similarly to the zImage, with a few changes:
In practice, Lilo uses the BIOS services to load single sectors from the disk, and then jumps to setup.S. In other words, it arranges the memory layout like bootsect.S does, so the usual booting mechanism can complete painlessly. Lilo is also able to handle a kernel command line, and this is a good reason by itself to avoid booting the raw kernel image.
If you want to boot a bzImage with Lilo, you need version 18 or newer of the tool. Earlier versions of Lilo are not able to load segments to high memory, which is needed when loading big images in order for setup.S to find the expected memory layout.
The main disadvantage of Lilo is that it uses the BIOS to load the system. This forces you to put the kernel and other relevant files on disks that can be accessed by the BIOS, and within their first 1024 cylinders. Actually, when you use the PC firmware you really discover how old-fashioned the architecture is.
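The 1024-cylinder limit follows from the way the classic BIOS disk service addresses sectors, by cylinder, head and sector. The following is a minimal sketch, assuming a 10-bit cylinder field (values 0 to 1023) and a disk geometry supplied by the caller; the structure and function names are illustrative and are not taken from Lilo.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cylinder/head/sector triple; sectors are numbered from 1. */
struct chs {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector;
};

/* Convert a linear sector number to CHS for the given geometry. */
struct chs lba_to_chs(uint32_t lba, uint32_t heads, uint32_t sectors_per_track)
{
    struct chs r;
    r.cylinder = lba / (heads * sectors_per_track);
    r.head = (lba / sectors_per_track) % heads;
    r.sector = (lba % sectors_per_track) + 1;
    return r;
}

/* Assumption: the BIOS call has room for 10 bits of cylinder number,
 * so only cylinders 0..1023 can be reached. */
bool bios_can_reach(uint32_t lba, uint32_t heads, uint32_t sectors_per_track)
{
    return lba_to_chs(lba, heads, sectors_per_track).cylinder < 1024;
}
```

With 16 heads and 63 sectors per track, for instance, every cylinder holds 1008 sectors, so anything stored at or beyond sector 1032192 falls outside the reachable area.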
Even if you don't run Lilo, you can enjoy the documentation files that are distributed with Lilo's source code. They document the boot process on the PC, and explain how to handle (almost) every conceivable situation.
Versions 1.6 and newer of the program are able to load big images.
Loadlin is able to pass a command line to the kernel and is therefore as flexible as Lilo; most of the time you'll end up writing a linux.bat file to pass a full-featured command line to Loadlin when calling the linux command.
You can use Loadlin to turn any networked PC into a Linux box: you only need a kernel image equipped for mounting the root partition via NFS, Loadlin, and a linux.bat with the correct IP numbers in it. Of course you need a properly configured NFS server as well, but any Linux machine can do the job. For example, the following command line turns my girlfriend's PC (alfred.unipv.it) into a workstation:
loadlin c:\zimage rw nfsroot=/usr/root/alfred nfsaddrs=193.204.35.117:193.204.35.110:193.204.35.254:255.255.255.0:alfred.unipv.it
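The nfsaddrs argument in the command above packs five colon-separated fields. A minimal sketch of how such a value could be split is shown here; the field order (client address, server address, gateway, netmask, host name) is my reading of the kernel's NFS-root documentation and should be checked against your kernel version. The structure and function names are hypothetical and this is not the kernel's own parser.

```c
#include <stddef.h>
#include <string.h>

#define NFSADDRS_FIELDS    5
#define NFSADDRS_FIELD_LEN 64

/* Assumed order: client IP, server IP, gateway IP, netmask, host name. */
struct nfsaddrs {
    char field[NFSADDRS_FIELDS][NFSADDRS_FIELD_LEN];
};

/* Split "a:b:c:d:e" into at most five fields.
 * Returns the number of fields found; over-long fields are truncated. */
int parse_nfsaddrs(const char *value, struct nfsaddrs *out)
{
    int n = 0;

    memset(out, 0, sizeof *out);
    while (n < NFSADDRS_FIELDS) {
        size_t len = strcspn(value, ":");   /* length up to next ':' or end */
        size_t copy = len < NFSADDRS_FIELD_LEN ? len : NFSADDRS_FIELD_LEN - 1;

        memcpy(out->field[n], value, copy);
        out->field[n][copy] = '\0';
        n++;
        if (value[len] == '\0')
            break;                          /* no more separators */
        value += len + 1;                   /* skip past the ':' */
    }
    return n;
}
```

Applied to the value used for alfred.unipv.it, the sketch yields 193.204.35.117 as the first field and the host name as the fifth.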
I personally don't feel you'll ever need to touch the boot code, because things get much more interesting when the system is up and running: you can exploit all the features of your processor and all the available RAM without going mad over processor-level issues.
After performing the usual detection of devices, the firmware displays a boot menu which lets you choose what file to boot. The firmware is able to read a disk partition (though only a FAT partition), so you actually boot a ``file'', without the need to hack boot sectors and build maps of disk blocks.
The file that gets booted will usually be linload.exe, which in turn loads Milo (the `Mini Loader', whose name is a pun on Milo's size). In order to boot Linux through the ARC firmware you need to have a small FAT partition on your hard drive to store linload.exe and milo. The Linux kernel doesn't need to access the partition unless you upgrade Milo, so FAT support can be left out of your Alpha kernel without any side effects.
Actually, the user can exploit different options: the ARC boot menu can be configured to boot Linux by default, and Milo can even be burnt in flash memory in order to get rid of the FAT partition. But whatever you do, you end up with Milo running.
The Milo program is a stripped-down version of the Linux kernel: it has all the Linux device drivers and some filesystem decoders; unlike the kernel, it doesn't have process control, and it includes Alpha initialization code. The tool is able to set up virtual memory and enable it, and can load a file from either an ext2 partition or an iso9660 device. The `file' in question is loaded to virtual address 0xfffffc0000300000 and then executed. The virtual address used is the one where the Linux kernel runs: it's unlikely you'll ever load anything but Linux, with the exception of the fmu (flash management utility) program used to burn Milo in flash ROM -- fmu is compiled to execute from the same virtual address as the kernel and is distributed with Milo.
It's interesting to note that Milo also includes a small 386 emulator and some of the PC BIOS functionality. This is needed in order to execute self-initialization code found on many ISA/PCI peripheral boards (PCI boards, though claiming to be processor-independent, use Intel machine code in their ROM images).
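To get a feeling for the load address quoted above, the following sketch translates it to a physical offset. It assumes that the Alpha kernel runs in a region mapping physical memory linearly from virtual address 0xfffffc0000000000; that base, and the names ALPHA_KSEG_BASE and alpha_kseg_to_phys, are assumptions of mine rather than something stated in this article.

```c
#include <stdint.h>

/* Assumption: start of the linearly mapped kernel region on Linux-Alpha. */
#define ALPHA_KSEG_BASE UINT64_C(0xfffffc0000000000)

/* Hypothetical helper: physical address behind a kernel virtual address,
 * valid only for addresses inside the linearly mapped region. */
uint64_t alpha_kseg_to_phys(uint64_t vaddr)
{
    return vaddr - ALPHA_KSEG_BASE;
}
```

Under that assumption the image loaded by Milo sits three megabytes into physical memory (offset 0x300000).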
But, if Milo does all of this, what is left to the Linux kernel?
Very little, actually. The first kernel code to execute in Linux-Alpha is arch/alpha/kernel/head.S, and it just needs to set up a few pointers and jump to start_kernel(). Actually, kernel/head.S for Alpha is much shorter than the equivalent x86 source file.
If you don't want to run Milo there is an alternative, though not a practical one. In arch/alpha/boot you'll find the sources of a `raw' loader which gets compiled by issuing ``make rawboot'' from the toplevel Linux source directory. The utility is able to load a file from a sequential region of a device (the floppy or the hard disk) using the firmware's callbacks.
In practice, the raw loader accomplishes a task similar to what bootsect.S does for the PC platform, and this forces you to copy the kernel to either a raw floppy or a raw hard-disk partition. As you see, there's no real reason to try out this technique, which is quite hairy and lacks the flexibility Milo offers. I personally don't even know if it still works: the ``PALcode'' used by Linux is exported by Milo, and is different from the one exported by the ARC firmware. The PALcode is a library of low-level functions used by Alpha processors to implement low-level hardware management like paging; if the current PALcode implements different operations than the software expects, the system won't work.
What the user sees is that the firmware loads a program and executes it; the program in turn is able to retrieve and uncompress a file found on a disk partition. The `program' in question is called Silo, and it can read files from either an ext2 partition or a ufs one. Unlike Milo (and like Lilo), Silo is able to boot another operating system. There is no such need for the Alpha, because the firmware can boot multiple systems: once you run Milo, you have already made your choice -- the Right Choice.
When a Sparc computer boots, the firmware loads a boot sector after performing all the hardware checks and device initialization. It's interesting to note that Sbus devices are platform independent, and their initialization code is portable Forth code rather than machine language bound to a particular processor.
The boot sector that gets loaded is what you find in /boot/first.b in your Linux-Sparc system, and is a bare 512 bytes. It is loaded to address 0x4000 and its role is to retrieve /boot/second.b from disk and place it at address 0x280000 (2.5 megs); the address has been chosen because the Sparc specifications state that at least three megabytes of RAM are mapped at boot time.
Everything else is then performed by the second-stage boot loader: it is linked with libext2.a to access system partitions, and can thus load a kernel image from your Linux filesystem. It can also uncompress the image because it includes inflate.c, from gzip.
second.b accesses a configuration file called /etc/silo.conf, similar in shape to lilo.conf. Since the file is read at boot time there's no need to re-install the kernel maps when a new kernel is added to the boot choices. When Silo shows its prompt you can choose from any kernel image (or other operating system) specified in silo.conf, or you can specify a complete device/pathname pair to load a different kernel image without editing the configuration file.
Silo loads the disk file to address 0x4000. This means that the kernel must be shorter than 2.5 megs: if it is longer, Silo will refuse to overwrite its own image. No conceivable Linux-Sparc kernel is currently bigger than that, unless you compiled it with ``-g'' to have debugging information available. In this case the kernel image must be stripped before being handed to Silo.
Finally, Silo performs kernel decompression and/or remapping to place the image at virtual address 0xf0004000. The code that takes over from Silo is -- as you may imagine -- arch/sparc/kernel/head.S. The source includes all the trap tables for the processor and the actual code to set the machine up and call start_kernel(). The Sparc version of head.S is actually quite big.
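The size limit follows from the two addresses quoted in the text: the kernel file starts at 0x4000 and must end before the second-stage loader at 0x280000. A small sketch of the arithmetic follows, with illustrative names; whether Silo treats the boundary itself as acceptable is an assumption of this example.

```c
#include <stdbool.h>

/* Addresses taken from the text above. */
#define SILO_KERNEL_LOAD_ADDR  0x4000UL     /* where the disk file is loaded */
#define SILO_SECOND_STAGE_ADDR 0x280000UL   /* where second.b lives (2.5 megs) */

/* Room between the load address and the second-stage loader. */
unsigned long silo_max_kernel_bytes(void)
{
    return SILO_SECOND_STAGE_ADDR - SILO_KERNEL_LOAD_ADDR;
}

/* Hypothetical check: true when loading the image would not reach second.b. */
bool silo_kernel_fits(unsigned long image_bytes)
{
    return image_bytes <= silo_max_kernel_bytes();
}
```

The available room works out to 0x27C000 bytes (2,605,056), a little less than the round 2.5 megs.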
The start_kernel() function calls setup_arch() first, which is the last architecture-specific function. Unlike the earlier boot code, however, setup_arch() can exploit all the processor's features, and it is much easier to read than the sources described earlier. The function is defined in kernel/setup.c under each architecture source tree.
The function then initializes all the kernel's subsystems -- IPC, networking, buffer cache and so on. After all initialization is over, these two lines complete start_kernel():
kernel_thread(init, NULL, 0);
cpu_idle(NULL);

The init thread is process number 1: it mounts the root partition, executes /linuxrc if CONFIG_INITRD has been selected at compile time, and then executes the init program. If init can't be found, /etc/rc is executed; using rc is discouraged nowadays, as init is much more flexible than a shell script in handling system configuration.
If neither init nor /etc/rc can be run, or if they exit, /bin/sh is executed repeatedly. This feature only exists as a safeguard in case the system administrator removes or corrupts init by mistake: if you remove a.out support from the kernel forgetting that your old init has not been recompiled, you'll enjoy having at least a shell running after reboot.
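The fallback order described above can be pictured with a small user-space sketch. This is not the kernel's code: it merely picks the first candidate that looks executable, using the POSIX access() call, and the helper name first_runnable is hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: return the first path in the list that the
 * caller is allowed to execute, or NULL if none qualifies.
 * The real kernel simply tries to execute each program in turn. */
const char *first_runnable(const char *const paths[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (access(paths[i], X_OK) == 0)
            return paths[i];
    }
    return NULL;
}
```

Called with a list such as init's location (commonly /sbin/init, which is an assumption here), then /etc/rc, then /bin/sh, the function returns the shell only when the first two are missing, which mirrors the safeguard role of /bin/sh.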
The kernel has nothing more to do after spawning process number 1, and everything else is handled in user space -- by init, /etc/rc or /bin/sh.
And process 0? We've seen how the so-called idle task executes cpu_idle(): this is a function that calls the idle() function in an endless loop. idle() in turn is an architecture-dependent function, which is usually in charge of turning off the processor to save power and increase the processor's lifetime.
Alessandro is a Linux enthusiast who writes documentation because he's not smart enough to write software. His 486 is specialized in grepping through source code, and humbly leaves real jobs to the Alpha and the Sparc.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved