This document describes the layout of data on the disk drive for a Chromium OS device and the process by which the OS is booted. Goals for the drive partitioning scheme are as follows:
- Speed - Support fast boot, where the boot loader is part of the firmware.
- Simplicity - Support autoupdate.
- Robustness - Recover from failed updates or corrupt partitions.
- Openness - Allow developers to run operating systems other than Google Chrome OS.
Goals for the boot process are as follows:
- Support readily available development platforms so that Chromium OS software can be built and tested without waiting for final hardware/firmware.
- Support a limited selection of off-the-shelf netbooks for internal trials of Chromium OS.
- Provide a secure and verifiable boot path for official Google Chrome OS devices.
Chromium OS is essentially a specially-tailored GNU/Linux distribution. We want to make as few modifications to the upstream kernel as possible, ideally none. But as with any other GNU/Linux system, the pre-kernel boot process is unavoidably dependent on the hardware, BIOS, and bootloader.
Legacy boot for x86 Linux has three steps:
- The BIOS looks at the first block of each drive until it finds a Master Boot Record (MBR). This consists of 440 bytes of real-mode code, 6 ignored bytes, 4 instances of 16-byte primary partition records, and 2 signature bytes, 0x55 and 0xAA. That's 512 bytes. BIOS copies this block into RAM and starts executing the first byte. This all happens in x86 Real Mode.
- Those first 440 bytes of MBR code are responsible for bootstrapping the rest of the OS. The code searches the four partition table entries, finds a partition flagged as bootable, copies the first 512 bytes from that partition (the so-called Volume Boot Record or VBR) into RAM, and jumps there. The VBR code then continues the boot process in some unspecified way -- typically the VBR code (as well as the MBR code) is generated and installed by grub, lilo, syslinux, or some similar bootloader.
- Eventually, the bootloader code identifies the kernel. It also creates a special table in memory called the “zeropage table.” The bootloader initializes fields in that table by making calls to the BIOS via interrupts. After the zeropage table is filled in, a pointer to it is placed in the ESI register and execution continues with the first part of the kernel. At this point, the CPU is still running in Real Mode. The first part of the kernel then switches to protected mode and jumps to the kernel's 32-bit entry point, passing along the pointer to the zeropage table.
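The byte counts in the first step can be checked with a short Python sketch. The function name `split_mbr` and the dictionary keys are illustrative only and do not come from any Chromium OS tool.

```python
import struct

SECTOR_SIZE = 512
# 440 bytes of code, 6 ignored bytes, 4 x 16-byte records, 2 signature bytes.
MBR_FORMAT = "<440s6s64s2s"

def split_mbr(sector: bytes) -> dict:
    """Split one 512-byte MBR sector into the regions described above."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    code, ignored, table, signature = struct.unpack(MBR_FORMAT, sector)
    if signature != b"\x55\xaa":
        raise ValueError("missing 0x55 0xAA signature bytes")
    # Cut the 64-byte table into its four 16-byte primary partition records.
    records = [table[i:i + 16] for i in range(0, 64, 16)]
    return {"code": code, "ignored": ignored, "records": records}

# Usage: an otherwise empty sector that carries only the signature.
mbr = split_mbr(bytes(510) + b"\x55\xaa")
print(len(mbr["code"]), len(mbr["records"]))
```

The format string adds up to exactly 512 bytes, which is why the whole MBR fits in the single block that the BIOS copies into RAM.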
Legacy BIOSes will continue to boot Chromium OS from the MBR. The Chromium OS build process places GPT-aware boot sector code from syslinux in the MBR. That code can specify one GPT partition to boot, identified by a matching UniquePartitionGUID field in the Partition Entry Array. We use partition 12 for this purpose. The second-stage syslinux bootloader is installed on that partition, along with its corresponding config file (/syslinux/syslinux.cfg). We have a tool and scripts that can change the boot partition GUID in the MBR when we need to select an alternate boot path. Virtualized systems (vmware, qemu, etc.) typically have their own legacy BIOS implementations and will use this method to boot Chromium OS images.
The Extensible Firmware Interface is a BIOS replacement originally developed by Intel® for its Itanium® systems and later expanded to include x86 and other architectures. While not enthusiastically embraced by the Linux kernel developers, it offers some advantages over legacy BIOS and is becoming more widely used, especially for 64-bit x86 systems. EFI BIOS boots like this:
- The BIOS switches into protected mode almost immediately and then switches into 64-bit mode (IA-32e mode).
- The BIOS expects the disks to be formatted using a GUID Partition Table (GPT), which can contain a very large number of partitions, not just four. Each GPT partition is identified by two GUIDs: a type and a unique ID. The BIOS looks for a partition that:
  - Has a type of "EFI System Partition" (c12a7328-f81f-11d2-ba4b-00a0c93ec93b; the first three fields are stored little-endian on disk, so a raw dump of the entry reads 28732ac1-1ff8-d211-ba4b-00a0c93ec93b)
  - Is formatted as a FAT filesystem
  - Contains a file named \efi\boot\bootx64.efi
- That bootx64.efi file is the bootloader, which is executed as an application within the EFI BIOS environment (still in 64-bit mode). The bootloader queries the BIOS for system information using a set of registered function calls and creates the zeropage table for the kernel. It then locates the kernel, tells the BIOS that it can free any memory that won't be needed later, switches down to 32-bit mode, and jumps to the kernel at the kernel's 32-bit protected-mode entry point (the same place where the legacy Real Mode code used to jump to).
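The EFI System Partition type GUID has the canonical text form c12a7328-f81f-11d2-ba4b-00a0c93ec93b, but GPT stores the first three GUID fields little-endian, so a raw hex dump of the partition entry begins 28 73 2a c1. A minimal Python sketch of the conversion, using the standard `uuid` module (the helper names are illustrative):

```python
import uuid

ESP_TYPE = uuid.UUID("c12a7328-f81f-11d2-ba4b-00a0c93ec93b")

def guid_to_disk_bytes(guid: uuid.UUID) -> bytes:
    """Return the 16 bytes as they appear in a GPT partition entry."""
    # bytes_le swaps the byte order of the first three fields only.
    return guid.bytes_le

def guid_from_disk_bytes(raw: bytes) -> uuid.UUID:
    """Rebuild the canonical GUID from its 16 on-disk bytes."""
    return uuid.UUID(bytes_le=raw)

on_disk = guid_to_disk_bytes(ESP_TYPE)
print(on_disk.hex())  # 28732ac11ff8d211ba4b00a0c93ec93b
```

When comparing a GUID read from disk against one quoted in documentation, it is worth converting both to the same form first.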
The Chromium OS build process creates an EFI System Partition (partition 12) and installs a 64-bit version of grub2 as the bootloader (/efi/boot/bootx64.efi), along with its config file (/efi/boot/grub.cfg). 64-bit EFI BIOSes will use this bootloader. It is possible to also install a 32-bit bootloader in the same partition, but we currently do not do that. To change the boot partition, we just need to edit the grub.cfg file. Note that different EFI BIOSes may have different requirements for the pathname of the bootloader. Most EFI BIOSes contain a "Compatibility Support Module" component which makes them act like legacy BIOSes, so they may boot either way.
Google Chrome OS devices (x86/x86_64/arm) have custom BIOSes that use yet another boot method to ensure that the user is running only the bits that are intended. Instead of a separate bootloader and kernel, there is one binary blob contained in its own GPT partition. That blob is cryptographically signed and the signature is verified before booting. Under normal conditions, the process is:
- The BIOS searches the first drive (only) for a GPT partition identified with our special ChromeOS Kernel Type GUID (fe3a2a5d-4f32-41a7-b725-accc3285a309). There should be two (image A and image B). Attribute bits within each partition table entry select which of the two is the most recent (or valid) one.
- The first 64K bytes of the kernel partition are reserved for the signature header for verified boot. Following that is the 32-bit part of the kernel, a few data structures, and our bootloader stub. The BIOS verifies the signature, loads the rest of the kernel blob into memory, and invokes the bootloader stub.
- The bootloader stub is just an EFI application. It sets up any tables the kernel needs in order to continue booting, and jumps to the kernel's 32-bit entry point.
The Chromium OS build process creates signed kernel images needed by the Chrome OS BIOS and installs them in their own partitions. They are signed with test keys that are found in the source tree. Official releases will of course be signed with private Google keys.
For any booting (x86) configuration, there are at least three separate kernels (along with their command lines) on the disk image. Legacy BIOS will use syslinux, which uses its own copy of the chosen kernel that's kept in partition 12. EFI BIOSes will use /boot/vmlinuz from the target rootfs. ChromeOS BIOS uses the signed kernel embedded in its own partition. Our build and update process is carefully crafted to try to keep all three of these kernels in sync. However, if you're fiddling with the kernel and command line, you may find that your changes are being ignored. This is usually an indication that you're modifying the wrong one. In /proc/cmdline, you should see one of the strings "cros_legacy", "cros_efi", or "cros_secure". These identify which method the kernel used to boot (and that's all they do - we don't use them for any run-time decisions AFAIK).
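To find out which of the three kernels actually booted, the marker strings can be looked up in the kernel command line. The following Python sketch is only an illustration: the function names, the descriptions in the table, and the sample command line are made up for this example.

```python
# Marker strings from the text, mapped to the boot path they indicate.
BOOT_MARKERS = {
    "cros_legacy": "legacy BIOS (syslinux, kernel copy in partition 12)",
    "cros_efi": "EFI BIOS (grub2, /boot/vmlinuz from the rootfs)",
    "cros_secure": "Chrome OS BIOS (signed kernel partition)",
}

def boot_method(cmdline: str) -> str:
    """Return a description of the boot path named in a kernel command line."""
    for token in cmdline.split():
        if token in BOOT_MARKERS:
            return BOOT_MARKERS[token]
    return "unknown"

def current_boot_method(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line and classify it."""
    with open(path) as f:
        return boot_method(f.read())

# Usage with a hypothetical command line (not taken from a real device).
sample = "console=tty1 cros_secure root=/dev/sda3"
print(boot_method(sample))
```

If the reported boot path is not the one you expected, the kernel or command line you edited is probably not the one in use.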
Bootable Chromium OS drives (removable or not) share a common drive format. In the discussion that follows, “sector” refers to a 512-byte disk sector, addressed by its Logical Block Address (LBA). Although the UEFI specs allow for disk sectors of other sizes, in practice 512 bytes is the norm. We do not use the old Cylinder-Head-Sector addresses at all.
The master boot record is the first sector on the hard drive (LBA 0). As mentioned above, legacy BIOSes will boot from this sector. To protect the GUID partitions on the drive from legacy OSes, the MBR partition table normally contains a single partition entry of type 0xEE, filling the entire drive.
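The protective entry can be recognized with a few lines of Python. This is a sketch under stated assumptions: the 16-byte record layout (status, CHS start, type, CHS end, first LBA, sector count) is the conventional MBR layout rather than something defined in this document, and the function names are illustrative.

```python
import struct

RECORD_FORMAT = "<B3sB3sII"  # status, CHS start, type, CHS end, first LBA, count
TABLE_OFFSET = 446           # 440 code bytes + 6 ignored bytes

def partition_types(mbr: bytes) -> list:
    """Return the type byte of each of the four MBR partition records."""
    types = []
    for i in range(4):
        start = TABLE_OFFSET + 16 * i
        fields = struct.unpack(RECORD_FORMAT, mbr[start:start + 16])
        types.append(fields[2])
    return types

def is_protective_mbr(mbr: bytes) -> bool:
    """True if the only used entry has type 0xEE."""
    used = [t for t in partition_types(mbr) if t != 0x00]
    return used == [0xEE]

def make_protective_mbr(total_sectors: int) -> bytes:
    """Build an MBR whose single 0xEE entry covers the drive after LBA 0."""
    size = min(total_sectors - 1, 0xFFFFFFFF)  # the count field is 32 bits
    record = struct.pack(RECORD_FORMAT, 0x00, b"\x00\x02\x00", 0xEE,
                         b"\xff\xff\xff", 1, size)
    return bytes(TABLE_OFFSET) + record + bytes(48) + b"\x55\xaa"

print(is_protective_mbr(make_protective_mbr(1000)))
```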
The second sector (LBA 1) contains the primary GPT header, followed immediately by 16K (32 sectors) of the primary GUID Partition Entry Array. In conformance with the EFI spec, another copy of these data should be located at the end of the disk as well, with the secondary GPT header in the last accessible sector and the secondary GUID Partition Entry Array immediately preceding it.
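The positions of the primary and secondary GPT structures follow directly from the drive size. The Python sketch below assumes 512-byte sectors and that the last accessible sector is the last LBA of the drive; `gpt_locations` is an illustrative name.

```python
SECTOR_SIZE = 512
ENTRY_ARRAY_BYTES = 16 * 1024  # 16K Partition Entry Array

def gpt_locations(total_sectors: int) -> dict:
    """Return the LBA (or inclusive LBA range) of each GPT structure."""
    array_sectors = ENTRY_ARRAY_BYTES // SECTOR_SIZE  # 32 sectors
    last_lba = total_sectors - 1
    return {
        "mbr": 0,
        "primary_header": 1,
        "primary_entries": (2, 2 + array_sectors - 1),
        "secondary_entries": (last_lba - array_sectors, last_lba - 1),
        "secondary_header": last_lba,
    }

# Usage: a hypothetical drive of 1000 sectors.
layout = gpt_locations(1000)
print(layout["primary_entries"], layout["secondary_entries"])
```

For the 1000-sector example, the primary array occupies LBA 2 through 33 and the secondary array occupies LBA 967 through 998, directly before the secondary header at LBA 999.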
GPT allows a large number of partitions on a drive. In an attempt to reduce the effect that later partitioning changes might have on deployed systems, we are trying to enumerate the known partitions first, while leaving room for future growth. Here’s the current layout:
|Partition||Usage||Purpose|
|1||user state, aka "stateful partition"||User's browsing history, downloads, cache, etc. Encrypted per-user.|
|2||kernel A||Initially installed kernel.|
|3||rootfs A||Initially installed rootfs.|
|4||kernel B||Alternate kernel, for use by automatic upgrades.|
|5||rootfs B||Alternate rootfs, for use by automatic upgrades.|
|6||kernel C||Minimal-size partition for future third kernel. There are rare cases where a third partition could help us avoid recovery mode (AU in progress + random corruption on boot partition + system crash). We decided it's not worth the space in V1, but that may change.|
|7||rootfs C||Minimal-size partition for future third rootfs. Same reasons as above.|
|8||OEM customization||Web pages, links, themes, etc. from OEM.|
|9||MiniOS A||Recovery partition A|
|10||MiniOS B||Recovery partition B, for upgrades. Must reside at the end of the disk.|
|11||Hibernate||Small partition reserved for hibernation state.|
|12||EFI System Partition||Contains 64-bit grub2 bootloader for EFI BIOSes, and second-stage syslinux bootloader for legacy BIOSes.|
Note that the reserved partitions will actually be present on the image, so that the partition numbering remains constant from now on. Each minimal-size partition (including the C kernel and C rootfs) is only 512 bytes, and is shoved into some space lost to filesystem alignment (between the primary partition table and the stateful partition). 64M of empty space is set aside for use by those reserved partitions if they ever need it.
Bootable USB keys have the same layout, except that kernel B and rootfs B are minimal-size, and partition 1 is limited to 720M. The total USB image size is around 1.5G. When the USB image is installed on a fixed drive, the B image is duplicated from the A image, and partition 1 is made as large as possible so that the entire disk is in use.
The exact sizes and layouts are managed by a json file. See the Disk Layout Format page for more information.
Each GPT Partition Entry contains a PartitionTypeGUID to identify the purpose of the partition, a UniquePartitionGUID which is specific to an individual partition on an individual drive, a PartitionName (not the same as the filesystem's label, and apparently unused by the Linux kernel or userspace), and some Attributes bits that the Chrome OS BIOS will use to select the bootable image. There are several standard PartitionTypeGUIDs. We use two of them, and we’ve created three new ones to identify the Chrome OS kernel and rootfs partitions and to reserve partitions for future use.
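|Partition type||PartitionTypeGUID|
|Linux data (standard)||ebd0a0a2-b9e5-4433-87c0-68b6b72699c7|
|EFI System Partition (standard)||c12a7328-f81f-11d2-ba4b-00a0c93ec93b|
|ChromeOS kernel||fe3a2a5d-4f32-41a7-b725-accc3285a309|
|ChromeOS rootfs||3cb8e202-3b7e-47dd-8a3c-7ff2a13cfcec|
|ChromeOS future use||2e0a753d-9e48-43b0-8337-b15192cb1b5e|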
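A partition entry can be decoded with the standard `struct` and `uuid` modules. The sketch below assumes the usual 128-byte entry layout from the UEFI specification (the two LBA fields are part of that layout, although they are not discussed here); the partition name, unique GUID, and LBA values in the usage example are invented.

```python
import struct
import uuid

# type GUID, unique GUID, first LBA, last LBA, attributes, UTF-16LE name
ENTRY_FORMAT = "<16s16sQQQ72s"

CHROMEOS_KERNEL = uuid.UUID("fe3a2a5d-4f32-41a7-b725-accc3285a309")

def parse_entry(raw: bytes) -> dict:
    """Decode one 128-byte GPT Partition Entry."""
    type_guid, unique_guid, first_lba, last_lba, attrs, name = struct.unpack(
        ENTRY_FORMAT, raw)
    return {
        "type": uuid.UUID(bytes_le=type_guid),      # PartitionTypeGUID
        "unique": uuid.UUID(bytes_le=unique_guid),  # UniquePartitionGUID
        "first_lba": first_lba,
        "last_lba": last_lba,
        "attributes": attrs,
        "name": name.decode("utf-16-le").rstrip("\x00"),  # PartitionName
    }

# Usage: build a made-up kernel entry, then decode it again.
raw_entry = struct.pack(
    ENTRY_FORMAT,
    CHROMEOS_KERNEL.bytes_le,
    uuid.UUID("00000000-0000-0000-0000-000000000001").bytes_le,
    4096, 8191, 0, "KERN-A".encode("utf-16-le"))
entry = parse_entry(raw_entry)
print(entry["type"] == CHROMEOS_KERNEL, entry["name"])
```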
At various times, Linux has used a number of means to refer to disk partitions. For the kernel command line, it may be by means of parameters like this:
root=/dev/sda3
root=LABEL=C-ROOT
root=UUID=86f0f84d-e2d0-41e7-ad44-df4faad61e73
For userspace mount points, those may correspond to paths like this:
/dev/sda3
/dev/disk/by-label/C-ROOT
/dev/disk/by-uuid/86f0f84d-e2d0-41e7-ad44-df4faad61e73
In those examples, when the kernel refers to a partition by its UUID, that UUID doesn’t come from the GPT. Each filesystem has its own UUID (and label), and that’s what the kernel looks at. Using the UUID notation typically requires starting udev in an initramfs, which takes extra time. For legacy or standard EFI BIOSes, the /dev/fooN format is used, to keep boot times to a minimum. This must be specified in the bootloader config file. The Chrome OS BIOS and bootstub pass an additional argument on the kernel command line:
This allows the kernel to identify the GPT partition from which it was loaded. The root partition is the next higher partition.
The filesystem and kernel partitions are all 2MB aligned and sized. However, in the future we may move down to 1MB to be in sync with what other OSes are doing.
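The alignment rule can be expressed as a small rounding helper. This Python sketch assumes 512-byte sectors and reads "2MB" as 2 * 1024 * 1024 bytes; the function names are illustrative.

```python
SECTOR_SIZE = 512
ALIGNMENT_BYTES = 2 * 1024 * 1024  # assumption: 2MB means 2 MiB

def align_up(lba: int, alignment_bytes: int = ALIGNMENT_BYTES) -> int:
    """Round an LBA up to the next alignment boundary."""
    sectors = alignment_bytes // SECTOR_SIZE  # 4096 sectors for 2 MiB
    return -(-lba // sectors) * sectors       # ceiling division

def is_aligned(first_lba: int, sector_count: int) -> bool:
    """True if both the start and the size are multiples of the alignment."""
    sectors = ALIGNMENT_BYTES // SECTOR_SIZE
    return first_lba % sectors == 0 and sector_count % sectors == 0

# Usage: the first aligned LBA after the primary GPT (LBA 0 through 33).
print(align_up(34))
```

Moving to 1MB alignment would only change `ALIGNMENT_BYTES`; the rounding logic stays the same.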
The physical layout of the partitions does not have to match their order in the partition table. In fact, there are reasons why it might be advantageous that it doesn't. For example it may be necessary to resize some partitions, which is made much easier with certain physical layouts. Refer to the Partition Resizing document for details. Here’s the current fixed-disk layout:
Only Chrome OS BIOS will implement secure boot from first power-on. Portions of the firmware are read-only, forming the basis of trust to validate the read/write portions of the firmware. Once the firmware has been validated, we will continue the boot process by reading the kernel from the disk.
It is not possible to sign the GPT using public key encryption. The contents of the GPT (in particular, the partition-dependent attributes fields for the kernel GPT entries) will change as autoupdate applies updates and devices reboot and attempt to use newly updated partitions. Since the GPT is not signed and thus cannot be trusted, all firmware or software that accesses the GPT must pass security review. Firmware needs to sanity-check all GPT values before using them. Most forms of corrupted or damaged partition tables will just cause the firmware to read a portion of the drive that doesn't contain a valid kernel signature header, in which case the firmware initiates recovery mode. But we must also protect against malicious GPT entries that might open security holes, so if the GPT is suspicious or corrupted in ways that can’t be repaired, we can’t boot this device.
There are at least two kernel partitions, to support autoupdate and accidental corruption. Each kernel partition is paired with a rootfs partition; kernel A should only boot rootfs A, kernel B should only boot rootfs B, etc. The kernel partition is separate from the rootfs partition so that:
- The firmware does not need to be able to parse a filesystem in order to read the kernel. This allows using more exotic filesystems for rootfs in the future.
- The kernel and rootfs can use different algorithms for verified boot. The kernel is verified using a single signature header; rootfs uses a more complex block-based algorithm.
The GPT Partition Entry contains a 64-bit Attributes field. Bits 48-63 are available for use by a partition of any given type. Chrome OS Kernel partitions use the following attribute flags:
|Bits||Content||Notes|
|56||Successful Boot Flag||Set to 1 the first time the system has successfully booted from this partition (see the File System/Autoupdate design document for the definition of success).|
|55-52||Tries Remaining||Number of times to attempt booting this partition. Used only when the Successful Boot Flag is 0.|
|51-48||Priority||4-bit number: 15 = highest, 1 = lowest, 0 = not bootable.|
|47-0||Reserved by EFI Spec|
|State||Priority||Tries Remaining||Successful Boot Flag||Description|
|Active||A, where A>0||0||1||Kernel that has booted successfully at least once.|
|Backup||B, where A>B>0||0||1||Another kernel that has booted successfully but has lower priority than the active kernel.|
|Updated||C, where C>A>0||T>0||0||Newly updated kernel, which has not booted successfully yet. Since it has higher priority than the active kernel, it will be attempted next boot.|
|Not bootable||0||0||0||Kernel partition that is not currently bootable: in the process of being autoupdated, ran out of boot tries before booting successfully, or failed its signature check.|
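The three fields can be extracted from the 64-bit Attributes value with shifts and masks. The following Python sketch uses the bit positions given above; the function names are illustrative.

```python
def unpack_attrs(attributes: int) -> dict:
    """Extract the Chrome OS kernel flags from a GPT Attributes value."""
    return {
        "successful": (attributes >> 56) & 0x1,  # bit 56
        "tries": (attributes >> 52) & 0xF,       # bits 55-52
        "priority": (attributes >> 48) & 0xF,    # bits 51-48
    }

def pack_attrs(priority: int, tries: int, successful: int,
               reserved: int = 0) -> int:
    """Build an Attributes value; bits 47-0 are passed through unchanged."""
    if not (0 <= priority <= 15 and 0 <= tries <= 15 and successful in (0, 1)):
        raise ValueError("field out of range")
    return ((successful << 56) | (tries << 52) | (priority << 48)
            | (reserved & ((1 << 48) - 1)))

# Usage: an "Active" kernel (priority 2) and an "Updated" one (priority 3,
# 6 tries left). The specific numbers are examples only.
active = pack_attrs(priority=2, tries=0, successful=1)
updated = pack_attrs(priority=3, tries=6, successful=0)
print(hex(active), hex(updated))
```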
- Check that (Successful Boot Flag == 1) or (Tries Remaining > 0). If Successful Boot Flag == Tries Remaining == 0, lower the Priority to 0 and find the next kernel. This was a kernel that failed its last boot try.
- Check the kernel signature header. If it’s invalid, and (Tries Remaining > 0), set Tries Remaining = Priority = 0 and find the next kernel.
- Begin copying the kernel blob into RAM.
- Check the kernel blob signature as it’s copied. If it’s invalid, set Priority = 0 and find the next kernel.
- If Tries Remaining > 0, decrement the Tries Remaining value in the partition table.
- Invoke the bootstub, which then launches the kernel.
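The checks in the list above can be sketched in Python. This is a model for reasoning about the flags, not the firmware's implementation. It assumes that candidates are tried from highest to lowest priority, and that a kernel with an invalid signature header and no tries remaining is skipped without further changes. The `header_ok` and `blob_ok` fields are stand-ins for the real signature checks.

```python
from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    priority: int
    tries: int
    successful: int
    header_ok: bool = True  # stand-in for the signature header check
    blob_ok: bool = True    # stand-in for the kernel blob signature check

def select_kernel(kernels):
    """Return the kernel to boot, updating flags as the list describes."""
    # Assumption: try the highest priority first; priority 0 is not bootable.
    for k in sorted(kernels, key=lambda k: k.priority, reverse=True):
        if k.priority == 0:
            continue
        if k.successful == 0 and k.tries == 0:
            k.priority = 0  # failed its last boot try
            continue
        if not k.header_ok:
            if k.tries > 0:
                k.tries = 0
                k.priority = 0
            continue  # assumption: skipped either way
        if not k.blob_ok:
            k.priority = 0
            continue
        if k.tries > 0:
            k.tries -= 1
        return k
    return None  # no valid kernel found

# Usage: an update (B) with a corrupt blob falls back to the active kernel (A).
a = Kernel("A", priority=2, tries=0, successful=1)
b = Kernel("B", priority=3, tries=6, successful=0, blob_ok=False)
print(select_kernel([a, b]).name, b.priority)
```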
If no valid kernel is found, we can’t boot this device. After the OS finishes booting successfully, it will modify its partition table entry, ensuring that Successful Boot Flag == 1 and Tries Remaining == 0. We can edit the other attribute fields manually if we need to change the primary boot partition. Here’s the flow in graphical form:
The same library that sanity-checks the GPT and selects the kernel partition also checks the kernel’s cryptographic signature. The kernel partition consists of the following structure:
The first 64K bytes are the cryptographic signature header blob, which contains the keys and signatures needed to verify the rest of the kernel blob (plus a few pointers and version numbers). The kernel blob consists of the 32-bit part of the Linux kernel, a config file (just the kernel command line string at the moment), a mostly-complete zeropage table, and our bootloader stub to complete the transition from BIOS to kernel. As it’s verified, the kernel blob is copied into RAM starting at the 32-bit kernel entry location of 0x100000 on x86 (for ARM the address varies by sub-architecture). Once the verification is complete, the bootloader stub is invoked, which finishes initializing the params table and jumps to the kernel.
Developers may want to do a rapid turnaround of the kernel only. This suggested procedure may help:
- Install some image onto the hard disk.
- Reboot. It should boot the kernel from partition 4 and mount the rootfs from partition 5, but it could also use partitions 2 and 3, respectively.
- Check this from a console by running
This shows where the rootfs is mounted.
- Build the new kernel using emerge-x86-generic kernel or similar. You'll need the bzImage (aka vmlinuz) file to create the signed kernel partition image. It's usually left in /build/x86-generic/boot/
- You'll also need a config.txt file, which will specify the kernel command line. You can make your own, or just reuse the one that's left in src/build/images/<board>/latest/ by the last build_image run.
- Create and sign the kernel partition image like this (in chroot):
vbutil_kernel --pack new_kern.bin \
  --keyblock /usr/share/vboot/devkeys/kernel.keyblock \
  --signprivate /usr/share/vboot/devkeys/kernel_data_key.vbprivk \
  --version 1 \
  --config config.txt \
  --bootloader /lib64/bootstub/bootstub.efi \
  --vmlinuz /build/x86-generic/boot/vmlinuz
- Copy new_kern.bin into partition 4 on the target (from console):
scp USER@SOMEWHERE:new_kern.bin /tmp
sudo dd if=/tmp/new_kern.bin of=/dev/sda4
- Reboot and you should be using your new kernel.
Sometimes all one needs is to change the kernel command line, for instance to enable or disable the verified rootfs. This can be done as follows (moving the kernel blob between the target and host is required in case the keys are not available on the target):
Copy the kernel that needs modifying into a file (use the appropriate source device; <src_part> below is most likely sda2, sda4, or sdb2):
sudo dd if=/dev/<src_part> of=/tmp/kernel.old
Save the old kernel command line to a file:
vbutil_kernel --verify /tmp/kernel.old --verbose | \
  tail -1 > /tmp/cmd.line.old
Modify the command line as required and save it in a file (say /tmp/cmd.line.new). Repack the kernel blob using the new command line:
vbutil_kernel --repack /tmp/kernel.new \
  --config /tmp/cmd.line.new \
  --signprivate <private_key> \
  --oldblob /tmp/kernel.old
For the recovery kernel on a removable device, <private_key> above is recovery_kernel_data_key.vbprivk, and for the main kernel on the hard drive, <private_key> is kernel_data_key.vbprivk. The full path to the key file is required, of course.
Then verify things look OK:
vbutil_kernel --verify /tmp/kernel.new --verbose
Finally get your kernel back to the device it came from:
sudo dd if=/tmp/kernel.new of=/dev/<src_part>