The OpenWrt Flash Layout

Most routers do not have hard drives. They use flash memory for similar purposes: storing programs and data, even when the system is off (non-volatile memory).

In most systems, flash memory does not appear like RAM and so data and instructions must be copied to RAM to be used. So, for example, the bootloader copies the kernel from flash to RAM and then starts that copy running.

There are two basic types of flash memory: NOR flash and NAND flash.
If the flash chip (no matter what type) is connected directly with the SoC and has to be addressed directly by Linux, we call it "raw flash". If there is an additional controller chip between flash chip and the SoC, we call it "FTL (Flash Translation Layer) flash". Embedded systems almost exclusively use "raw flash", while SSDs and also USB memory sticks, almost exclusively use FTL flash!

Older routers generally have raw NOR flash but many newer routers have raw NAND flash.

Raw NOR flash in typical routers is generally small (4 MiB - 16 MiB) and error-free. This is currently the normal OpenWRT setup. Because it is error-free, the file systems are a little simpler. But they still need to do wear-levelling. Conserving space is important.

Raw NAND flash in typical routers is generally larger (32 MiB - 256 MiB) and may contain bad blocks. OpenWRT isn't configured to handle this yet. One promising approach would be to use UBIFS layered on top of UBI.

Partitioning of SquashFS-Images

Almost all embedded systems contain "raw flash"-chips. OpenWrt treats and addresses this storage space as an MTD (Memory Technology Device) and employs filesystems developed for this purpose. The available storage is not partitioned in the traditional way, where you store the data about the partitions in the MBR and PBRs, but it is done in the Linux Kernel (and sometimes independently in the bootloader as well!). You simply define, that "partition kernel starts at offset x and ends at offset y". The advantage is, that later we can address these partitions by name, rather then specifying the precise start point of the data.

:!: The following table gives you a visual of the layout of the TL-WR1043ND and is merely one example! Another, slightly different, example can be found on the wiki-page for the DIR-300. So the layout does differ between the devices! Please see the wiki-page for a device, for information about its particular layout and simply check it yourself on your device.

TP-Link WR1043ND Flash Layout
Layer0 m25p80 spi0.0: m25p64 8192KiB
Layer1 mtd0 u-boot 128KiB mtd5 firmware 8000KiB mtd4 art 64KiB
Layer2 mtd1 kernel 1280KiB mtd2 rootfs 6720KiB
mountpoint /
filesystem mini_fo/overlayfs
Layer3 mtd3 rootfs_data 5184KiB
Size in KiB 128KiB 1280KiB 1536KiB 5184KiB 64KiB
Name u-boot kernel rootfs_data art
mountpoint none none /rom /overlay none
filesystem none none SquashFS JFFS2 none

Here is a LibreOffice Calc ODS for better understanding: https://web.archive.org/web/20131021013058/http://ubuntuone.com/2aPBH9pwkxtYzy93S0cS1z.

Partitions

Since the partitions are nested we look at this whole thing in layers:

  1. Layer0: So we have the Flashchip, 8MiB in size, which is soldered to the PCB and connected to the SoC over SPI (Serial Peripheral Interface Bus).
  2. Layer1: We "partition" the space into mtd0 for the bootloader, mtd5 for the firmware and, in this case, mtd4 for the ART (Atheros Radio Test) - it contains calibration data for the wifi (EEPROM). If it is missing or corrupt, ath9k (wireless driver) won't come up anymore. The bootloader (128k ) contains of the u-boot 64k block AND a data section which contains the MAC,WPS-PIN and type description. If no mac is configured wifi will not work correctly due to a faulty MAC.
  3. Layer2: we subdivide mtd5 (firmware) into mtd1 (kernel) and mtd2 (rootfs); In the generation process of the firmware (see obtain.firmware.generate) the Kernel binary file is first packed with LZMA, then the obtained file is packed with gzip and then this file will be written onto the raw flash (mtd1) without being part of any filesystem!
  4. Layer3: we subdivide rootfs even further into mtd3 for rootfs_data and the rest for an unnamed partition which will accommodate the SquashFS-partition.

Filesystems

Mount Points

  • / this is your entire root filesystem, it comprises /rom and /overlay. Please ignore /rom and /overlay and use exclusively / for your daily routines!
  • /rom contains all the basic files, like busybox, dropbear or iptables. It also includes default configuration files used when booting into OpenWrt Failsafe mode. It does not contain the Linux kernel. All files in this directory are located on the SqashFS partition, and thus cannot be altered or deleted. But, because we use overlay_fs filesystem, so called overlay-whiteout-symlinks can be created on the JFFS2 partition.
  • /overlay is the writable part of the file system that gets merged with /rom to create a uniform /-tree. It contains anything that was written to the router after installation, e.g. changed configuration files, additional packages installed with opkg, etc. It is formated with JFFS2.

Whenever the system is asked to look for an existing file in /, it first looks in /overlay, and if not there, then in /rom. In this way /overlay overrides /rom and creates the effect of a writable / while much of the content is safely and efficiently stored in the read-only /rom.

When the system is asked to delete a file that is in /rom, it instead creates a corresponding entry in /overlay, a whiteout. A whiteout is a symlink to (overlay-whiteout) that mostly behaves like a file that doesn't exist.

#!/bin/sh
# shows all overlay-whiteout symlinks
 
find /overlay -type l -exec sh -c \
    'for x; do [ "$(readlink -n -- "$x")" = "(overlay-whiteout)" ] && printf %s\\n "$x"; done' -- {} +

Directory Structure / Filesystem Hierarchy

NOTE1: If the Kernel was part of the SquashFS, we could not control where exactly on the flash it is written to (on which blocks its data is contained). Thus we could not tell the bootloader to simply load and execute certain blocks on the flash storage, but would have to address it with path and filename. Now this would not be bad, but in order to that the bootloader would have to understand the SquashFS filesystem, which it does not. The embedded bootloaders we utilize with OpenWrt generally have no concept of filesystems, thus they cannot address files by path and filename. They pretty much assume that the start of the trx data section is executable code.
NOTE2: The denomination "firmware" usually is used for the entire data on the flash comprising the boot loader and any other data necessary to operate the device, such as ART, NVRAM, FIS, etc, but we also use it to name only the parts that are being rewritten. Don't let this confuse you ;-)

Partitioning of JFFS2-Images

TODO

Discovery (How to find out)

cat /proc/mtd
dev:    size   erasesize  name
mtd0: 00020000 00010000 "u-boot"
mtd1: 00140000 00010000 "kernel"
mtd2: 00690000 00010000 "rootfs"
mtd3: 00530000 00010000 "rootfs_data"
mtd4: 00010000 00010000 "art"
mtd5: 007d0000 00010000 "firmware"

The erasesize is the block size of the flash, in this case 64KiB. The size is little or big endian hex value in Bytes. In case of little endian, you switch to hex-mode and enter 02 0000 into the calculator for example and convert to decimal (by switching back to decimal mode again). Then guess how they are nested into each other. Or execute dmesg after a fresh boot and look for something like:

Creating 5 MTD partitions on "spi0.0":
0x000000000000-0x000000020000 : "u-boot"
0x000000020000-0x000000160000 : "kernel"
0x000000160000-0x0000007f0000 : "rootfs"
mtd: partition "rootfs" set to be root filesystem
mtd: partition "rootfs_data" created automatically, ofs=2C0000, len=530000
0x0000002c0000-0x0000007f0000 : "rootfs_data"
0x0000007f0000-0x000000800000 : "art"
0x000000020000-0x0000007f0000 : "firmware"
These are the start and end offsets of the partitions as hex values in Bytes. Now you don't have to guess which is nested in which. E.g. 02 0000 = 131.072 Bytes = 128KiB.

Details

generic

The flash chip can be represented as a large block of continuous space:

start of flash …………….. end of flash

There is no ROM to boot from; at power up the CPU begins executing the code at the very start of the flash. Luckily this isn't the firmware or we'd be in real danger every time we reflashed. Boot is actually handled by a section of code we tend to refer to as the bootloader (the BIOS of your PC is a bootloader).

Boot Loader Partition Firmware Partition Special Configuration Data
Atheros U-Boot firmware ART
Broadcom CFE firmware NVRAM
Atheros RedBoot firmware FIS recovery RedBoot config boardconfig

The partition or partitions containing so called Special Configuration Data differ very much from each other. Example: The ART-partition you will meet in conjunction with Atheros-Wireless and U-Boot, contains only data regarding the wireless driver, while the NVRAM-partition of broadcom devices is used for much more than only that. There are special utilities to access and modify special configuration partitions. For Broadcom devices this is the nvram utility. To find out what is written in NVRAM you can run nvram show.

Note that clearing these special configuration data partitions like ART, NVRAM and FIS does not clear much of OpenWrt's configuration, unlike other router software which keep configuration data solely in e.g. NVRAM. Instead, as a consequence of using the overlay_fs filesystem configuration with JFFS2 flash partition, the whole file system is writable and allows the flexibility of extending your OpenWrt installation in any way you want. OpenWrt's main configuration is therefore just kept in the root file system, using UCI configuration files. For convenience, many other packages are made UCI compatible. If you want to reset your complete installation you should use OpenWrt's built-in functionality such as sysupgrade to restore settings, by clearing the JFFS2 partition. Or, if you cannot boot normally, you can wipe or change the JFFS2 partition using OpenWrt's failsafe mode (look in your device's dedicated page for information how to boot into failsafe).

broadcom with CFE

If you dig into the "firmware" section you'll find a trx. A trx is just an encapsulation, which looks something like this:

trx-header
HDR0 length crc32 flags pointers data

"HDR0" is a magic value to indicate a trx header, rest is 4 byte unsigned values followed by the actual contents. In short, it's a block of data with a length and a checksum. So, our flash usage actually looks something like this:

CFE trx containing firmware NVRAM

Except that the firmware is generally pretty small and doesn't use the entire space between CFE and NVRAM:

CFE trx firmware unused NVRAM

(NOTE: The <model>.bin files are nothing more than the generic *.trx file with an additional header appended to the start to identify the model. The model information gets verified by the vendor's upgrade utilities and only the remaining data – the trx – gets written to the flash. When upgrading from within OpenWrt remember to use the *.trx file.)

So what exactly is the firmware?

The boot loader really has no concept of filesystems, it pretty much assumes that the start of the trx data section is executable code. So, at the very start of our firmware is the kernel. But just putting a kernel directly onto flash is quite boring and consumes a lot of space, so we compress the kernel with a heavy compression known as LZMA. Now the start of firmware is code for an LZMA decompress:

lzma decompress lzma compreszsed kernel

Now, the boot loader boots into an LZMA program which decompresses the kernel into RAM and executes it. It adds one second to the bootup time, but it saves a large chunk of flash space. (And if that wasn't amusing enough, it turns out the boot loader does know gzip compression, so we gzip compressed the LZMA decompression program)

Immediately following the kernel is the filesystem. We use SquashFS for this because it's a highly compressed readonly filesystem – remember that altering the contents of the trx in any way would invalidate the crc, so we put our writable data in a JFFS2 partition, which is outside the trx. This means that our firmware looks like this:

trx gzip'd lzma decompress lzma'd kernel (SquashFS filesystem)

And the entire flash usage looks like this -

CFE trx gz'd lzma lzma'd kernel SquashFS JFFS2 filesystem NVRAM

That's about as tight as we can possibly pack things into flash.


Explanations

What is an Image File?

An image file is byte by byte copy of data contained in a file system. If you installed a Debian or a Windows in the usual way onto one or two hard disk partitions and would afterwards copy the whole content byte by byte from the hard disk into one file:

dd if=/dev/sda of=/media/sdb3/backup.dd

the obtained backup file /media/sdb3/backup.dd, could be used in the exact same manner like an OpenWrt-Image-File.

The difference is, that OpenWrt-Image-File are not created that way ;-) They are being generated with the Image Generator (former called Image Builder). You can read about the:

Back to top

doc/techref/flash.layout.txt · Last modified: 2014/08/09 01:09 by hawkse