OpenWrt Flash Layout

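Most routers have no hard drive. They use flash memory for a similar purpose: storing programs and data even while the system is powered off (non-volatile memory).

On most systems flash memory is not addressable like RAM, so data and instructions have to be copied into RAM before they can be used. Thus, for example, the bootloader copies the kernel from flash into RAM and then starts executing the copy.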

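There are two basic types of flash memory: NOR flash and NAND flash.
If a flash chip (regardless of its type) is connected directly to the SoC and has to be addressed by Linux itself, we call it "raw flash". If there is an additional controller between the flash chip and the SoC, we call it "FTL (Flash Translation Layer) flash". Embedded systems use "raw flash" almost exclusively, while SSDs and USB memory sticks almost always use FTL flash.

Older routers generally have raw NOR flash, while newer routers have NAND flash.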

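The raw NOR flash in a typical router is generally small (4 MiB - 16 MiB) and error-free. This is currently the usual OpenWrt setup. Because the flash is error-free, the filesystem can be fairly simple. It does, however, need wear leveling, and saving space is important.

The raw NAND flash in a typical router is generally larger (32 MiB - 256 MiB) and contains bad blocks. OpenWrt is not yet set up to handle this. Layering UBIFS on top of UBI is a promising approach.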

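Partitioning of the Flash

Most embedded systems contain "raw flash" chips. OpenWrt addresses this storage space as an MTD (Memory Technology Device) and uses filesystems developed for this purpose. The available storage is not partitioned in the traditional way, where the data about the partitions is stored in the MBR and PBRs; instead the partitioning is done by the Linux kernel (and sometimes independently by the bootloader as well). It is simply defined that "partition kernel starts at offset x and ends at offset y". The advantage is that a partition can be addressed by its name instead of specifying its start offset over and over again.

The generic flash layout is as follows: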

Layer0: raw flash
  Layer1: bootloader partition(s)
  Layer1: optional SoC specific partition(s)
  Layer1: OpenWrt firmware partition
    Layer2: Linux Kernel
    Layer2: rootfs (mounted: "/", OverlayFS with /overlay)
      Layer3: /dev/root (mounted: "/rom", SquashFS; size depends on selected packages)
      Layer3: rootfs_data (mounted: "/overlay", JFFS2; "free" space)
  Layer1: optional SoC specific partition(s)

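Many newer devices share this scheme; the details differ slightly with respect to U-Boot and the SoC-specific firmware images. For details on a specific layout, refer to the wiki pages of the individual SoC and device. If the flash layout differs, please update the wiki page.
Here are some examples from real devices:

Examples

The TL-WR1043ND, based on Qualcomm Atheros. Note: the "art" partition holds SoC-specific WiFi data. There is also a LibreOffice Calc ODS version of this data in which the numbers can be edited.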

Layer0: raw flash, 8192 KiB
  Layer1: mtd0 u-boot, 128 KiB
    Layer2: code, 64 KiB
    Layer2: data, 64 KiB
  Layer1: mtd5 firmware, 8000 KiB (= FlashSize-(128+64))
    Layer2: mtd1 kernel, about 1.2 MiB
    Layer2: mtd2 rootfs
      Layer3: /dev/root, around 1.5 MiB
      Layer3: mtd3 rootfs_data, around 5 MiB
  Layer1: mtd4 art, 64 KiB

Example 2: Hoo Too HT-TM02

The Hoo Too HT-TM02, based on the Ralink RT5350F.

Layer0: raw flash, 8192 KiB
  Layer1: mtd0 u-boot, 192 KiB
  Layer1: mtd1 u-boot-env, 64 KiB
  Layer1: mtd2 factory, 64 KiB
  Layer1: mtd3 firmware, 7872 KiB (= FlashSize-(192+64+64))
    Layer2: mtd4 kernel, about 1 MiB
    Layer2: mtd5 rootfs
      Layer3: /dev/root, around 2 MiB
      Layer3: mtd6 rootfs_data, around 4.5 MiB
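
The "firmware" sizes in the two examples above are simply what is left of the flash once the other Layer1 partitions are subtracted. The following is a minimal sketch of that arithmetic in POSIX shell; the function name `firmware_kib` is made up for this illustration and is not part of OpenWrt.

```shell
#!/bin/sh
# firmware_kib FLASH_KIB OTHER_KIB...
# Prints the space left for the "firmware" partition after subtracting
# every other Layer1 partition (all sizes in KiB).
# Hypothetical helper, only meant to illustrate the arithmetic.
firmware_kib() {
    remaining=$1
    shift
    for part in "$@"; do
        remaining=$((remaining - part))
    done
    printf '%s\n' "$remaining"
}

# TL-WR1043ND: 8192 KiB flash, u-boot 128 KiB, art 64 KiB
firmware_kib 8192 128 64       # prints 8000
# HT-TM02: 8192 KiB flash, u-boot 192, u-boot-env 64, factory 64 KiB
firmware_kib 8192 192 64 64    # prints 7872
```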

On some devices the OpenWrt firmware partition may not exist. The DIR-300 flash layout is one example; see it for details.

Partitions

Since the partitions are nested, we look at the whole thing in layers:

  1. Layer0: We have the flash chip, e.g. 8 MiB in size, which is soldered to the PCB and connected to the SoC over e.g. SPI (Serial Peripheral Interface Bus).
  2. Layer1: We partition the flash space into:
    1. one or more partitions for the bootloader. A U-Boot partition usually consists of a 64 KiB u-boot block followed by a 64 KiB data block which contains things like MAC, WPS-PIN, type description…. If no MAC is configured, Wifi will not work correctly.
    2. a partition for the OpenWrt firmware
    3. one or more partitions for SoC specific firmware, e.g. "art" for Qualcomm Atheros
  3. Layer2: we subdivide the "firmware" into:
    1. "kernel" at the start. In the generation process of the firmware (see obtain.firmware.generate) the kernel binary file is first packed with LZMA, then packed with gzip. The kernel image is written directly onto the raw flash; it is not part of any filesystem!
    2. "rootfs" follows; it contains the file system
  4. Layer3: we subdivide "rootfs" further into:
    1. fixed ROM data using SquashFS
    2. "rootfs_data" using JFFS2. This is the free space on the device that can be used

Mount Points

  • "/" is your entire root filesystem, it comprises "/rom" and "/overlay". Please ignore "/rom" and "/overlay" and use exclusively "/" for your daily routines.
  • "/rom" contains all the basic files, like busybox, dropbear or iptables. It also includes the default configuration files used when booting into OpenWrt Failsafe mode. It does not contain the Linux kernel. All files in this directory are located on the SquashFS partition, and thus cannot be altered or deleted. But because we use the OverlayFS filesystem, so-called overlay-whiteout symlinks can be created on the JFFS2 partition.
  • "/overlay" is the writable part of the file system that gets merged with /rom to create a uniform /-tree. It contains anything that was written to the router after installation, e.g. changed configuration files, additional packages installed with OPKG, etc. It is formatted with JFFS2.

Whenever the system is asked to look for an existing file in "/", it first looks in "/overlay", and if not there, then in "/rom". In this way "/overlay" overrides "/rom" and creates the effect of a writable "/" while much of the content is safely and efficiently stored in the read-only "/rom".

When the system is asked to delete a file that is in /rom, it instead creates a corresponding entry in /overlay, a whiteout. A whiteout is a symlink to (overlay-whiteout) that mostly behaves like a file that doesn't exist.

#!/bin/sh
# shows all overlay-whiteout symlinks
 
find /overlay -type l -exec sh -c \
    'for x; do [ "$(readlink -n -- "$x")" = "(overlay-whiteout)" ] && printf %s\\n "$x"; done' -- {} +
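
The lookup order and the whiteout rule described above can be imitated with a few tests in shell. This is only a sketch for illustration: the real merging is done by the kernel's overlay filesystem, and the function name `lookup_layer` and the throwaway directories are assumptions, not anything OpenWrt ships.

```shell
#!/bin/sh
# lookup_layer OVERLAY_DIR ROM_DIR RELATIVE_PATH
# Reports which layer would supply RELATIVE_PATH: the overlay is checked
# first, then the read-only rom. A symlink pointing to "(overlay-whiteout)"
# in the overlay marks the file as deleted.
# Hypothetical helper; it only imitates the behaviour described above.
lookup_layer() {
    overlay=$1
    rom=$2
    rel=$3
    if [ -L "$overlay/$rel" ] &&
       [ "$(readlink -n "$overlay/$rel")" = "(overlay-whiteout)" ]; then
        echo "deleted"
    elif [ -e "$overlay/$rel" ]; then
        echo "overlay"
    elif [ -e "$rom/$rel" ]; then
        echo "rom"
    else
        echo "missing"
    fi
}

# Try it on throwaway directories instead of the real /overlay and /rom.
demo=$(mktemp -d)
mkdir -p "$demo/overlay/etc" "$demo/rom/etc" "$demo/rom/bin"
echo stock   > "$demo/rom/etc/config"
echo changed > "$demo/overlay/etc/config"
echo binary  > "$demo/rom/bin/tool"
echo old     > "$demo/rom/etc/removed"
ln -s "(overlay-whiteout)" "$demo/overlay/etc/removed"

lookup_layer "$demo/overlay" "$demo/rom" etc/config     # overlay
lookup_layer "$demo/overlay" "$demo/rom" bin/tool       # rom
lookup_layer "$demo/overlay" "$demo/rom" etc/removed    # deleted
rm -rf "$demo"
```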

Directory Structure and Filesystem Hierarchy

NOTE1: If the kernel were part of the SquashFS, we could NOT control where exactly on the flash it is written (namely, which blocks its data ends up in). Thus we could not tell the bootloader to simply load and execute certain blocks of the flash storage; instead the bootloader would have to understand the SquashFS filesystem, which it does not. The embedded bootloaders we use with OpenWrt generally have no concept of filesystems, and thus cannot address files by path and filename. They pretty much assume that the start of the trx data section is executable code.
NOTE2: Generally, the term "firmware" is used for all the data stored on the flash, comprising the boot loader and any other data necessary to operate the device (such as ART, NVRAM, FIS, etc.), but it is also used to name only the parts that are being rewritten. Don't let this confuse you ;-)

Partitioning of JFFS2-Images

TODO

Kernel

The kernel knows the flash layout, because it is hard coded somewhere, e.g:

User

Look at the kernel's boot log (or run dmesg after a fresh boot) and look for something like:

Creating 5 MTD partitions on "spi0.0":
0x000000000000-0x000000020000 : "u-boot"
0x000000020000-0x000000160000 : "kernel"
0x000000160000-0x0000007f0000 : "rootfs"
mtd: partition "rootfs" set to be root filesystem
mtd: partition "rootfs_data" created automatically, ofs=2C0000, len=530000
0x0000002c0000-0x0000007f0000 : "rootfs_data"
0x0000007f0000-0x000000800000 : "art"
0x000000020000-0x0000007f0000 : "firmware"
These are the start and end offsets of the partitions as hex values in bytes. Now you don't have to guess which is nested in which. E.g. 0x020000 = 131,072 bytes = 128 KiB.
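
The conversion from hex offsets to sizes, and the check for nesting, can be left to the shell. The sketch below assumes the log lines look exactly like the ones above (no timestamp prefix); the function names `part_sizes` and `is_nested` are made up for this illustration.

```shell
#!/bin/sh
# part_sizes: reads kernel log lines of the form
#   0x000000020000-0x000000160000 : "kernel"
# on stdin and prints "<name> <size in KiB>" for each of them.
# Lines that do not start with "0x" are skipped.
part_sizes() {
    while IFS= read -r line; do
        case $line in
            0x*-0x*) ;;
            *) continue ;;
        esac
        range=${line%% *}        # "0x...-0x..."
        start=${range%-*}
        end=${range#*-}
        name=${line#*\"}         # drop everything up to the first quote
        name=${name%\"*}         # drop the closing quote
        printf '%s %d\n' "$name" $(( ($end - $start) / 1024 ))
    done
}

# is_nested IN_START IN_END OUT_START OUT_END
# Succeeds when the first range lies completely inside the second one.
is_nested() {
    [ $(( $1 )) -ge $(( $3 )) ] && [ $(( $2 )) -le $(( $4 )) ]
}

part_sizes <<'EOF'
0x000000020000-0x000000160000 : "kernel"
0x0000002c0000-0x0000007f0000 : "rootfs_data"
0x000000020000-0x0000007f0000 : "firmware"
EOF

# rootfs_data (0x2c0000-0x7f0000) against rootfs (0x160000-0x7f0000)
is_nested 0x2c0000 0x7f0000 0x160000 0x7f0000 &&
    echo "rootfs_data is nested in rootfs"
```

On a running device the same function could be fed from the boot log, provided any timestamp prefix is stripped first.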

Running cat /proc/mtd will also give a list of partitions:
dev:    size   erasesize  name
mtd0: 00020000 00010000 "u-boot"
mtd1: 00140000 00010000 "kernel"
mtd2: 00690000 00010000 "rootfs"
mtd3: 00530000 00010000 "rootfs_data"
mtd4: 00010000 00010000 "art"
mtd5: 007d0000 00010000 "firmware"

The column erasesize is the block size of the flash, in this case it's 64 KiB (= 0x10000). The column size gives the size as a hex value. Unfortunately, this does not show whether the partitions are nested. That can be obtained from the kernel boot log only, or from "educated guessing".
Note that some platforms (FIXME: which? Examples?) give the column data as a little-endian integer, i.e. not in the natural order aabbccdd but reversed as ddccbbaa.
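
The hex columns of the /proc/mtd listing can be turned into KiB and erase-block counts with plain shell arithmetic. This is a minimal sketch that assumes the values are in the natural byte order shown above (not the reversed form mentioned in the note); the function name `mtd_sizes` is hypothetical.

```shell
#!/bin/sh
# mtd_sizes: reads text in the /proc/mtd format on stdin and prints
# "<dev> <name> <size in KiB> <number of erase blocks>" per mtdN line.
# Assumes size and erasesize are plain hex in natural byte order.
mtd_sizes() {
    while read -r dev size erasesize name; do
        case $dev in
            mtd*:) ;;
            *) continue ;;       # skips the "dev: size ..." header
        esac
        dev=${dev%:}
        name=${name#\"}
        name=${name%\"}
        printf '%s %s %d %d\n' "$dev" "$name" \
            $(( 0x$size / 1024 )) $(( 0x$size / 0x$erasesize ))
    done
}

mtd_sizes <<'EOF'
dev:    size   erasesize  name
mtd0: 00020000 00010000 "u-boot"
mtd1: 00140000 00010000 "kernel"
mtd5: 007d0000 00010000 "firmware"
EOF
```

On a device, `mtd_sizes < /proc/mtd` would process the live table instead of the pasted sample.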

Generic

On power up the SoC's CPU begins executing the code from the boot ROM. It contains code to boot from various sources:

  • Flash
  • USB
  • SD-Card
  • Network

Usually, the boot ROM code (1st stage boot loader) is very limited. Sophisticated boot options like network boot are usually not included there. It just contains enough code to load the next boot stage (2nd stage boot loader) from a storage device (Flash, SD-Card, USB).

The 2nd stage boot loader isn't part of the OpenWrt firmware. This greatly improves the stability of the device, as failing to flash a proper version of OpenWrt will not brick it. The 2nd stage boot loader can still be used to flash a new image; it's just a bit more effort and usually less convenient.

Note that this boot approach is similar to the PC. There is ROM code at the 1st stage, the BIOS at the 2nd stage and then the OS bootloader from hard disk as the 3rd stage. Also, the BIOS can boot from USB, floppy, CD-ROM, network … in case the OS on the hard disk is broken.

The bootloader is part of the Flash, e.g.:

Boot Loader Partition | Firmware Partition | Special Configuration Data
Atheros U-Boot | OpenWrt | ART
Atheros RedBoot | OpenWrt | FIS recovery RedBoot config boardconfig
Broadcom CFE | OpenWrt | NVRAM
The partition or partitions containing the so-called Special Configuration Data differ very much from each other. Examples:

  • the ART partition exists on Qualcomm Atheros based devices that use U-Boot as the 2nd stage boot loader.
  • the NVRAM partition of Broadcom based devices is used for much more. There are special utilities (like the "nvram utility") to access and modify special configuration partitions. To find out what is written in NVRAM you can run nvram show.

Clearing these special configuration data partitions like ART, NVRAM and FIS does not clear much of OpenWrt's configuration, unlike other router software which keeps its configuration data solely in e.g. NVRAM. Instead, because OpenWrt uses OverlayFS with a JFFS2 flash partition, the whole file system is writable, which gives you the flexibility to extend your OpenWrt installation in any way you want. OpenWrt's main configuration is therefore simply kept in the root file system, in UCI configuration files. For convenience, many other packages are made UCI compatible. If you want to reset your complete installation, you should use OpenWrt's built-in functionality, such as sysupgrade, to restore the settings by clearing the JFFS2 partition. Or, if you cannot boot normally, you can wipe or change the JFFS2 partition using OpenWrt's failsafe mode (see your device's dedicated page for information on how to boot into failsafe).

Broadcom with CFE

If you dig into the "firmware" section you'll find a trx. A trx is just an encapsulation, which looks something like this:

trx-header
HDR0 | length | crc32 | flags | pointers | data

"HDR0" is a magic value that indicates a trx header; the rest of the header consists of 4-byte unsigned values, followed by the actual contents. In short, it's a block of data with a length and a checksum. So, our flash usage actually looks something like this:

CFE | trx containing firmware | NVRAM

Except that the firmware is generally pretty small and doesn't use the entire space between CFE and NVRAM:

CFE | trx | firmware | unused | NVRAM

(NOTE: The <model>.bin files are nothing more than the generic *.trx file with an additional header prepended to identify the model. The model information gets verified by the vendor's upgrade utilities and only the remaining data (the trx) gets written to the flash. When upgrading from within OpenWrt remember to use the *.trx file.)
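
The first fields of the trx header described above can be inspected with od and dd. The sketch below makes two assumptions that the text does not state: that the 4-byte values are stored little-endian, and that length and crc32 sit directly after the magic at offsets 4 and 8. The function names and the sample file are made up for illustration.

```shell
#!/bin/sh
# le32_hex FILE OFFSET
# Prints the 4 bytes at OFFSET as 8 hex digits, most significant byte
# first. Little-endian storage is an assumption made for this sketch.
le32_hex() {
    set -- $(od -An -tx1 -j"$2" -N4 "$1")
    printf '%s%s%s%s\n' "$4" "$3" "$2" "$1"
}

# trx_info FILE
# Prints the magic, the length field and the crc32 field of a trx header.
trx_info() {
    magic=$(dd if="$1" bs=1 count=4 2>/dev/null)
    if [ "$magic" != "HDR0" ]; then
        echo "not a trx image" >&2
        return 1
    fi
    len_hex=$(le32_hex "$1" 4)
    crc_hex=$(le32_hex "$1" 8)
    echo "magic:  $magic"
    echo "length: $(( 0x$len_hex ))"
    echo "crc32:  0x$crc_hex"
}

# Hand-made sample header: "HDR0", length 0x1000, crc32 0xdeadbeef.
sample=$(mktemp)
printf 'HDR0\000\020\000\000\357\276\255\336' > "$sample"
trx_info "$sample"
rm -f "$sample"
```

The sample values are arbitrary; a real image would be passed as `trx_info some-image.trx`.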

So what exactly is the firmware?

The boot loader really has no concept of filesystems; it pretty much assumes that the start of the trx data section is executable code. So, at the very start of our firmware is the kernel. But just putting a kernel directly onto flash is quite boring and consumes a lot of space, so we compress the kernel with a heavy compression known as LZMA. Now the start of the firmware is the code of an LZMA decompressor:

lzma decompress | lzma compressed kernel

Now, the boot loader boots into an LZMA program which decompresses the kernel into RAM and executes it. It adds one second to the bootup time, but it saves a large chunk of flash space. (And if that wasn't amusing enough, it turns out the boot loader does know gzip compression, so we gzip-compressed the LZMA decompression program.)

Immediately following the kernel is the filesystem. We use SquashFS for this because it's a highly compressed read-only filesystem. Remember that altering the contents of the trx in any way would invalidate the crc, so we put our writable data in a JFFS2 partition, which is outside the trx. This means that our firmware looks like this:

trx | gzip'd lzma decompress | lzma'd kernel | (SquashFS filesystem)

And the entire flash usage looks like this:

CFE | trx | gz'd lzma | lzma'd kernel | SquashFS | JFFS2 filesystem | NVRAM

That's about as tight as we can possibly pack things into flash.


Explanations

What is an Image File?

An image file is a byte-by-byte copy of the data contained in a file system. Suppose you installed Debian or Windows in the usual way onto one or two hard disk partitions and afterwards copied the whole content of the hard disk byte by byte into one file:

dd if=/dev/sda of=/media/sdb3/backup.dd

The resulting backup file /media/sdb3/backup.dd could then be used in exactly the same manner as an OpenWrt image file.
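
The byte-by-byte nature of such a copy can be tried out safely on an ordinary file instead of a disk. This is a small sketch under that assumption; the function name `make_image` and the temporary files are hypothetical.

```shell
#!/bin/sh
# make_image SOURCE IMAGE
# Copies SOURCE byte by byte into IMAGE with dd and verifies the result
# with cmp. On a real system SOURCE would be a block device such as
# /dev/sda; any regular file works for trying this out.
make_image() {
    dd if="$1" of="$2" 2>/dev/null && cmp -s "$1" "$2"
}

# Use throwaway files instead of a real disk.
src=$(mktemp)
img=$(mktemp)
printf 'pretend this is a partition\n' > "$src"
if make_image "$src" "$img"; then
    echo "image is identical to the source"
fi
rm -f "$src" "$img"
```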

The difference is that OpenWrt image files are not created that way ;-) They are generated with the Image Generator (formerly called Image Builder). You can read about the:

jp/doc/techref/flash.layout.txt · Last modified: 2017/04/28 10:32 by seiji.komine