doc:techref:filesystems

doc:techref:filesystems [2013/05/02 20:50]
danitool false!!, livebox doesnt need cramfs in OpenWrt
doc:techref:filesystems [2016/01/18 23:11] (current)
bgermann [OverlayFS]
 +====== Filesystems ======
  
 +This article covers the file systems OpenWrt uses on a device's built-in flash.
 +For file systems on external storage devices, including partitioning and mounting, see [[doc/howto/storage|the page about general storage]].
 +
 +Please also read about the [[flash.layout]] and ''[[doc:techref:mtd]]''. Note that there are two types of flash memory: [[wp>Flash_memory#NOR_flash|NOR flash]] and [[wp>Flash_memory#NAND_flash|NAND flash]].
 +
 +===== Common File Systems =====
 +
 +==== OverlayFS ====
 +Used to merge two filesystems, one read-only and the other writable. [[flash.layout]] explains how this is used in OpenWrt.
 +  * [[https://​dev.openwrt.org/​browser/​trunk/​target/​linux/​generic/​patches-2.6.38/​209-overlayfs.patch?​rev=26213]]
 +  * http://​lwn.net/​Articles/​447650/​
 +  * was mainlined in Linux kernel 3.18, see [[https://​git.kernel.org/​cgit/​linux/​kernel/​git/​stable/​linux-stable.git/​tree/​Documentation/​filesystems/​overlayfs.txt|/​Documentation/​filesystems/​overlayfs.txt]]
 +  * [[https://​git.kernel.org/​cgit/​linux/​kernel/​git/​mszeredi/​vfs.git/​tree/​Documentation/​filesystems/​overlayfs.txt?​h=overlayfs.current | Overlayfs documentation]] in the official development tree
 +  * Overlayfs support for inotify is not complete yet: events such as IN_CLOSE_WRITE are not delivered to listening processes.
 +
 +==== tmpfs ====
 +  * [[wp>​tmpfs]]
 +  * ''/​tmp''​ resides on a tmpfs-partition and ''/​var''​ is a symlink to it; ''/​dev''​ resides on a little tmpfs partition of its own
 +  * (+) resides in RAM, so writes cause no flash wear (no wear leveling needed)
 +  * (-) volatile (doesn'​t survive a reboot)
 +  * [[http://​lxr.free-electrons.com/​source/​Documentation/​filesystems/​tmpfs.txt|Kernel documentation on tmpfs]]
 +
 +==== SquashFS ====
 +[[wp>SquashFS]] is a //read-only// compressed filesystem. While [[wp>gzip]] is available, OpenWrt uses [[wp>Lempel–Ziv–Markov chain algorithm|LZMA]] for the compression. Since SquashFS is read-only, it doesn't need to align the data, which lets it pack files more tightly and take up significantly less space than JFFS2 (20-30% savings over a JFFS2 filesystem).
 +
 +  * (+) taking up as little space as possible
 +  * (+) allowing the implementation of an idiot proof [[doc:​howto:​generic.failsafe|FailSafe]] for recovery, since it is not possible to write to it
 +  * (-) read only
 +  * (-) wastes space: whenever a file stored on it is modified, the modified copy is written to the second (JFFS2) partition while the original stays on SquashFS
 +  * [[http://​lxr.free-electrons.com/​source/​Documentation/​filesystems/​squashfs.txt|Kernel documentation on SquashFS]]
 +  * [[http://​tree.celinuxforum.org/​CelfPubWiki/​SquashFsComparisons|SquashFs Performance Comparisons]]
 +
 +There is a generic problem when running SquashFS on NAND: SquashFS has no bad block management at all and requires all blocks to be in order, whereas proper NAND bad block management needs to be able to skip bad blocks and occasionally relocate blocks (see [[http://www.infradead.org/pipermail/linux-mtd/2006-April/015386.html|squashfs and NAND flash]]). That's why raw SquashFS is a bad idea on NAND (it works on top of a layer that handles bad blocks, such as UBI).
 +
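To illustrate why bad blocks break a filesystem that expects its blocks in order, here is a minimal Python sketch of the "skip bad blocks" idea. It is a toy model, not kernel or UBI code: the function name ''logical_to_physical'' and the example bad block set are assumptions made for illustration only.

```python
def logical_to_physical(logical_block, bad_blocks, total_blocks):
    """Map a logical block index to a physical one by skipping bad blocks.

    Toy model: real bad block handling also has to relocate blocks that
    go bad later, which this sketch does not attempt.
    """
    good_seen = 0
    for physical in range(total_blocks):
        if physical in bad_blocks:
            continue  # a bad block is never used to hold data
        if good_seen == logical_block:
            return physical
        good_seen += 1
    raise ValueError("not enough good blocks for this logical index")


# Hypothetical 8-block device on which blocks 2 and 5 are marked bad.
bad = {2, 5}
mapping = [logical_to_physical(i, bad, 8) for i in range(6)]
print(mapping)  # [0, 1, 3, 4, 6, 7]
```

A filesystem that assumes logical block 2 sits at physical block 2 would read the bad block here, which is the situation raw SquashFS on NAND runs into.
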
 +
 +==== JFFS2 ====
 +[[wp>​JFFS2]] is a //​writable//​ compressed filesystem with //​[[wp>​Journaling file system|journaling]]//​ and //​[[wp>​wear leveling]]//​ using [[wp>​Lempel–Ziv–Markov chain algorithm|LZMA]] for the compression.
 +
 +  * (+) is writable, has journaling and wear leveling
 +  * (+) is cool
 +  * (-) is compressed, so a program (''​[[doc:​techref:​opkg]]''​ in particular) cannot know in advance how much space a package will occupy
 +  * (+) is compressed, so a program (which is preinstalled) takes much less space, so effectively you have more space
 +
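The point about compression making space usage unpredictable can be seen with a small Python experiment. This is only a sketch: it uses ''zlib'' from the Python standard library as a stand-in for the compressor used by JFFS2, and the helper name ''stored_size'' is made up for this example.

```python
import os
import zlib


def stored_size(data: bytes) -> int:
    """Bytes needed once the data is compressed (zlib as a stand-in)."""
    return len(zlib.compress(data))


size = 64 * 1024

# Two "files" with exactly the same uncompressed size.
repetitive = (b"option enabled 1\n" * 4096)[:size]  # config-like, very regular
random_data = os.urandom(size)                      # behaves like already-compressed data

print(len(repetitive), stored_size(repetitive))
print(len(random_data), stored_size(random_data))
```

Both inputs are 64 KiB, yet the space they need after compression differs widely, so the uncompressed size of a package says little about how much of the partition it will consume.
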
 +==== UBIFS ====
 +  * [[wp>​UBIFS]] is a file system for [[doc/​techref/​flash.layout|raw flash]]. It is used in OpenWrt NAND targets since :FIXME: around r40364
 +  * [[http://​lxr.free-electrons.com/​source/​Documentation/​filesystems/​ubifs.txt|Kernel documentation on UBIFS]]
 +
 +==== ext2 ====
 +  * [[wp>​ext2]]
 +  * ext2/3/4 is used on x86, x86-64 and on some architectures with an SD-card rootfs
 +  * [[http://lxr.free-electrons.com/source/Documentation/filesystems/ext2.txt|Kernel documentation on ext2]]
 +  * (+) a program (''[[doc:techref:opkg]]'' in particular) knows how much space is left!
 +  * (+) good ol' veteran FOSS file system
 +  * (-) no journaling
 +  * (-) no wear leveling
 +  * (-) no transparent compression
 +
 +
 +===== Other filesystems =====
 +
 +OpenWrt does not use other filesystems as rootfs. It does support several filesystems on storage attached via mechanisms such as USB, SATA or the network. For a list see [[doc:howto:storage]].
 +
 +==== mini_fo ====
 +  * was used by older OpenWrt versions, so there are still references to it in the Wiki
 +  * has since been replaced by [[#OverlayFS]]
 +  * [[https://​lwn.net/​Articles/​135283]]
 +  * [[http://​www.denx.de/​wiki/​bin/​view/​Know/​MiniFOHome]]
 +
 +
 +===== Implementation in OpenWrt =====
 +The [[doc:​techref:​Flash.Layout]] article documents how OpenWrt uses both SquashFS and JFFS2 filesystems combined into one filesystem by overlayfs. The kernel is also stored separately from these partitions in raw flash. When the kernel is built, it is also compressed with [[wp>​Lempel–Ziv–Markov chain algorithm|LZMA]] and [[wp>​gzip]],​ as documented in [[doc:​howto:​obtain.firmware.generate]].
 +
 +==== Boot process ====
 +System bootup is as follows: ->​[[process.boot]]
 +  - kernel boots from a known raw partition (without a FS), scans mtd partition //rootfs// for a valid superblock and mounts the SquashFS partition (containing ''/​etc''​) then runs ''​[[doc:​howto:​notuci.config#​etcpreinit|/​etc/​preinit]]''​. (More info at [[doc/​techref/​filesystems#​technical.details]])
 +  - ''/​etc/​preinit''​ runs ''​[[https://​dev.openwrt.org/​browser/​trunk/​package/​base-files/​files/​sbin/​mount_root|/​sbin/​mount_root]]''​
 +  - ''​mount_root''​ mounts the JFFS2 partition (''/​overlay''​) and **combines** it with the SquashFS partition (''/​rom''​) to create a new //virtual root filesystem//​ (''/''​)
 +  - bootup continues with ''/​sbin/​init''​
 +''/​overlay''​ was previously named ''/​jffs2''​
 +
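The way ''mount_root'' combines ''/rom'' and ''/overlay'' into one root can be modelled in a few lines of Python. This is a conceptual sketch only, not how overlayfs is implemented: the ''Overlay'' class, its method names and the ''WHITEOUT'' sentinel are hypothetical, and the two layers are plain dictionaries standing in for the SquashFS and JFFS2 partitions.

```python
WHITEOUT = object()  # hypothetical marker: "this file was deleted in the upper layer"


class Overlay:
    """Toy model of a read-only lower layer merged with a writable upper layer."""

    def __init__(self, lower):
        self.lower = dict(lower)  # stands in for the SquashFS partition (/rom)
        self.upper = {}           # stands in for the JFFS2 partition (/overlay)

    def read(self, path):
        # The upper layer always wins; the lower layer is only a fallback.
        if path in self.upper:
            if self.upper[path] is WHITEOUT:
                raise FileNotFoundError(path)
            return self.upper[path]
        if path in self.lower:
            return self.lower[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Writes never touch the lower layer; the new copy lands in the upper one.
        self.upper[path] = data

    def remove(self, path):
        self.read(path)               # raises if the file does not exist
        self.upper[path] = WHITEOUT   # hide the lower copy instead of deleting it


root = Overlay({"/etc/banner": "stock banner"})
root.write("/etc/banner", "custom banner")
print(root.read("/etc/banner"))   # custom banner
print(root.lower["/etc/banner"])  # stock banner
```

The model also shows the cost noted for SquashFS: after the write, both the stock and the custom banner occupy space, and removing the file only adds a marker to the writable layer.
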
 +==== Explanations ====
 +
 +| FIXME: Please feel free to merge Explanation 1 with Explanation 2 |
 +
 +=== Explanation 1 ===
 +
 +Both SquashFS and JFFS2 are compressed filesystems using [[wp>​Lempel–Ziv–Markov chain algorithm|LZMA]] for the compression. SquashFS is a //read only// filesystem while JFFS2 is a writable filesystem with //​journaling//​ and //wear leveling//​.\\
 +Our job when writing the firmware is to put as much common functionality as possible on SquashFS while not wasting space on unwanted features. Additional features can always be installed onto JFFS2 by the user. The use of ''mini_fo''/''overlayfs'' means that the filesystem is presented to the user as one large writable filesystem with no visible boundary between SquashFS and JFFS2 -- files are simply copied to JFFS2 when they're written.\\
 +It's not all without side effects however.\\
 +The fact that we pack things so tightly in flash means that if the firmware ever changes, the size and location of the JFFS2 partition also change, potentially wiping out a large chunk of JFFS2 data and corrupting the filesystem. To deal with this, we've implemented a policy that the JFFS2 data is reformatted after each reflash. The trick to doing that is a special value, ''0xdeadc0de''; when this value appears in a JFFS2 partition, everything from that point to the end of the partition is wiped. So, hidden at the end of the firmware images is the value ''0xdeadc0de'', positioned such that it becomes the start of the JFFS2 partition.\\
 +The fact that we use a combination of compressed and partially read only filesystems also has an interesting effect on package management:​\\
 +In particular, you need to be careful what packages you update. While ''​[[doc:​techref:​OPKG]]''​ is more than happy to install an updated package on JFFS2, it's unable to remove the original package from SquashFS; the end result is that you slowly start using more and more space until the JFFS2 partition is filled. The opkg util really has no idea how much space is available on the JFFS2 partition since it's compressed, and so it will blindly keep going until the opkg system crashes -- at that point you have so little space you probably can't even use opkg to remove anything.
 +
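The ''0xdeadc0de'' trick described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the kernel code: the marker is assumed to be stored as the byte sequence ''de ad c0 de'', it is assumed to be checked only at erase block boundaries, and the function names and the tiny 16-byte block size are invented for the example.

```python
MARKER = bytes.fromhex("deadc0de")  # assumed on-flash byte order of the marker


def find_wipe_offset(partition: bytes, block_size: int):
    """Return the offset of the first block that starts with the marker, else None."""
    for offset in range(0, len(partition), block_size):
        if partition[offset:offset + len(MARKER)] == MARKER:
            return offset
    return None


def reformat(partition: bytes, block_size: int) -> bytes:
    """Wipe everything from the marker to the end of the partition."""
    offset = find_wipe_offset(partition, block_size)
    if offset is None:
        return partition  # no marker: existing data is left alone
    # Erased flash reads back as 0xff bytes.
    return partition[:offset] + b"\xff" * (len(partition) - offset)


# Hypothetical partition: one block of valid data, then the marker followed
# by stale data left over from the previous firmware.
block = 16
image = b"A" * block + MARKER + b"B" * 28
wiped = reformat(image, block)
print(find_wipe_offset(image, block))  # 16
```

Because the image builder places the marker right after the SquashFS data, stale JFFS2 contents from an older layout are discarded on the first boot after a reflash.
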
 +=== Explanation 2 ===
 +On many embedded targets that use  [[wp>​Flash_memory#​NOR_flash|NOR flash]] for the root filesystem, OpenWrt implements a clever trick to get the most out of the limited flash memory capacity while retaining flexibility for the end-user:\\
 +Basically, during image creation, all of the rootfs contents are packed into a SquashFS filesystem -- a highly efficient filesystem with compression support. There's one important detail about it though: it is a read-only filesystem. To overcome this limitation, OpenWrt uses the remaining portion of the NOR rootfs partition to store an additional read/write JFFS2 filesystem which is "overlaid" on top of the rootfs (that is, unchanged files are read from the SquashFS while all modifications are stored in the JFFS2 part).\\
 +This design has another important advantage for the end-user: even when the read/write partition is a total mess, they can always boot into failsafe mode (which mounts only the SquashFS part) and proceed from there.
 +
 +==== Technical Details ====
 +
 +The kernel boot process involves discovering the partitions within the NOR flash, which can be done by various target-dependent means:
 +  * some bootloaders store a partition table at a known location
 +  * some pass the partition layout via kernel command line
 +  * some targets require the kernel command line to be specified at compile time (thus overriding the one provided by the bootloader).
 +
 +Either way, if there is a partition named ''rootfs'' and the ''MTD_ROOTFS_ROOT_DEV'' kernel config option is set to ''yes'', this partition is automatically used for the root filesystem.
 +
 +After that, if ''MTD_ROOTFS_SPLIT'' is enabled, the kernel shrinks the ''rootfs'' partition to the minimum size required by the particular SquashFS image and automatically adds ''rootfs_data'' to the list of available mtd partitions. ''rootfs_data'' begins at the first appropriate address after the end of the SquashFS and extends over the remainder of the original ''rootfs'' partition. The resulting list is stored in RAM only, so no partition table of any kind is actually modified.
 +
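The arithmetic behind the ''rootfs'' split can be sketched in Python. This is a simplified model, not a transcription of the patch: it assumes that the "first appropriate address" is the next erase block boundary after the SquashFS image, and the function name ''split_rootfs'' as well as all offsets and sizes in the example are hypothetical.

```python
def split_rootfs(rootfs_offset, rootfs_size, squashfs_len, erase_size):
    """Split one rootfs partition into rootfs (SquashFS) and rootfs_data.

    Returns a dict of name -> (offset, size). Nothing is written anywhere;
    like the in-kernel partition list, the result only lives in memory.
    """
    if squashfs_len > rootfs_size:
        raise ValueError("SquashFS image does not fit into rootfs")
    rootfs_end = rootfs_offset + rootfs_size
    squashfs_end = rootfs_offset + squashfs_len
    # Assumption: rootfs_data starts on the next erase block boundary.
    data_offset = -(-squashfs_end // erase_size) * erase_size
    if data_offset >= rootfs_end:
        raise ValueError("no room left for rootfs_data")
    return {
        "rootfs": (rootfs_offset, data_offset - rootfs_offset),
        "rootfs_data": (data_offset, rootfs_end - data_offset),
    }


# Hypothetical layout: 64 KiB erase blocks, rootfs at 0x140000.
parts = split_rootfs(0x140000, 0x6A0000, 0x2345AB, 0x10000)
for name, (offset, size) in parts.items():
    print(f"{name}: offset=0x{offset:x} size=0x{size:x}")
```

With these example numbers the SquashFS image ends at ''0x3745ab'', so ''rootfs_data'' would start at ''0x380000'' and cover the rest of the original partition.
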
 +For more details please refer to the actual patch at:
 +[[https://​dev.openwrt.org/​browser/​trunk/​target/​linux/​generic/​patches-2.6.37/​065-rootfs_split.patch]]
 +
 +For overlaying, a special ''mini_fo'' filesystem is used; its ''README'' is available from the sources at
 +[[https://​dev.openwrt.org/​browser/​trunk/​target/​linux/​generic/​patches-2.6.37/​209-mini_fo.patch]]
 +
 +
 +
 +=== Can we switch the filesystem to be entirely JFFS2? ===
 +**//''Note:''//** It is possible to keep the entire root filesystem on a single JFFS2 partition instead of a combination of both.
 +The advantage is that changes to included files no longer leave behind an old copy on the read-only filesystem, so you could end up saving space.
 +The disadvantage is that you no longer have a failsafe and that JFFS2 takes significantly more space than SquashFS.
 +
 +
 +Yes, it's technically possible, but a bit of a mess to actually pull off. The firmware has to be loaded as a trx file, which means that you have to put the JFFS2 data inside the trx. But the trx has a checksum, so if you ever change that data, you invalidate the checksum. The solution is to install with the JFFS2 data contained within the trx, and then change the trx boundaries at runtime. The end result is a single JFFS2 partition for the root filesystem. Why someone would want to do this is beyond me: it takes more space, and while it would allow you to upgrade the contents of the filesystem, you would still be unable to replace the kernel (which lives outside the filesystem), meaning that a seamless upgrade between releases is still not possible. Having SquashFS gives you a failsafe mechanism where you can always ignore the JFFS2 partition and boot directly off SquashFS, or restore files to their original SquashFS versions.
 +
 +I used to have a trick where I could convert a SquashFS install to a JFFS2 install at runtime by copying all the data onto the SquashFS partition and changing the partition boundaries. I never really had much use for the util -- not to mention it required a rather large flash to store both SquashFS and JFFS2 copies of the root during transition -- so support for it was dropped.
 +
 +
 +===== Notes =====
 +Example pictures showing how data is stored and addressed on a formatted partition:
 +  * how data is stored and addressed by ext2:
 +  * how data is stored and addressed by ext3:
 +  * how data is stored and addressed by SquashFS:
 +  * how data is stored and addressed by JFFS2:
 +
 +===== Archive =====
 +
 +  * see [[doc:​techref:​filesystems.old]]