User Tools

Site Tools


doc:howto:generic.failsafe

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
doc:howto:generic.failsafe [2012/09/25 15:53]
hauke add warning about switch leaking
doc:howto:generic.failsafe [2016/10/03 16:15] (current)
stilez [Stage 3: Log into the router when has booted into failsafe mode]
Line 1: Line 1:
 +====== OpenWrt Failsafe ======
 +OpenWrt SquashFS-Images have a built-in failsafe mode. Booting into failsafe mode bypasses all configuration located on the [[doc:​techref:​filesystems#​JFFS2]] partition (the writable '​overlay'​ filesystem),​ and instead uses a basic set of hard coded defaults located on the [[doc:​techref:​filesystems#​SquashFS|SquashFS]] partition (that is the read-only partition containing the router OS). 
  
 +Failsafe mode **can** be used to fix a router which cannot be accessed in the usual ways because of a problem with configuration such as locked out users, locked out network connections,​ broken startup scripts, broken packages or configurations,​ full JFFS2 storage (or other JFFS2 content). It normally **cannot** fix more fundamental problems such as '​[[doc:​howto:​generic.debrick|hard bricking]]'​ or issues with the hardware, kernel or squashFS images that prevent the router booting properly or making connections at the hardware level.
 +
 +Failsafe mode can be triggered using three special procedures while the router boots - waiting for a flashing LED and pressing a button, waiting (with a packet sniffer) for a special broadcast packet and pressing a button, or watching for a boot message (on the serial port) and pressing a key ("​f"​) on the serial keyboard. Usually watching for a flashing LED is easiest. Whichever trigger you use, the router enters failsafe mode and you can access the command line with telnet (always possible) or a serial keyboard. The procedures are described here, as well as useful tips once you get into failsafe mode.  ​
 +
 +Once failsafe mode is triggered, the router will boot with a network address of **''​192.168.1.1/​24''​** on the **''​eth0''​** network interface, and with only essential services running. <color red>Note that the router will not respond to network traffic from outside the 92.168.1.1/​24,​ so you may need to set a suitable IP on the device used and/or ensure that routing allows this.</​color>​ If your device has multiple network interfaces (eth0, eth1, ...), usually eth0 is the interface connected to the [[doc:​hardware:​switch]] (there may be very seldom exceptions). Using telnet or a serial connection you can mount the JFFS2 partition with the command ''​**mount_root**''​ and diagnose or fix the problems on the JFFS2 partition.
 +
 +For more information,​ [[doc:​techref:​flash.layout#​partitioning.of.squashfs-images|OpenWrt Flash Layout]] explains why OpenWrt failsafe is possible, and [[doc/​techref/​process.boot|Boot Process]] explains how it works (basically OpenWrt contains an additional boot up stage, called preinit).
 +
 +**→ [[doc:​howto:​generic.debrick]]**
 +
 +===== Prerequisites =====
 +  * <color red>Your device must have **at least one configurable hardware button to use flashing LED or broadcast packet triggering**</​color>​. Any configurable button will work (except the main power on/off!), even if it is labelled as a "​reset"​ button or a "wifi on/​off"​ or something else. If your router has any button that you can physically press and release, it's likely to be configurable. Check if there'​s specific info about failsafe mode for your [[toh:​|device]] and make sure everything still works as expected everytime you update!
 +  * Your device must have an OpenWRT firmware with a [[doc:​techref:​filesystems#​SquashFS|SquashFS-Image]] partition flashed to it. Failsafe cannot be implemented on a system based on JFFS2-Images
 +  * The hardware ports you will use for commands when you are in failsafe mode must work (either a valid ethernet port or valid serial connection)
 +  * Of course you must have a networked or serial-connected device (PC, notepad etc) and for serial connection, you must be able to send key-strokes to the router, to use in failsafe mode!
 +  * The boot process must be able to succeed. Failsafe mode can be used to fix any problem in the [[doc:​techref:​filesystems#​JFFS2]] partition (because it doesn'​t need this to enter failsafe mode), but it needs the kernel partition and the [[doc:​techref:​filesystems#​SquashFS|SquashFS]] partition to be able to support the boot process, so that...
 +    * ...the boot process is able to get as far as required to register the pressing of the button
 +    * ...the minimal required binaries and the firmware'​s default configuration files are available and the boot process can successfully enter failsafe mode (these are all on the SquashFS)
 +
 +
 +===== How to trigger failsafe mode =====
 +
 +You can trigger failsafe mode in three ways:
 +  - Pressing any button at the appropriate time during the router'​s boot process. ​ You can determine when to press by:
 +    - Watching the router LEDs for flashing during boot, and pressing any hardware button when seen **(standard and often easiest)** ​
 +    - Using a packet sniffer to watch for a special broadcast packet during boot, and pressing any hardware button when seen
 +  - Using a serial connection, watching for a message during boot, then pressing the "​f"​ key on your serial keyboard
 +
 +===== Triggering by pressing any hardware button during boot =====
 +
 +==== Stage 1: Router and Computer preparation ====
 +
 +  * Power off the router
 +  * Unplug the WAN port, and in some cases if needed any other ports (if the WAN IP address and LAN IP address are the same or two LAN ports (an address collision), you would not be able to enter failsafe mode unless you unplug the other port(s) which are colliding)
 +  * Set your computer'​s IP to ''​192.168.1.2'',​ subnet mask ''​255.255.255.0''​. The router will be reached at ''​192.168.1.1''​ when failsafe mode is running. (You may also use any other IP in the range ''​192.168.1.2-254''​.)
 +  * Connect the computer to the router. You may need to check which port to use, especially if you plan to watch for a broadcast packet to trigger failsafe (see below)
 +
 +==== Stage 2: Enter failsafe mode ====
 +
 +  * To detect when failsafe mode can be triggered, there are two options: ​
 +    * Look for a bootup LED blink pattern (easiest with many routers). ​ Looking for a blink pattern is often much more convenient than the other options.
 +    * Use a packet sniffer on any computer to listen for a special broadcast packet on UDP port 4919 during boot, then press the front button on the router when this packet is seen. 
 +
 +Immediately when the LED blink pattern or the network broadcast message is seen, click the device button. If your device has multiple buttons, any button should work. OpenWrt is configured in a way, that pressing of any button during preinit will trigger booting into failsafe mode. But in case a button should not work, try another. It can also help to press the button repeatedly until the blink speeds up or the "​success"​ broadcast packet or other evidence of triggering failsafe mode successfully,​ is seen.
 +
 +=== Stage 2 option 1: Entering failsafe mode using a blinking LED on the router ===
 +
 +On many routers, OpenWrt will start to blink a "​SYS"​ LED (may be "​Power",​ may be other) on the front of the router when it is in its early boot cycle. Since [[https://​dev.openwrt.org/​changeset/​44056|r44056]] there are three different LED blinking speeds for most of the routers (in trunk and CC15.05):
 +  * first a moderate 0.1 second blinking rhythm during those two seconds, when router waits for user to trigger the failsafe mode
 +  * then either
 +    * a slow 0.2 second blink if the failsafe was not triggered and the normal boot continues
 +    * a rapid 0.05 second blink if the user pressed a button and failsafe mode was triggered
 +
 +Steps:
 +  * Power on the router.
 +  * As soon as this blink pattern is seen, press any hardware button of the router. ​ At least [[toh:​tp-link:​tl-wdr4300#​failsafe.mode|one TP-Link router]] seems to respond better to repeatedly clicking the button //before //the SYS LED starts to blink, until the SYS LED lights with the rapid-flash pattern.
 +  * The LED will change to faster blink pattern, indicating the router is now in failsafe mode.
 +
 +Some routers only have one hardware button, the reset button, which is often on the back of the unit (often labeled "​Reset"​ or "​WPS/​Reset"​). It may have a visible (external) button, or may be behind a hole (with button in the depth). If it is in a hole, you require a paper clip or similar tool to operate it. Please no not use a nail to press the button in the hole!
 +
 +=== Stage 2 option 2: Entering failsafe using the broadcast packet ===
 +
 +The exact steps will depend on the device you are using to watch for the broadcast packet. Details are given below for Linux and Windows. ​ Most *nix/BSD/OS X/​Android/​Mac should be very similar to Linux (often identical). ​ For many other devices and systems the same steps should be possible (but details not provided).
 +
 +You will need to be sure the router is connected to the device/PC, the cable is working, the device'​s firewall will not block the packet, and that network LEDs or other diagnostics you may have, show the connection is working. You may also need to temporarily disable the firewall on your device or open a port on it - take care and secure it again after!
 +
 +Steps:
 +
 +  * You will need some packet sniffing/​packet capture software: ​
 +    * ''​**Linux (also most *nix/BSD/OS X/​Android/​Mac):​**''​ Software is often built in or very easy to download. ​ GUI ''​**wireshark**''​ or console ''​**tcpdump**''​ or ''​**cshark**''​ or other. If you do not have any, then these are all very common open source ports and available + free on most platforms. Y you should be able to download one of these for your device easily in the usual way (or any other packet sniffer you like). ​
 +    * ''​**Windows:​**''​ You can use the [[http://​downloads.openwrt.org/​people/​florian/​recvudp/​recvudp-win32.zip|recvudp.exe]] utility software, or any other packet sniffer. There are also Windows versions of some of the above software as well.
 +  * Start watching for packets. ​ The exact commands or menu options are different for each sniffer, so you need to find how to do this on your software. The packet will be sent to ''​**destination address 192.168.1.255 port UDP 4919**''​. So for example, in a terminal and using tcpdump, with the router connected to port eth0, you would enter the command ''​**tcpdump -Ani eth0 port 4919 and udp**''​
 +  * Turn the router power off, wait a few seconds, and then back on. 
 +  * Look in the sniffer for the early boot network broadcast packet to be shown (Could take up to 30-40 seconds on some routers). ​ The packet will also show the message that it is waiting for your click on a button. The screenshots below show what this can look like.
 +  * When you see the packet/​message,​ press any configurable hardware key on the router. If needed press several times. Often you will know you have succeeded because the router will send a second broadcast "​success"​ packet when failsafe mode is triggered, and you will see this in the packet sniffer as well (also shown in the screenshots below). ​ But not all versions do this.
 +
 +<​html>​
 +<table class="​inline"​ style="​width:​70%;​ margin-left:​15%">​
 +  <tr>
 +    <td style="​border-left:​6px solid #f57900; vertical-align:​middle">​
 +      <img src="/​_media/​meta/​icons/​tango/​48px-emblem-question.svg.png"​ alt=""​ style="​float:​left;​ margin-right:​0.5em"​ />
 +      <​strong>​Unverified Information!</​strong><​br />
 +      Up to today (Jan 11, 2013) this page didn't precise on which port to listen. In the case of TL-WR1043ND,​ it's the WAN port. If you find a contradictory example, it will be necessarry to <a href="?​do=edit">​remove or adapt </a> this note.
 +    </td>
 +  </tr>
 +</​table>​
 +</​html>​
 +
 +== Screenshots of typical packet sniffer using broadcast packet method ==
 +'''​Broadcast packet and success packet under Linux (broadcast packet is the first part only!):'''​
 +
 +Run ''​wireshark'',​ ''​cshark''​ or ''​tcpdump''​\\
 +{{:​doc:​howto:​linux-failsafe.png|}}
 +
 +'''​Broadcast packet and success packet under Windows (broadcast packet is the first line only!):'''​
 +
 +Monitoring the special packet in a program ''​recvudp.exe''​.\\
 +{{:​media:​failsafe2.png|}}
 +
 +==== Stage 3: Log into the router when has booted into failsafe mode ====
 +
 +Important notes and troubleshooting for failsafe mode login:
 +  - <color red>In failsafe mode, the router will not respond to network traffic from outside the 92.168.1.1/​24 subnet, so you cannot telnet or ping it unless the source is also in this range of IP addresses. You may need to temporarily set a suitable IP on the device used and/or ensure that any routing and firewalls will allow packets if accessed across a network.</​color> ​
 +  - If your device has multiple network interfaces (eth0, eth1, ...), usually eth0 is the interface connected to the [[doc:​hardware:​switch]] (there may be very seldom exceptions).
 +  - If the router does not boot in safe mode despite clicking the button, it may be a timing problem, missing the brief window when OpenWrt is looking for a button press. If so, immediately after turning the router on, rapidly click and keep clicking the button on the router for about 60 seconds to try to not miss the safe mode boot window.
 +  - If your router has a ridiculously long boot time (such as [[toh/​d-link/​dir-300#​with.manual.step.by.step.guide|DIR-300 A]]), then you may do this for a longer time.   
 +
 +
 +How to tell when failsafe mode is active:
 +  * Once in failsafe mode, a network broadcast confirmation message appears (not always, for the TL-WR1043ND no message comes).
 +  * On some router models (e.g. TP-LINK models), the SYS led blinks very quickly
 +
 +If you are using a **trunk** snapshot, revision 46809 or newer, **ssh** to 192.168.1.1 from the computer and log in as root (no password required). The host key will be randomly generated. You can pass ''​-o "​UserKnownHostsFile /​dev/​null"​ -o "​StrictHostKeyChecking no"''​ to ssh if you want to allow a different host key temporarily.
 +
 +If you are using a release image, **telnet** (//not// SSH) to 192.168.1.1 from the computer. There is no username or password required.
 +
 +Now go to section [[#When you are in failsafe mode]]
 +
 +===== Serial connection triggering by keyboard key combination in a serial console =====
 +  - Unplug the router'​s power cord.
 +  - Connect the router'​s WAN port directly to your PC.
 +  - Configure your PC with a static IP address between 192.168.1.2 and 192.168.1.254. E. g. 192.168.1.2 (gateway and DNS is not required).
 +  - Plugin the power.
 +  - Connect via serial ​
 +  - Wait until the following messages is passing: Press the [f] key and hit [enter] to enter failsafe mode
 +  - Press "​f"​ and the "​enter"​ key
 +  - You should be able to **telnet** (//not// SSH) to the router at 192.168.1.1 now (no username and password)
 +
 +===== When you are in failsafe mode =====
 +
 +==== Login message ====
 +You get a message similar or same like this (using OpenWrt 12.09):
 +
 +|<​code>​
 + === IMPORTANT ============================
 +  Use '​passwd'​ to set your login password
 +  this will disable telnet and enable SSH
 + ​------------------------------------------
 +
 +
 +BusyBox v1.19.4 (2013-03-14 11:28:31 UTC) built-in shell (ash)
 +Enter '​help'​ for a list of built-in commands.
 +
 +  _______ ​                    ​________ ​       __
 + ​| ​      ​|.-----.-----.-----.| ​ |  |  |.----.| ​ |_
 + ​| ​  ​- ​  ​|| ​ _  |  -__|     ​|| ​ |  |  ||   ​_|| ​  _|
 + ​|_______|| ​  ​__|_____|__|__||________||__| ​ |____|
 +          |__| W I R E L E S S   F R E E D O M
 + ​-----------------------------------------------------
 + ​ATTITUDE ADJUSTMENT (12.09, r36088)
 + ​-----------------------------------------------------
 +  * 1/4 oz Vodka      Pour all ingredients into mixing
 +  * 1/4 oz Gin        tin with ice, strain into glass.
 +  * 1/4 oz Amaretto
 +  * 1/4 oz Triple sec
 +  * 1/4 oz Peach schnapps
 +  * 1/4 oz Sour mix
 +  * 1 splash Cranberry juice
 + ​-----------------------------------------------------
 +root@(none):/#​
 +</​code>​|
 +
 +Additional note ([[https://​dev.openwrt.org/​changeset/​42985/​trunk|r42985]]):​
 +
 +|<​code>​
 +
 +================= FAILSAFE MODE active ================
 +special commands:
 +
 +    firstboot reset settings to factory defaults
 +    mount_root mount root-partition with config files 
 +
 +after mount_root:
 +
 +    passwd change root's password
 +    /etc/config directory with config files 
 +
 +for more help see:
 +​http://​wiki.openwrt.org/​doc/​howto/​generic.failsafe
 +=======================================================
 +</​code>​|
 +
 +
 +==== The file systems in failsafe mode ====
 +OpenWrt uses an overlay file system (JFFS2) which overlays the default router files on the [[doc:​techref:​flash.layout|SquashFS]] partition. JFFS2 contains all config, all packages, and any temp or other files which are not part of the default OpenWRT. Deleting a file from the JFFS2 effectively "​resets"​ the JFFS2 file version back to default, because the original file will be seen on the SquashFS (if it existed). Deleting the entire contents of the JFFS2 will effective resets the router to OpenWRT default settings and packages.
 +
 +The root file system in failsafe mode is the only the SquashFS partition and the JFFS2 is not present. To mount (access) the JFFS2 as read/write in failsafe mode you must manually mount it. Enter the command ''​**[[https://​dev.openwrt.org/​browser/​trunk/​package/​base-files/​files/​sbin/​mount_root|mount_root]]**''​ to do this.
 +
 +Once the JFFS2 file system is mounted read/write, you can view/​edit/​delete/​fix the files which are changed from the default firmware. ​ Any files that are changed will be accessible at **''/​overlay/​*''​** (or **''/​overlay/​upper/​*''​** on some routers). ​
 +
 +The core config files are usually at **''/​overlay/​etc/​config/​*''​** (or **''/​overlay/​upper/​etc/​config/​*''​**) and have names such as "​network",​ "​firewall"​ etc. Other copies may exist in the /rom subdirectory and the router'​s UI code may exist in subdirectories such as /lua
 +
 +==== Useful commands and procedures ====
 +
 +**General:​**
 +
 +  * ''​**[[doc:​uci|The UCI command]]**''​ reads and writes the router'​s main configuration files, and is also the main command line tool for modifying the configuration. So it has a lot of helpful commands for troubleshooting and fixing config-related problems. (You can also edit the config files directly using any text editor). See [[doc:​uci|The UCI System]].
 +  * ''​**If you are not very familiar with Linux**'',​ many commands have a ''​**<​nowiki>​--</​nowiki>​help**''​ option (for example: ''​grep <​nowiki>​--</​nowiki>​help''​) which can suggest the options you need. Often you only need basic commands to get started, such as ''​**mv**''​ (move/​rename),​ ''​**cp**''​ (copy), ''​**rm**''​ (remove/​delete),​ ''​**find -name *XYZ***''​ (find all files from the current dir with XYZ in the name), ''​**cd**''​ and ''​**ls**''​ (change and list current directory), ''​**cat**''​ (view file), ''​**less**''​ (view file with page up/down, use "​q"​ to finish), ''​**grep**''​ (show matching lines/text only), and so on.  If a command "​hangs"​ or takes too long, ''​**ctrl-C**''​ will often kill it and return to a command line.
 +
 +** Specific commands and procedures:​**
 +
 +  * **Forgot password** - In case you forgot your password, you need to set a new one. Type: ''​**passwd**''​
 +  * **Forgot IP address** - In case you forgot the IP address for the router, get it with ''​**uci get network.lan.ipaddr**'' ​
 +  * **Bad switch/​bridge/​firewall settings/​cannot reach webUI/​cannot SSH** - In case you have settings that stop you reaching the webUI/SSH, you can use uci (see above) or manually edit the relevant config files to fix the problem. If you are not confident of the correct settings, it's safe and easy to rename (mv) the relevant file to some temporary name, which will cause OpenWRT to use its defaults for that file. Then after reboot you can compare the default and your problem settings file, fix the problem, and restore the other settings which were OK. The most likely files for this are the '​network'​ and '​firewall'​ config files.
 +  * **Storage (JFFS2) full** - If the JFFS2 file system is full, maybe because you installed too big/too many packages, or it contains some big logs/​dumps/​cache/​other files, ​ or any other reason, either find & delete some files or packages which you don't need, or you can clear the entire JFFS2 partition (see below). ​
 +  * **Finished/​reboot** - If you are done with failsafe mode use ''​**reboot**''​ to reboot.
 +
 +==== Changing or resetting some config by editing files ====
 +Run the command ''​**mount_root**''​ and then edit or delete such files as you need. To reset all of the JFFS2 (OpenWrt version of "​factory reset"​) see the next section.
 +
 +The core config files are usually at **''/​overlay/​etc/​config/​*''​** (or **''/​overlay/​upper/​etc/​config/​*''​**) and have names such as "​network",​ "​firewall"​ etc which you can search using the **''​find -name''​** command (see below). If you know your error is (say) some network switch or VLAN issue, then you can edit/delete the network config file and reboot. The router will keep all settings except the settings of the file you changed/​deleted which will go back to default.
 +
 +==== Wiping JFFS2 file system ('​Factory reset' to default config) ====
 +
 +  - <color red>This procedure is safe (it will restore the default setup and not brick your router).</​color>​
 +  - <color red>It __will__ clear the JFFS2 partition, resetting all custom settings and removing all installed packages, logs, dumps, and temp files, the OpenWrt equivalent of a factory reset. If you need any of these, take a backup to some other device in failsafe mode before doing this.</​color>​
 +
 +Run ''​**mount_root**''​ first (see above) to mount the JFFS2 partition. Once the JFFS2 partition is mounted for read/write, use any of these commands to erase the files on it, which resets the router:  ​
 +  * ''​**firstboot**''​
 +  * **''​rm -r /overlay/* ''​** (or /​overlay/​upper/​* on some routers)
 +  * ''​**mtd -r erase rootfs_data**''​ (this will reboot the device as part of the process)
 +
 +NOTE: there is a [[https://​dev.openwrt.org/​ticket/​20168|bug report]] that sometimes firstboot or mtd-r erase rootfs_data may not work and "​hangs"​. If that happens then the files can be deleted using the "​rm..."​ method. The overlay is "on top" of the SquashFS so deleting overlay files just leaves the original SquashFS files showing.
 +
 +===== Flash new firmware in failsafe mode =====
 +
 +Steps (overview):
 +  - Prepare your local desktop with netcat to listen to tcp/ip connections on port 3333 and feed any incoming connection with the firmware file you want to flash.
 +  - On the router with netcat connect to your local desktop ipaddress on port 3333 and pipe the recieved data to a file.
 +  - On the router perform a sysupgrade with the receieved file.
 +
 +Let us assume the following: ​
 +  * Your desktop is on ipaddress 192.168.1.123 ​
 +  * the router is on default 192.168.1.1
 +  * the name of the firmware file to flash has been renamed (for simplicity reasons) to **nxtfw.bin**
 +
 +** Windows Desktop **
 +
 +  * Download [[https://​joncraton.org/​blog/​46/​netcat-for-windows/​|netcat for windows]].
 +  * Copy the firmware you intend to flash to the directory where you have unzipped/​installed netcat
 +    * rename the file to something easy to type e.g. nxtfw.bin
 +  * Open an command prompt window, navigate to the directory where netcat is installed:
 +
 +<​code>​ nc -l -p 3333 < flash.bin</​code>​
 +
 +** Cygwin Desktop**
 +  * [[https://​cygwin.com/​install.html|Install Cygwin]] if you have not already.
 +  * If you do not have pv (Pipe Viewer) or nc (netcat) installed in your Cygwin Environment (by default they are not installed):
 +    * Re-Run the setup executable and expressly search and mark for installation:​
 +      * pv (found in Utils)
 +      * nc (found in Net)
 +  * Run Cygwin
 +
 +<​code> ​
 +$ cat nxtfw.bin | pv -b | nc -l 3333
 +</​code>​
 +
 +** Linux Desktop **
 +
 +<​code>​cat nxtfw.bin | pv -b | nc -l -p 3333</​code>​
 +  * pv (Pipe Viewer) shows progress of data transfer through the pipe.
 +  * nc (netcat) listen on port 3333 transferring the firmware
 +
 +** Failsafed device **
 +
 +  * On the Router via your Telnet connection
 +<​code>​nc 192.168.1.123 3333 > /​tmp/​nxtfw.bin</​code>​
 +
 +  * install firmware with current settings.
 +<​code>​
 +root@(none):/#​ sysupgrade /​tmp/​nxtfw.bin
 +Saving config files...
 +killall: watchdog: no process killed
 +Failed to connect to ubus
 +Switching to ramdisk...
 +Performing system upgrade...
 +Unlocking firmware ...
 +
 +Writing from <​stdin>​ to firmware ...
 +Appending jffs2 data from /​tmp/​sysupgrade.tgz to firmware...
 +Writing from <​stdin>​ to firmware ...
 +Upgrade completed
 +Rebooting system...
 +[217.460000] reboot: Restarting system
 +</​code>​
 +
 +  * after reboot you should be able to access your device on 192.168.1.1
 +===== Notes =====
 +  * the article [[doc:​techref:​process.boot]] may help you better understand when ''​failsafe''​ "kicks in" once activated