:!: Most of this assumes you're familiar with OpenWrt and basic networking concepts. :!:

:!: TODO(risk): in progress :!:

High availability

High availability refers to systems designed to remain functional despite hardware or software failures and planned maintenance (e.g. upgrades). The availability actually measured (e.g. the percentage of time the system is up, or of requests that succeed) can vary.

In this howto we describe a simple two-router setup in an active/backup configuration. The routers share a virtual IP address that hosts on the LAN use as their gateway to reach the internet. If the active router fails or is rebooted, the backup router takes over.

We will use keepalived for health checking and IP failover, and conntrack-tools for syncing firewall/NAT connection state.

Most, but not all, of the required OpenWrt configuration can also be done from the LuCI web UI.

Preparation, assumptions, description of environment

  • You have 2 OpenWrt routers and a static WAN IP (this could also be a private IP + DMZ).
  • If you're not doing NAT or connection-tracking-based firewalling, skip the conntrackd/conntrack-tools sections.
  • A dynamic (DHCP) WAN IP is possible with keepalived, but requires extra scripting and is not described here.
  • VPN and tunnel setups, and failing those over, are not covered.
  • Failing over a PPPoE WAN is not implemented. Best bet: let the modem do PPPoE and set up your virtual WAN IP as the DMZ host.

Individual Router Configuration

1. Configure 1st openwrt router

  • Internal LAN ip: 192.168.1.2/24 (change so 192.168.1.1 is available for initial configuration of 2nd router)
  • WAN IP, gateway: static 192.168.0.2/24 gw 192.168.0.1 metric 10 (using double nat / dmz on the isp provided router)
  • DHCP on defaults is fine, we'll configure it later.

2. Configure 2nd openwrt router

  • Internal LAN IP: 192.168.1.3/24 (change it so that you can still configure the second router once it is connected to the same network as the first)
  • WAN IP, gateway: static 192.168.0.3/24 gw 192.168.0.1 metric 10 (using double nat / dmz on the isp provided router)
  • DHCP on defaults is fine for now. If you have any static leases in DHCP, or fixed host entries, make sure they're the same as on the 1st router.

Verification and troubleshooting

  • Change a client to use gw 192.168.1.3 and dns 192.168.1.3 to make sure the second router works as well.
  • Hosts that were issued IPs by one dnsmasq might not be resolvable through the second dnsmasq; assigning static leases helps.
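
The per-router addressing from steps 1 and 2 can also be applied from the shell. The following is a minimal sketch, assuming the interface sections in /etc/config/network are named lan and wan (the usual OpenWrt defaults) and that both networks are /24. gen_network_batch is a hypothetical helper name; it only prints input for uci batch, so the commands can be reviewed before they are applied.

```shell
#!/bin/sh
# Hypothetical helper: print `uci batch` input for one router.
# $1 = last octet for this router (2 for the 1st router, 3 for the 2nd).
# Assumes interface sections named 'lan' and 'wan' (OpenWrt defaults).
gen_network_batch() {
    host="$1"
    cat <<EOF
set network.lan.ipaddr='192.168.1.${host}'
set network.lan.netmask='255.255.255.0'
set network.wan.proto='static'
set network.wan.ipaddr='192.168.0.${host}'
set network.wan.netmask='255.255.255.0'
set network.wan.gateway='192.168.0.1'
set network.wan.metric='10'
commit network
EOF
}

# Usage on the 1st router (review the output first, then apply):
#   gen_network_batch 2
#   gen_network_batch 2 | uci batch && /etc/init.d/network restart
```

Restarting the network changes the router's own address, so expect to reconnect on the new LAN IP afterwards.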

Configuration of Both Routers

3. Configure keepalived

keepalived is a Linux daemon that uses VRRP (Virtual Router Redundancy Protocol) to health-check its peers and elect the router on the network that will serve a particular IP. We'll be using only a small subset of its features.

opkg update
opkg install keepalived

The following configuration in /etc/keepalived/keepalived.conf assumes the routers are symmetrical, i.e. they have the same priority, start up in backup mode, and will not preempt the other router until they establish that it is gone. You will need to adjust the interface names to match your device.

! Configuration File for keepalived

! failover E1 and I1 at the same time
vrrp_sync_group G1 {
  group {
    E1
    I1
  }
}

! internal
vrrp_instance I1 {
  state BACKUP
  interface br-lan
  virtual_router_id 51
  priority 101
  advert_int 1
  virtual_ipaddress {
    192.168.1.4/24
  }
  authentication {
    auth_type PASS
    auth_pass s3cret
  }
  nopreempt
}

! external
vrrp_instance E1 {
  state BACKUP
  interface eth0.2
  virtual_router_id 51
  priority 101
  advert_int 1
  virtual_ipaddress {
    192.168.0.4/24
  }
  virtual_routes {
    src 192.168.0.4 to 0.0.0.0/0 via 192.168.0.1 dev eth0.2 metric 5
  }
  authentication {
    auth_type PASS
    auth_pass s3cret
  }
  nopreempt
}
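
Once the configuration is in place on both routers, it helps to check which one currently holds the virtual IP. The following is a minimal sketch, assuming the LAN virtual IP is 192.168.1.4 (the address handed out in the DHCP step) and that the LAN bridge is br-lan. has_vip is a hypothetical helper name; it reads the output of ip -o addr show from stdin.

```shell
#!/bin/sh
# Report whether `ip -o addr show` output (read from stdin) contains
# the given virtual IP. has_vip is a hypothetical helper name.
has_vip() {
    vip="$1"
    # Fixed-string match on the address followed by its prefix length,
    # e.g. "inet 192.168.1.4/24", so 192.168.1.40 does not match.
    grep -qF "inet ${vip}/"
}

# Usage on a router after editing keepalived.conf:
#   /etc/init.d/keepalived enable
#   /etc/init.d/keepalived restart
#   ip -o addr show dev br-lan | has_vip 192.168.1.4 && echo active || echo backup
```

Exactly one router should report "active" at any time. If both do, the VRRP advertisements are probably not reaching the peer (check the firewall and the interface names).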

4. Configure conntrackd

This step is optional: keepalived fails over the IP address with or without conntrackd. However, NAT relies on a connection tracking table that maps each external ip:port to an internal ip:port (per protocol, TCP or UDP). Without syncing, the backup router does not have these entries, so established connections may break on failover because it cannot tell which internal host the packets of an existing connection belong to. New connections (such as application-level reconnects) will work just fine.

Below is a simple config file for conntrackd. It is advisable to rename the original config in /etc/conntrackd/ and create a brand new conntrackd.conf, so that you can refer back to the old one. Replace the quoted placeholder addresses and the Interface value with the ones for your routers.

Sync {
    Mode FTFW {
        DisableExternalCache Off
        CommitTimeout 1800
        PurgeTimeout 5
    }

    UDP {
        IPv4_address "ip addr of host router"
        IPv4_Destination_Address "ip addr of partner router"
        Port 3780
        Interface eth*
        SndSocketBuffer 1249280
        RcvSocketBuffer 1249280
        Checksum on
    }
}

General {
    Nice -20
    HashSize 32768
    HashLimit 131072
    LogFile on
    Syslog on
    LockFile /var/lock/conntrack.lock
    UNIX {
        Path /var/run/conntrackd.ctl
        Backlog 20
    }
    NetlinkBufferSize 2097152
    NetlinkBufferSizeMaxGrowth 8388608
    Filter From Userspace {
        Protocol Accept {
            TCP
            UDP
            ICMP # This requires a Linux kernel >= 2.6.31
        }
        Address Ignore {
            IPv4_address 127.0.0.1 # loopback
        }
    }
}

Run simple commands to verify functionality

Show conntrackd statistics:

conntrackd -s

Request a resync from the other node:

conntrackd -n
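
With the external cache enabled (DisableExternalCache Off), states received from the peer are held in conntrackd's external cache and need to be committed to the kernel table when a router becomes active. The following is a sketch of a helper for that, modelled on the primary-backup.sh example shipped with conntrack-tools. ct_notify and the CONNTRACKD variable are hypothetical names, and wiring it into keepalived is an assumption that this page does not otherwise describe.

```shell
#!/bin/sh
# Sketch of a keepalived notify helper for conntrackd.
# CONNTRACKD can be overridden (e.g. CONNTRACKD="echo conntrackd") for a dry run.
CONNTRACKD="${CONNTRACKD:-conntrackd}"

ct_notify() {
    case "$1" in
        primary)
            $CONNTRACKD -c   # commit the external cache into the kernel table
            $CONNTRACKD -f   # flush the internal and external caches
            $CONNTRACKD -R   # resync the internal cache with the kernel table
            $CONNTRACKD -B   # send a bulk update to the backup
            ;;
        backup)
            $CONNTRACKD -t   # shorten kernel conntrack timers so stale entries expire
            $CONNTRACKD -n   # request a resync from the active router
            ;;
        fault)
            $CONNTRACKD -t
            ;;
        *)
            echo "usage: ct_notify primary|backup|fault" >&2
            return 1
            ;;
    esac
}
```

One way to use it would be to save it as a script that ends with ct_notify "$1" and call that script from the notify_master, notify_backup and notify_fault options of the vrrp_sync_group in keepalived.conf, passing primary, backup or fault as the argument.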

5. Configure DHCP

You'll want DHCP (dnsmasq) to serve 192.168.1.4 (the LAN virtual IP) to hosts on the LAN, both as their gateway and as their DNS server. Here's an excerpt from /etc/config/dhcp that instructs dnsmasq to do that.

...
config dhcp 'lan'
        ...
        option force '1'
        list dhcp_option '3,192.168.1.4'
        list dhcp_option '6,192.168.1.4'
...
option force '1' is needed so that dnsmasq does not deactivate when it sees the other DHCP server. dhcp_option 3 is the gateway, dhcp_option 6 is the DNS server.
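
The same DHCP options can be set from the shell instead of editing the file by hand. This is a minimal sketch, assuming the DHCP pool section is named lan (the OpenWrt default). gen_dhcp_batch is a hypothetical helper name that prints input for uci batch.

```shell
#!/bin/sh
# Hypothetical helper: print `uci batch` input that makes dnsmasq hand out
# the LAN virtual IP as gateway (option 3) and DNS server (option 6).
# Assumes the DHCP pool section is named 'lan' (the OpenWrt default).
gen_dhcp_batch() {
    vip="$1"
    cat <<EOF
set dhcp.lan.force='1'
add_list dhcp.lan.dhcp_option='3,${vip}'
add_list dhcp.lan.dhcp_option='6,${vip}'
commit dhcp
EOF
}

# Usage on each router (review the output first, then apply):
#   gen_dhcp_batch 192.168.1.4
#   gen_dhcp_batch 192.168.1.4 | uci batch && /etc/init.d/dnsmasq restart
```

Clients only pick up the new gateway and DNS server when they renew their lease.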

6. Add directories to the sysupgrade backup

Add the following directories to /etc/sysupgrade.conf (this can be done from LuCI as well).

...
/etc/keepalived/
/etc/conntrackd/
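
To add these entries from the shell without creating duplicates, something like the following sketch could be used. keep_path is a hypothetical helper name, and its optional second argument exists only so it can be tried against a scratch file.

```shell
#!/bin/sh
# Append a path to a sysupgrade keep-list unless it is already listed.
# keep_path is a hypothetical helper; $2 defaults to /etc/sysupgrade.conf.
keep_path() {
    path="$1"
    conf="${2:-/etc/sysupgrade.conf}"
    # -x: whole-line match, -F: fixed string (no regex interpretation).
    grep -qxF "$path" "$conf" 2>/dev/null || echo "$path" >> "$conf"
}

# Usage on each router:
#   keep_path /etc/keepalived/
#   keep_path /etc/conntrackd/
```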

Testing and verification

TODO(risk): restarting keepalived with logread -f open, pulling cables with ssh / telnet / http sessions open, forcing dhcp renewal with tcpdump running, ensure
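
For the cable-pulling and restart tests noted above, a small probe loop run from a LAN client makes the length of an outage visible. This is only a sketch, assuming the LAN virtual IP is 192.168.1.4. watch_vip, PROBE and INTERVAL are hypothetical names, and the default probe is a single ping with a one-second timeout.

```shell
#!/bin/sh
# Probe an address repeatedly and print a line whenever reachability changes.
# watch_vip is a hypothetical helper; PROBE and INTERVAL can be overridden.
PROBE="${PROBE:-ping -c 1 -W 1}"
INTERVAL="${INTERVAL:-1}"

watch_vip() {
    vip="$1"; count="$2"
    last=""; i=0
    while [ "$i" -lt "$count" ]; do
        if $PROBE "$vip" >/dev/null 2>&1; then cur="up"; else cur="down"; fi
        # Only log transitions, so the output shows when the VIP went away
        # and when it came back.
        if [ "$cur" != "$last" ]; then
            echo "$(date +%T) $vip $cur"
            last="$cur"
        fi
        i=$((i + 1))
        sleep "$INTERVAL"
    done
}

# Usage from a LAN client while rebooting the active router:
#   watch_vip 192.168.1.4 120
```

Running logread -f on both routers at the same time shows the matching keepalived state transitions.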

doc/recipes/high-availability.txt · Last modified: 2017/01/25 02:17 by aaronhauck