Pseudowire
In today's data center an important question is: How to react on server outages? One of the possible answer to this question is a second data center in another location with a symmetric internet connection. This second data center can be run on cheap servers. With this setup, data from the primary data center can continuously be transmitted to the secondary one. If one or more server in the primary data center fail (e.g. because of hardware defect) the second data center can be used as fallback. This would be possible, if both of the data centers could be connected on layer 2 so they both share the same address range simultaneously. With expensive, proprietary hardware (e.g. Cisco) this is easily possible. For exactly this problem pseudowire switching can be used as seen in the following picture.
proprietary setup: http://isc.sans.edu/diary.html?storyid=8704
Linux also supports Layer 2 coupling with VPN1 or unencrypted via the Internet. Examples therefor are:
- OpenVPN - in bridged to bridged mode
To achieve this the following options are used on the server side:
server-bridge ....
up "/etc/openvpn/bridge-start"
dev tap0
In the remote data center one client is provided with two VLANs. The first is used to connect
to the vpn server. The second one is used later on when routing the layer 2 traffic of the first
data center. In this example this is VLAN 111, which is available on the client as bridge.
up "/etc/openvpn/bridge-start" client # /etc/openvpn/bridge-start #!/bin/bash ################################# # Set up Ethernet bridge on Linux # Requires: bridge-utils ################################# # Define Bridge Interface br="vlan111" # Define list of TAP interfaces to be bridged, # for example tap="tap0 tap1 tap2". tap="$1" for t in $tap; do /usr/sbin/openvpn --mktun --dev $t done for t in $tap; do /usr/sbin/brctl addif $br $t done for t in $tap; do /sbin/ifconfig $t 0.0.0.0 promisc up done
The advantage of routing the layer 2 traffic in the unused VLAN 111 has the advantage that local traffic does not mix with the traffic of the remote data center. A disadvantage hereby is the somewhat smaller throughput due to the architecture of openv- pn, as to many interrupts are required to copy the data vom kernel space to user space and back. This has a significant role when working with embedded hardware.
- rbridge - http://www.inlab.de/rbridge/index.html
Here an UDP tunnel in both directions is created to connect two ehternet segments. An
encryption of this tunnel can be implemented by using IPSEC on top of this tunnel. This
program, however, is not available as source code and therefor its usage is limited.
- Similar solutions
- etherip http://lwn.net/Articles/119535/
L2TPv3 with OpenWrt
So, is L2TPv3 available for Linux? Since Kernel 2.6.35 L2TPv3 is rudimentary available. Further infos in http://kerneltrap.org/mailarchive/linux-netdev/2010/4/2/6273948. So L2TPv3 is available for Linux. So what else is to be done to get this working:
- install Linux
- confgiure the firewall
- set up the IPSEC Tunnel
- monitor the IPSEC Tunnel
- start the L2TPv3 Tunnel and bridge it on both ends
No Linux distribution has included L2TPv3 in their network setup - but OpenWrt. With OpenWrt it
can be configured in the network configuration in /etc/config/network
Have s look at http://wiki.openwrt.org/doc/uci/network#protocol.l2tp.l2tp.pseudowire.tunnel
Openwrt can be used on many other hardware besides routers from Linksys. Even para virtual network devices are available to achieve performant network operations in virtualised environ- ments:
- KVM - Kernel Based Virtual Machine
- ESX/ESXI - VMware
- Virtual Box - Oracle
Openwrt is small, can be easily adopted and has an excellent buildchain. With this buildchain personal modifications are simple to include.
Implementation
After the theoretical depiction follows a practical implementation using an example network environment. The used network ranges are:
- external IP address in the primary data center (141.64.161.74), gateway (141.64.161.1), netmask (Class C)
- internal IP range 10.1.0.0/24
- external IP address in the secondary data center (192.166.120.139), gateway (192.166.120.1), netmask (Class C)
- unused VLAN 111 in the secondary data center
This shows the implementation of the Layer 2 coupling. First, the IPSEC tunnel between 141.64.161.75 to 192.166.120.139 is established. After succeeding, the IPs 192.168.202.5/32 and 192.168.202.9/32 are assigned to the tunnel endpoints. On top of this tunnel the L2TPv3 tunnel can be initialized. The IPSEC tunnel is needed as the connection between the data centers has to be cryptographically secure. This encryption is implemented in the kernel and can be implemented in hardware so latencies should be small and throughput should be huge. L2TPv3 is kernel based, too, which is one of the features that make it appealing in using to achieve the aspired goals.
Implementation using OpenWrt
To implement this setup with OpenWrt in both data centers an instance (l-01 in the primary, l-02 in the secondary) has to be configured as follows.
- l-01
/etc/config/network
config interface wan option ifname eth0 option type bridge option proto static option ipaddr 141.64.161.74 option netmask 255.255.255.0 option gateway 141.64.161.1 config interface lan option ifname eth1 option type bridge option proto l2tp option ipaddr 10.1.0.1 option netmask 255.255.255.0 option encap udp option sport 1701 option dport 1702 option localaddr 172.30.201.5 option peeraddr 172.30.201.9
''/etc/config/firewall'' config rule option src wan option proto gre option target ACCEPT
config rule option src wan option proto esp option target ACCEPT
config rule option src wan option proto ah option target ACCEPT
config rule option src wan option dest_port 500 option proto udp option target ACCEPT
config rule
option src wan
option dest_port 4500
option proto udp
option target ACCEPT
''/etc/ipsec.conf''
#OpenSwan is being used
conn to-secondary
type=tunnel
left=141.64.161.74
leftnexthop=141.64.161.1
leftsourceip=192.168.201.5
leftsubnet=172.30.201.5/32
leftid="primary@rz"
right=192.166.120.139
rightnexthop=192.166.120.1
rightsourceip=192.168.201.9
rightsubnet=192.168.201.9/32
authby=secret
auto=start
ike=aes128-sha-modp1024
esp=aes128-sha1
''/etc/monitrc'' check process pluto with pidfile /var/run/pluto/pluto.pid start program = "/etc/init.d/ipsec restart" stop program = "/etc/init.d/ipsec restart" # check host secondary with address 10.1.0.2 if failed icmp type echo count 5 with timeout 30 seconds then exec "/sbin/ifup lan"
- l-02
/etc/config/network
config interface wan option ifname eth0 option type bridge option proto static option ipaddr 192.166.120.139 option netmask 255.255.255.0 option gateway 192.166.120.1
config interface lan option ifname eth1 option type bridge option proto l2tp option ipaddr 10.1.0.2 option netmask 255.255.255.0 option encap udp option sport 1702 option dport 1701 option localaddr 172.30.201.9 option peeraddr 172.30.201.5 ''/etc/config/firewall'' same as l-01
''/etc/ipsec.conf'' #OpenSwan is being used conn to-primary type=tunnel left=141.64.161.74 leftnexthop=141.64.161.1 leftsourceip=192.168.201.5 leftsubnet=172.30.201.5/32 rightid="secondary@rz" right=192.166.120.139 rightnexthop=192.166.120.1 rightsourceip=192.168.201.9 rightsubnet=192.168.201.9/32 authby=secret auto=start ike=aes128-sha-modp1024 esp=aes128-sha1
''/etc/monitrc'' check process pluto with pidfile /var/run/pluto/pluto.pid start program = "/etc/init.d/ipsec restart" stop program = "/etc/init.d/ipsec restart" # check host secondary with address 10.1.0.1 if failed icmp type echo count 5 with timeout 30 seconds then exec "/sbin/ifup lan"
Explanations
The functionality of both the IPSEC and the L2TPv3 tunnel are assured by the use of the daemon monit on both ends. If one of the tunnels is destroyed an ifup lan is triggered to restart the connection. In the secondary data center eth1 has to be in VLAN 111. Because of this, the internal network of the primary data center can be used without the interference of layer 2 noise. This means the internal network of the primary data center is bridged in the VLAN 111 in the secondary data center.
Bridging of devices with varying MT
For now the setup works so both sides can ping each other. An ssh connection, however, is not either possible or freezes after a few seconds. The reason: The bridge for the L2TPv3 contains devices with different MTU3 . Furthermore, as the connection is bridged no routing happens and the MTU is not automatically adjusted by the router. All devices in the LAN usually use a MTU of 1500. The MTU of the L2TPv3 devices is about 1400. As the tunnel itself can not fragment packets all packets bigger than the MTU are lost. This happens at longer HTTP request, too. To solve the problem bridge firewalling and TCP MSS Clamping is used. Bridge firewalling means the iptables rules are used when a packet passes a bridge. Normally this should not work, as a bridge only works in Layer 2. However, if bridge firewalling is enabled in the kernel a bridge can work in Layer 2 as well as Layer 3. The following sysctl keys are used for this:
- net.bridge.bridge-nf-call-iptables
- 0 - do not send IPv4 traffic through iptables
- 1 - do send IPv4 traffic through iptables
- net.bridge.bridge-nf-filter-vlan-tagged
- 0 - do not send vlan tagged Ipv4 traffic through iptables
- 1 - send vlan tagged Ipv4 traffic through iptables
IPv6 have to be configured separately.
For the example setup the rules should be like this:
iptables -I FORWARD -s 10.1.0.0/24 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1400 iptables -I FORWARD -d 10.1.0.0/24 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1400 iptables -I FORWARD -s 10.1.0.0/24 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu iptables -I FORWARD -d 10.1.0.0/24 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtuThese rules should be narrowed as good as possible to avoid unpleasent side affects.
— Have fun: Thomas 2011/02/20 18:41
doc/howto/pseudowire.txt · Last modified: 2013/03/15 11:02 by alessandro_s
This text is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.


