The Linux Packet Scheduler

An Introduction to Traffic Control / Traffic Shaping / Traffic Prioritizing / Bandwidth Management

Traffic control is the umbrella term for packet prioritizing, traffic shaping, bandwidth limiting, AQM (Active Queue Management), etc. This howto helps you set these up on your GNU/Linux system. The only OpenWrt-specific aspect is that OpenWrt splits the tools into many small opkg packages to save storage and memory.

On other Linux distributions the tools presented here are all together in one software package called iproute2. The official (but out-of-date) description of iproute2 can be found at LARTC (Linux Advanced Routing & Traffic Control).

1. You can control, i.e. prioritize and/or shape, ANY upload traffic, i.e. traffic being sent from your router to the Internet. Doing so will solve problems with congestion, jitter, or packet delay.
2. By pure logic you cannot achieve the same for your download traffic. However, you can POLICE the download traffic. By policing the download, you can still achieve noticeable improvements for incoming traffic, as long as UDP traffic does not flood the download line.
TCP (Transmission Control Protocol) is a protocol that adapts its sending rate to the current load on the network. When packets are dropped or delayed, their ACK packets arrive late or not at all, and the sender slows down.
UDP (User Datagram Protocol) does not have ACK packets and thus cannot adapt. So if UDP packets congest your download bandwidth, there is NO scheduler in the world that could help with that.

So a protocol was created that can adapt its sending rate to the current load on the network. Today's Transmission Control Protocol (TCP) was born.

Why Traffic Control?

Let's have a look at the upload traffic:

[Figures: upload traffic without tc and with tc]

"TCP-Turbo"

When you prioritize ACK-Packets on the upload side, your download TCP-traffic will benefit in certain situations (a marketing term for this is "TCP-Turbo"):

[Figures: download TCP traffic without traffic control and with traffic control]

NOTE: Download UDP traffic remains unchanged by this!

Traffic Policing

Under ideal circumstances everyone could upload a traffic control configuration to their Broadband Remote Access Server (BRAS) / access concentrator and achieve the same for the download direction as for the upload. In reality nobody has that kind of access, and the queue on the BRAS is probably huge. To avoid the accumulation of packets in the TX queue of the BRAS, you can enable policing for the download on your OpenWrt router. It's not much, but probably better than doing nothing!
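
A minimal sketch of download policing with the ingress qdisc, following the classic LARTC pattern. The interface name pppoe-dsl and the rate of 8000kbit are assumptions; pick a rate somewhat below your measured download bandwidth. This sketch does not use an IFB device. By default it only prints the tc commands; run it with TC=tc as root to apply them.

```shell
#!/bin/sh
# Dry run by default: the tc commands are only printed.
# Run with TC=tc (as root) to apply them.
TC="${TC:-echo tc}"

# police_download DEV RATE: drop incoming IPv4 packets that exceed RATE
police_download() {
    dev="$1"; rate="$2"
    # attach the ingress qdisc (its handle is always ffff:)
    $TC qdisc add dev "$dev" handle ffff: ingress
    # match all IPv4 traffic and drop whatever exceeds the rate
    $TC filter add dev "$dev" parent ffff: protocol ip prio 50 \
        u32 match ip src 0.0.0.0/0 \
        police rate "$rate" burst 10k drop flowid :1
}

police_download pppoe-dsl 8000kbit
```

Policing only drops packets that arrive too fast; it cannot reorder or delay them, which is why it helps with TCP but not with a UDP flood.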

Preparations

Prerequisites

  1. Read about the working principle of the Linux packet scheduler; this is really very simple.
  2. Understand the available configuration parameters of the queueing algorithm you want to employ, and their effects; this can get quite complicated.
  3. Learn how to use tc; again, very simple and straightforward.
  4. Muster the patience to adjust your configuration. For the packet scheduler to work effectively, your configuration parameters must be tailored to your actual available bandwidth and your usage scenario. A minimum viable script can be created in no more than half an hour, but most users will spend a couple of hours tinkering and reading to understand the far-reaching possibilities and to adapt the configuration to their needs and desires.
  5. You have to know the parameters of the line you are configuring the packet scheduler for:
    • whether it is a full-duplex or a half-duplex line (cf. Duplex (telecommunications))
    • the accurate (= true and precise) available upload bandwidth!
    • The Linux packet scheduler works on Layer 2, thus you should always work with the actual bandwidth for the Layer-2 payload:
    • e.g. when you employ the Layer 1 protocol "100BASE-TX":
      • you have 100 MBit/s of theoretically available physical bandwidth
      • but due to interference, cabling that does not adhere to specifications, a faulty IC or whatever, you could have less than 100 MBit/s of real/actual physical bandwidth!
      • on top of the Layer 1 protocol you will use a Layer 2 protocol, which adds protocol overhead. The Layer 2 protocol "Ethernet" adds about 2.5%, so a maximum of 97.5 MBit/s remains for the Layer-2 payload.
      • on top of the Layer 2 protocol you will use at least one Layer 3 protocol, let's say IPv4. Due to protocol overhead, the Layer-3 payload is smaller than the Layer-2 payload. Ignore the Layer-3 payload and always work with Layer 2. If an application shows the used bandwidth or allows you to throttle the bandwidth at which it sends or receives data, it probably refers to the Layer-4 payload, though that is not guaranteed. Just avoid mixing Layer-2 bandwidth with Layer-1 or Layer-3 bandwidth ;-)
    • e.g. when you buy VDSL 50000/10000, what do the values refer to? 10 MBit/s of physical bandwidth or of Layer-2 bandwidth? Either way, you still have to measure the available bandwidth ;-)
    • e.g. you employ IEEE 802.11n hardware with 300 MBit/s of theoretical physical bandwidth. You have to measure the real physical bandwidth, which will most probably vary over time, then subtract the protocol overhead and work with that value.
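
The overhead arithmetic from the list above can be sketched in shell. The function name l2_payload_kbit is hypothetical, and the overhead is given in per mille so that plain integer arithmetic is enough; the 2.5% figure is the Ethernet estimate from the text, not a measured value.

```shell
#!/bin/sh
# l2_payload_kbit RATE_KBIT OVERHEAD_PERMILLE
# Prints the rate (in kbit/s) that remains after subtracting the
# protocol overhead from a measured rate.
l2_payload_kbit() {
    echo $(( $1 * (1000 - $2) / 1000 ))
}

# 100 MBit/s physical bandwidth minus 2.5% overhead
l2_payload_kbit 100000 25   # prints 97500
```

Feed the function your measured rate rather than the nominal one, and use the result as the bandwidth in your tc configuration.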

Required Packages

  • tc (traffic control, user space program to configure the Linux packet scheduler)
    • kmod-sched (dependency of tc), package containing all schedulers (aka queueing disciplines aka QDiscs) available
  • iptables-mod-ipopt optional! Contains some matches and TARGETs for iptables: CLASSIFY, dscp/DSCP, ecn/ECN, length, mac, mark/MARK, statistic, tcpmss, tos/TOS, ttl/TTL, unclean
    • kmod-ipt-ipopt (kernel space module; dependency of the corresponding user space module)
  • iptables-mod-* optional! (user space modules for iptables)
    • kmod-ipt-* (kernel space modules for iptables)
  • l7-protocols optional! If you want to match Layer 7 content
  • l7-protocols-testing optional! If you want to test. Check the project's own homepage.

As long as your ISP does not give you access to the DSL-AC so that you can install a simple tc script there, you will have to settle for policing the download:

  • kmod-ifb and act_connmark
In r25641 iptables-mod-imq (Intermediate Queueing Device) was removed and is no longer supported. Its successor is kmod-ifb. See Intermediate Functional Block device.

Installation

opkg

opkg update
opkg install tc iptables-mod-ipopt

Since the description of kmod-sched (kmod-sched is a dependency of tc) does not contain any information regarding its content, after installation do

ls -leha /lib/modules/$(uname -r)/ | grep sch
for a list of the currently installed QDisc modules. To use a particular one, you need to load the kernel module into memory:
insmod sch_hfsc
You need to do this after every reboot. For a list of currently loaded kernel modules and to remove a module do
lsmod
rmmod sch_hfsc
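
Since the modules have to be loaded again after every reboot, your script can load them itself. The following is a minimal sketch; the helper name load_missing and the module list are assumptions. It prints the insmod commands by default; run it with INSMOD=insmod as root to really load the modules.

```shell
#!/bin/sh
# Dry run by default: the insmod commands are only printed.
# Run with INSMOD=insmod (as root) to load the modules.
INSMOD="${INSMOD:-echo insmod}"

# load_missing MODULES_FILE NAME...
# Loads every named module that is not yet listed in MODULES_FILE
# (normally /proc/modules, where each line starts with the module name).
load_missing() {
    list="$1"; shift
    for mod in "$@"; do
        grep -q "^$mod " "$list" 2>/dev/null || $INSMOD "$mod"
    done
}

load_missing /proc/modules sch_hfsc sch_sfq
```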

After thoroughly reading this wiki page, you are going to write a shell script which invokes tc a couple of times to configure the packet scheduler. Please also see the available examples. When everything works well enough, proceed with Start on boot and Hotplug.

Explanation

Configuration

Part 1 – Hierarchy: Nesting of qdiscs & classes

There are two types of scheduling algorithms (QDiscs): classful and classless. If you choose to employ a classful root QDisc, you will be able to tailor the configuration very closely to your needs by constructing a hierarchy of "nesting entities" and then tuning each branch of the tree separately.

tc is the one and only user space program available to set up, maintain and inspect the configuration of the Linux packet scheduler. What iptables and ip6tables are for netfilter, tc is for the Linux packet scheduler. Generally only one change is made to the packet scheduler each time tc is executed, so a small shell script containing multiple invocations of tc is required to achieve a meaningful overall configuration:

                     nesting                                    | configuration
tc  what   command  interface  parent      qdisc-id / classid   | QDisc  QDisc-specific parameters
tc  qdisc  add      dev eth0   root        handle 1:            | hfsc   default 20
tc  qdisc  change   dev eth0   root        handle 1:            | hfsc   default 20
tc  qdisc  replace  dev eth0   root        handle 1:            | hfsc   default 20
tc  qdisc  link     dev eth0   root        handle 1:            | hfsc   default 20
tc  class  add      dev eth0   parent 1:   classid 1:1          | hfsc   ls rate 750kbit ul rate 1000kbit
tc  class  change   dev eth0   parent 1:1  classid 1:10         | hfsc   ls rate 250kbit ul rate 1000kbit
tc  class  replace  dev eth0   parent 1:1  classid 1:20         | hfsc   ls rate 250kbit ul rate 1000kbit
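
To make the nesting more tangible, here is a minimal sketch that builds a small tree: a root qdisc, classes below it, and another qdisc inside a leaf class. The rates are the ones from the table; the interface name and the choice of SFQ as the inner qdisc are assumptions. By default the commands are only printed; run with TC=tc as root to apply them.

```shell
#!/bin/sh
# Dry run by default: the tc commands are only printed.
# Run with TC=tc (as root) to apply them.
TC="${TC:-echo tc}"
DEV="${DEV:-eth0}"   # assumed interface name

build_tree() {
    # level 1: the root qdisc (classful HFSC, default class 1:20)
    $TC qdisc add dev "$DEV" root handle 1: hfsc default 20
    # level 2: a class directly below the root qdisc
    $TC class add dev "$DEV" parent 1: classid 1:1 hfsc ls rate 750kbit ul rate 1000kbit
    # level 3: two leaf classes below class 1:1
    $TC class add dev "$DEV" parent 1:1 classid 1:10 hfsc ls rate 250kbit ul rate 1000kbit
    $TC class add dev "$DEV" parent 1:1 classid 1:20 hfsc ls rate 250kbit ul rate 1000kbit
    # level 4: a classless qdisc (SFQ) nested inside leaf class 1:10
    $TC qdisc add dev "$DEV" parent 1:10 handle 10: sfq perturb 10
}

build_tree
```

Note how qdiscs and classes alternate: classes hang below a qdisc, and a leaf class can in turn contain its own qdisc.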

Question No1: Why are there two types of nesting entities? What on earth is the difference between a qdisc and a class? Wouldn't it be enough to have qdiscs only and be done with it?
Answer: Bear in mind that we have the packet scheduler, which is a component like netfilter, and then we have specific scheduling algorithms working within the packet scheduler. These scheduling algorithms are usually called queueing disciplines or QDiscs. In case you decide to employ a classful QDisc (scheduling algorithm), you have two nesting entities with slightly different traits: qdiscs and classes. Don't confuse the nesting entity qdisc from "qdiscs and classes" with QDisc aka algorithm! ;-) To distinguish between the two, I usually write qdisc and QDisc, but it would be better to call the one scheduling algorithm and the other one qdisc. As long as you do not intend to use a classful QDisc, you do not have to bother with the difference between qdiscs and classes!

Question No2: But I want to understand the difference.
Answer: Well, if you really insist, try reading about classful scheduling algorithms. This should make clear why it is necessary to distinguish between the two types of nesting entities, "qdiscs" and "classes".

  • Choose the implementation that matches your situation best. If you are the only user, you need neither nested qdiscs nor classes: use a classless QDisc.

Part 2 – Qdisc (Packet Scheduling Algorithm)

Once you have decided what your entire configuration will look like, you have to look up the specific configuration of the QDisc algorithm you intend to use. Each QDisc aka scheduling algorithm gives you parameters to tune:

Module          Name                                   Classful    Description
sch_atm         ?                                      ?           -
sch_blackhole   Black hole queue                       ?           -
sch_cbq         Class-Based Queueing discipline        yes         very complex
sch_choke       CHOKe scheduler                        ?           -
sch_drr         Deficit Round Robin scheduler          ?           -
sch_dsmark      Differentiated Services field marker   ?           -
sch_esfq        Enhanced Stochastic Fairness Queueing  ?           removed from the mainline kernel, but still available in OpenWrt
sch_fifo        The simplest FIFO queue                ?           -
sch_generic     Generic packet scheduler routines      ?           -
sch_gred        Generic Random Early Detection         ?           -
sch_hfsc        Hierarchical Fair Service Curve        yes         link sharing and low delay at the same time
sch_htb         Hierarchy Token Bucket                 yes         easiest configuration of link sharing, derived from CBQ
sch_ingress     Ingress qdisc                          ?           -
sch_mq          Classful multiqueue dummy scheduler    ?           -
sch_mqprio      ?                                      ?           -
sch_multiq      ?                                      ?           -
sch_netem       Network emulator                       ?           drops, delays, etc. packets
sch_pfifo_fast  FIFO with prioritizing                 no          DEFAULT, usually built into the kernel
sch_prio        Simple 3-band priority scheduler       see note    allows packet prioritization
sch_qfq         Quick Fair Queueing Scheduler          ?           -
sch_red         Random Early Detection                 ?           -
sch_sfb         Stochastic Fair Blue                   ?           -
sch_sfq         Stochastic Fairness Queueing           no          distributes bandwidth fairly among known TCP connections
sch_tbf         Token Bucket Filter                    no          limits bandwidth
sch_teql        True/Trivial Link Equalizer            ?           -

(? = not yet documented, - = no description yet)
Note: The PRIO QDisc does contain three classes, but since they cannot be configured further, PRIO is considered to be a classless QDisc. Its classes are sometimes called bands.
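
For comparison with the classful case, here is a minimal sketch of two classless QDiscs from the table: TBF to limit the bandwidth and SFQ to distribute it fairly. A classless QDisc is attached directly as root qdisc and needs no classes or filters. Interface name, rate, burst and latency values are assumptions. By default the commands are only printed; run with TC=tc as root to apply them.

```shell
#!/bin/sh
# Dry run by default: the tc commands are only printed.
# Run with TC=tc (as root) to apply them.
TC="${TC:-echo tc}"

# limit_rate DEV RATE: cap the sending rate of DEV with a Token Bucket Filter
limit_rate() {
    $TC qdisc replace dev "$1" root tbf rate "$2" burst 10kb latency 50ms
}

# fair_queue DEV: share the bandwidth of DEV fairly between flows with SFQ
fair_queue() {
    $TC qdisc replace dev "$1" root sfq perturb 10
}

limit_rate eth0 900kbit
```

An interface has only one root qdisc, so the two functions are alternatives: calling fair_queue after limit_rate replaces the TBF.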

Part 3 – Filters

This is where you configure which network packet belongs to which queue/bucket. A rule used to allocate a group of IP packets to a certain classid consists of a number of classifiers (matches) and one connected action (TARGET or VERDICT). In principle it works exactly like netfilter rules; the only difference is that in the available documentation matches are called classifiers and the TARGET is called VERDICT. However, since it is possible to do the filtering entirely with netfilter, this does not really matter.

Filter with packet scheduler

A filter is used by a classful QDisc to determine in which bucket a packet will be enqueued. Whenever traffic arrives at a class with subclasses, it needs to be classified. Various methods may be employed to do so; filters are one of them. All filters attached to the class are called until one of them returns with a verdict. If no verdict was made, other criteria may be available. This differs per qdisc.

                location                    | match                                                                    | verdict/target
tc  what    command  interface  target      | priority  protocol     filtertype [ filtertype-specific parameters ]  | flowid
tc  filter  add      dev eth0   parent 1:   | prio 10   protocol ip  u32 match ip dport 22 0xffff                   | classid 1:202
tc  filter  change   dev eth0   parent 1:   | prio 20   protocol ip  u32 match ip dport 22 0xffff                   | classid 1:202
tc  filter  replace  dev eth0   parent 1:   | prio 99   protocol ip  handle 202 fw                                  | flowid 1:202

Rule No1: It is important to notice that filters reside within QDiscs; they are not masters of what happens.
Rule No2: A filter always belongs to a qdisc and never to a class!

Note1: Classifying with the packet scheduler is slower than classifying with netfilter!
Note2: If you are using NAT, you cannot use the packet scheduler to filter for the source IP address of different internal hosts, because they are being replaced with the router's external IP address before the packets enter the packet scheduler!
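
A minimal sketch of a reusable u32 filter rule. Unlike the table rows above, it also matches the IP protocol field (6 = TCP), so that only TCP packets with the given destination port are classified. The helper name classify_tcp_port is hypothetical, and the class 1:202 is assumed to exist already. By default the command is only printed; run with TC=tc as root to apply it.

```shell
#!/bin/sh
# Dry run by default: the tc commands are only printed.
# Run with TC=tc (as root) to apply them.
TC="${TC:-echo tc}"

# classify_tcp_port DEV PORT FLOWID PRIO
# Attach a u32 filter to qdisc 1: that sends TCP packets with
# destination port PORT to class FLOWID.
classify_tcp_port() {
    $TC filter add dev "$1" parent 1: prio "$4" protocol ip \
        u32 match ip protocol 6 0xff match ip dport "$2" 0xffff \
        flowid "$3"
}

classify_tcp_port eth0 22 1:202 10
```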

Filter with packet scheduler and netfilter

Here iptables and tc filter are used together. We first match the wanted packets with netfilter and mark them, then match the mark (handle 202) with tc and connect it with a certain classid (flowid 1:202):

iptables -t mangle -A POSTROUTING -j MARK --set-mark 202 -p tcp --dport 22
tc filter add dev pppoe-dsl parent 1: prio 1 protocol ip handle 202 fw flowid 1:202

Filter with netfilter only

It is also possible to use netfilter alone to match and then directly classify network packets. This is more efficient and offers the most options:

iptables -t mangle -A POSTROUTING -j CLASSIFY --set-class 1:202 -p tcp --dport 22

For netfilter there is a module called layer7 (install the package l7-protocols). Be aware that deep packet inspection can be slow:

iptables -t mangle -A POSTROUTING -j CLASSIFY --set-class 1:305 -m layer7 --l7proto xxx

Here we match the combination of source IP address, transport protocol, destination port and packet (not payload) length:

iptables -t mangle -A POSTROUTING -j CLASSIFY --set-class 1:303 -s 192.168.0.15 -p tcp --dport 80 -m length --length :512

You may read on the Internet that the CLASSIFY target can only be used in POSTROUTING, but this has not been true since at least 2006: you can also use it in FORWARD and OUTPUT. Since kernel 2.6.39 you are no longer restricted to the mangle table and can also classify with arptables (on OUTPUT and FORWARD) (http://comments.gmane.org/gmane.comp.security.firewalls.netfilter.devel/36340).

Approach to your own configuration

The configuration of the packet scheduler has to be tailored to your situation. Bear in mind what you are actually doing here: you adjust the behavior of the packet scheduler working on the egress buffer of one particular network interface (physical or virtual)! Depending on the employed algorithm, we can:

  1. manipulate the order in which the packets that are currently in the egress buffer are sent (= re-order/prioritize)
  2. subdivide the buffer into sub-buffers, and then deliberately drop packets when they fill their sub-buffer (= traffic shaping)

This is helpful in certain situations:

  • traffic prioritization helps with the jitter that occurs when buffers get clogged
  • traffic shaping helps with deliberately dividing the available bandwidth between defined traffic types and/or participants

So, let's check your situation and let's then configure your packet scheduler accordingly:

  1. Do you require traffic control?
    1. IF you generate more traffic than can go through the line, THEN simply stop this foolish behavior. There is no point in generating more traffic than there is available upload bandwidth. This will only clog your egress buffer and cause serious problems with jitter.
    2. IF your aunt Margarette is responsible for the excess traffic, and you cannot influence this, THEN yes.
    3. IF you do not generate more traffic than can go through the line, but still have problems with jitter, THEN you could profit from traffic prioritization.
  2. What can you configure?
    1. You can determine the behavior of the packet scheduler through exactly three choices: the nesting (only in case the root qdisc is a classful one), the particular algorithm(s) to be applied, and the particular parameters of the employed algorithm(s).
  3. Are you alone or is there traffic generated by multiple users at the same time?
    1. In case you are alone, the configuration could look very simple. See →Examples.
    2. In case of multiple users, there are a couple of methods for the nesting alone.
  4. What kind of traffic is being generated?
    1. Given the fact that the packet scheduler can only do so much, it makes sense to distinguish between exactly two types of traffic: traffic that is susceptible to jitter and delay, and traffic that is not! Yes, you can subdivide these two types further, but whether this makes sense depends on the employed algorithm, on how full the egress buffer is and on your available upload bandwidth.
  • Classful or classless? → Choose the implementation that matches your situation best. If you are alone, use a classless QDisc.

Examples

In my eyes Google finds far too few complete examples of implementations. Let us try to keep them as modular and simple as possible: (use Vim for better syntax highlighting)
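
As a starting point, here is a minimal sketch of a complete script with the start/stop interface that the init script further down expects from /etc/tc_hfsc.sh. It is not one of the original examples of this page. The interface name, the upload bandwidth, the 3:1 split and the choice of SSH as interactive traffic are all assumptions that you have to adapt to your line.

```shell
#!/bin/sh
# /etc/tc_hfsc.sh: minimal HFSC upload shaping sketch.
# All values below are assumptions; adapt them to your measured line.
TC="${TC:-tc}"            # set TC="echo tc" for a dry run
DEV="${DEV:-pppoe-dsl}"   # egress interface
UPLOAD=1000               # usable Layer-2 upload bandwidth in kbit/s

start() {
    # root qdisc; unclassified traffic ends up in the bulk class 1:20
    $TC qdisc add dev "$DEV" root handle 1: hfsc default 20
    # parent class carrying the whole upload bandwidth
    $TC class add dev "$DEV" parent 1: classid 1:1 hfsc ls rate "${UPLOAD}kbit" ul rate "${UPLOAD}kbit"
    # 1:10 interactive traffic gets 3/4 under load, 1:20 bulk the rest
    $TC class add dev "$DEV" parent 1:1 classid 1:10 hfsc ls rate "$((UPLOAD * 3 / 4))kbit" ul rate "${UPLOAD}kbit"
    $TC class add dev "$DEV" parent 1:1 classid 1:20 hfsc ls rate "$((UPLOAD / 4))kbit" ul rate "${UPLOAD}kbit"
    # SSH (TCP, destination port 22) goes to the interactive class
    $TC filter add dev "$DEV" parent 1: prio 10 protocol ip \
        u32 match ip protocol 6 0xff match ip dport 22 0xffff flowid 1:10
}

stop() {
    # deleting the root qdisc removes its classes and filters as well
    $TC qdisc del dev "$DEV" root 2>/dev/null
}

case "${1:-}" in
    start)   start ;;
    stop)    stop ;;
    restart) stop; start ;;
    *)       echo "usage: $0 {start|stop|restart}" ;;
esac
```

To preview the commands without touching the interface, run the script as TC="echo tc" sh /etc/tc_hfsc.sh start.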

Note: The examples do not make any use of UCI or anything else that is OpenWrt-specific, so you can simply port them to any other Linux distribution and back.

Check results

To check on your results, use tc with or without the option -s (statistics):

 tc -s qdisc show dev pppoe-dsl

 tc class show dev pppoe-dsl

 tc filter show dev pppoe-dsl

 iptables -nL -v -x -t mangle
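
The statistics output is meant for humans, but the drop counters are easy to extract. A minimal sketch, assuming the usual "(dropped N, overlimits ...)" format of the tc -s output; the helper name dropped_total is hypothetical.

```shell
#!/bin/sh
# dropped_total: read `tc -s ... show` output on stdin and print the
# sum of all "dropped" counters found in it.
dropped_total() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "(dropped") {
                sub(/,/, "", $(i + 1))   # strip the trailing comma
                sum += $(i + 1)
            }
    } END { print sum + 0 }'
}

# usage on a live system (the interface name is an assumption):
#   tc -s qdisc show dev pppoe-dsl | dropped_total
```

The same function works for the output of tc -s class show, which reports drops per class.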

Testing

Once you have managed to set up a working configuration, you need to test it. Thoroughly. Produce all kinds of outgoing traffic and measure the bandwidth distribution. Then measure the packet delay. An ideal setup would give you full access to and control over the whole chain: Source → line1 → Router → line2 → Destination. The whole effort is needed because the source(s) send more traffic than line2 can handle. To simulate a limited DSL line over an Ethernet connection do… TODO. But even without this setup, you can do measurements:

To measure and compare our results, the approach is always the same. We measure the latency of network packets in different situations:

  1. on a free line without any QoS: the technically feasible values; less is technically impossible
  2. on a congested line without QoS: you will have huge and also unsteady delays. (This is the very reason you want to configure your egress buffer.)
  3. on a free line with QoS: you will add some delay, there is no way around this; the less the better
  4. on a congested line with QoS: the fruits of all your work: more delay compared to a free line, but not much; the less the better
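
For the four measurements above, ping is usually sufficient. A minimal sketch that extracts the average round-trip time from the summary line of ping; it assumes the "min/avg/max" summary format printed by the busybox and iputils versions of ping. The helper names are hypothetical and the target host is up to you.

```shell
#!/bin/sh
# avg_rtt: read ping output on stdin and print the average round-trip
# time (in ms) taken from the "min/avg/max" summary line.
avg_rtt() {
    awk -F'=' '/min\/avg\/max/ {
        gsub(/ /, "", $2)      # remove blanks around the values
        split($2, t, "/")      # t[1] = min, t[2] = avg, t[3] = max
        print t[2]
    }'
}

# measure HOST: send 20 echo requests and print the average delay
measure() {
    ping -c 20 "$1" | avg_rtt
}

# usage: run once in each of the four situations and compare, e.g.
#   measure <some host behind line2>
```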

Googling is time-consuming and sometimes fruitless. If you get a good and working result, please come back and share your knowledge!

If you have a working statistics setup, it will be of invaluable help with this.

Start on boot

Make init restart your script on every boot:

vi /etc/init.d/trafficc

#!/bin/sh /etc/rc.common
# rc.common provides the handling of start, stop, enable, disable, ...
 
START=50
 
boot() {
        /etc/tc_hfsc.sh start
}
 
start() {
        /etc/tc_hfsc.sh start
}
 
stop() {
        /etc/tc_hfsc.sh stop
}

chmod a+x /etc/init.d/trafficc
/etc/init.d/trafficc start
/etc/init.d/trafficc enable

Hotplug

The virtual network interface pppoe-dsl, for example, has a special behavior. If you reconnect your DSL connection, the device pppoe-dsl will be closed down, and thus it will "cease to exist". And so will its QDisc. So every time it is brought up again, the whole configuration needs to be set up again.

Make hotplug restart your script every time the interface the packet scheduler belongs to comes up again:

vi /etc/hotplug.d/iface/30-trafficc

#!/bin/sh
# This script is executed as part of the hotplug event with
# HOTPLUG_TYPE=iface, triggered by various scripts when an interface
# is configured (ACTION=ifup) or deconfigured (ACTION=ifdown).  The
# interface is available as INTERFACE, the real device as DEVICE.

[ "$ACTION" = ifup -a "$INTERFACE" = "dsl" ] && /etc/init.d/trafficc enabled && /etc/tc_hfsc.sh start

Statistical Data

Once your configuration is up and running, you may want to collect some statistical data:

  • about bandwidth used by the different classes
  • packets dropped (!)
  • number of packets, packets size, which protocol was being used, source IP, …
  • the data that tc and iptables output is of course not very well formatted.
  • use a tool to collect and parse data: NGN

NOTE: If you do not log only your own traffic data, please mind data privacy protection laws to prevent you from going to jail or paying a fine.

Troubleshooting

  • The packet scheduler comes after netfilter, and it is possible to block traffic if you configure it badly. To avoid this, the default behavior of the packet scheduler should be to have some bandwidth available for all traffic that has no specific configuration! This is in contrast to the packet filter, where the default behavior should be DROP.
  • A common mistake is forgetting to classify ARP packets (even if you match all packets in iptables, you won't match ARP packets, as iptables works on Layer 3 and ARP is Layer 2).
  • If you add your qdisc to a "virtual" device (VLAN (eth0.1), bridge (br-wan)), you may see dropped packets. This is because for these types of devices the default txqueuelen is 0 and the qdisc inherits this value. Simply set a higher value: "ifconfig br-wan txqueuelen X" (the value of X depends on the speed of your link; you can try 1 or the txqueuelen of the real device (eth0)).
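
The last two items can be sketched as follows. The ARP rule uses a u32 match that matches every packet of the given protocol; the class 1:20, the interface names and the queue length of 1000 are assumptions, and "ip link" is used here as the iproute2 equivalent of the ifconfig call. By default the commands are only printed; run with TC=tc IPCMD=ip as root to apply them.

```shell
#!/bin/sh
# Dry run by default: the commands are only printed.
# Run with TC=tc IPCMD=ip (as root) to apply them.
TC="${TC:-echo tc}"
IPCMD="${IPCMD:-echo ip}"

# classify_arp DEV FLOWID: send all ARP packets on DEV to class FLOWID
classify_arp() {
    $TC filter add dev "$1" parent 1: prio 1 protocol arp \
        u32 match u32 0 0 flowid "$2"
}

# set_txqueuelen DEV LEN: give a virtual device a non-zero queue length
set_txqueuelen() {
    $IPCMD link set dev "$1" txqueuelen "$2"
}

classify_arp eth0 1:20
set_txqueuelen br-wan 1000
```

Set the txqueuelen before attaching the qdisc, since the qdisc takes over the value at creation time.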

Notes


doc/howto/packet.scheduler/packet.scheduler.txt · Last modified: 2012/03/27 15:58 by orca