A TCP Vegas Implementation for Linux
Neal Cardwell
Graduate Student Researcher
Boris Bak
Undergraduate Research Assistant
bakb@alumni.washington.edu
This work was supported by USENIX.
Introduction
TCP Vegas is a congestion control algorithm that reduces queuing and
packet loss, and thus reduces latency and increases overall
throughput, by carefully matching the sending rate to the rate at
which packets are successfully being drained by the network. Vegas was
originally developed at
the University of Arizona in the x-kernel protocol
framework by Lawrence
Brakmo and Larry
Peterson. This page describes a Vegas implementation for Linux
2.2/2.3. This implementation can be enabled, disabled, and configured
through entries in the /proc filesystem.
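To make the core idea concrete, here is a minimal userspace-style sketch of the Vegas congestion avoidance decision, with rates in packets per second and alpha/beta in packets as in the original papers. All names and the simple one-packet cwnd steps are illustrative; this is not the kernel code.

/*
 * Minimal sketch of the Vegas congestion avoidance decision.
 * Illustrative only -- not the kernel implementation.
 */
#include <stdio.h>

unsigned int vegas_cong_avoid(unsigned int cwnd,
                              double base_rtt,  /* min RTT ever seen (s) */
                              double rtt,       /* recent RTT sample (s) */
                              double alpha,     /* lower queue target, pkts */
                              double beta)      /* upper queue target, pkts */
{
    /* Expected rate if the path were empty vs. the rate we actually saw. */
    double expected = cwnd / base_rtt;
    double actual   = cwnd / rtt;

    /* diff estimates how many packets we have queued in the network. */
    double diff = (expected - actual) * base_rtt;

    if (diff < alpha)
        cwnd += 1;          /* too little queued: speed up */
    else if (diff > beta)
        cwnd -= 1;          /* too much queued: slow down */
    /* alpha <= diff <= beta: sending rate matches drain rate; hold cwnd */
    return cwnd;
}

int main(void)
{
    /* Example: cwnd of 20, baseRTT 100ms, measured RTT 120ms. */
    unsigned int cwnd = vegas_cong_avoid(20, 0.100, 0.120, 1.0, 3.0);
    printf("new cwnd = %u\n", cwnd);  /* diff is ~3.3 > beta, so cwnd -> 19 */
    return 0;
}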
Like almost every TCP congestion control algorithm, Vegas is purely a
sender-side algorithm. Enabling Vegas will help if you send a lot of
data (e.g., you are running a web server), but not if you mostly just
receive data (e.g., you're browsing the web). I encourage you to try
Vegas even if you don't send much data, as this will help expose any
bugs or performance problems that may be lurking in the Vegas code.
Status
May 10, 2004
This TCP Vegas implementation has been incorporated into the official
Linux 2.6.6 source release, so you can find the latest version at kernel.org in
net/ipv4/tcp_vegas.c.
Feb 2002
I'm putting this out on the web again. I haven't touched this in more
than two years, but would still be interested in hearing from people
who try it out, particularly if they update the patch to work for more
up-to-date kernels.
Aug 1999
The implementation is stable (at least in our configuration) and
fairly well tested. Right now I'm mostly looking for eyeballs to give
the code a once-over and feedback about the performance folks see when
they give Vegas a try.
Earlier Patches
NOTE:
As with any kernel patch, use it at your own risk; we make no guarantees.
However, in our experience, this implementation is very stable.
If you have any comments or patches, or if Vegas improves or degrades
your TCP performance significantly, I'd be interested in hearing about
it; please send mail to:
If you encounter performance problems, please send a pointer to a
sender-side tcpdump of a Vegas transfer if possible.
Background
Vegas is described in detail in:
- "TCP Vegas: End to End Congestion Avoidance on a Global Internet,"
Lawrence S. Brakmo and Larry L. Peterson. IEEE Journal on Selected
Areas in Communications, 13(8):1465-1480, October 1995.
- "Experience with TCP Vegas: Emulation and Experiment," J.-S. Ahn,
P. B. Danzig, Z. Liu, and L. Yan. SIGCOMM '95.
- "Analysis of TCP Vegas and TCP Reno," O. Ait-Hellal and E. Altman.
Proc. IEEE ICC '97, 1997.
- "Comparison of TCP Reno and TCP Vegas via Fluid Approximation,"
T. Bonald. Workshop on the Modeling of TCP, Dec. 1998.
- "Analysis and Comparison of TCP Reno and Vegas," Jeonghoon Mo,
Richard La, Venkat Anantharam, and Jean Walrand. INFOCOM '99.
- "TCP Vegas Revisited," U. Hengartner, J. Bolliger, and T. Gross.
INFOCOM '00.
- "A Case for TCP Vegas in High-Performance Computational Grids,"
Eric Weigle and Wu-chun Feng.
- "Understanding Vegas: A Duality Model," S. H. Low, Larry Peterson,
and Limin Wang. Journal of the ACM, 49(2):207-235, March 2002.
Implementation Overview
This Linux implementation was done by Neal
Cardwell (a grad student) and Boris Bak (a recently-graduated undergrad) in the
CSE department of the
University of Washington-Seattle. The main aspects that distinguish
our Linux implementation from the Arizona Vegas implementation are:
-
We do not change the loss detection or recovery mechanisms of Linux in
any way. Linux already recovers from losses reasonably well, using an
RTO derived from fine-grained RTT measurements and NewReno or FACK. Our
implementation currently does not make Vegas adjustments during loss
recovery.
-
To avoid the considerable performance penalty imposed by increasing
cwnd only every other RTT during slow start, we increase cwnd every
RTT during slow start, just as Reno does.
-
Largely to allow continuous cwnd growth during slow start, we use the rate
at which ACKs come back as the "actual" rate, rather than the rate at which
data is sent.
-
We currently do not use any special heuristics to set ssthresh or exit
slow start, other than the default Vegas approach. We're
working on this; see "On
Estimating End-to-End Network Path Properties" for details on why
delayed ACKs and other factors make this a difficult problem.
-
To speed convergence to the right rate, we set the cwnd to achieve the
"actual" rate when we exit slow start.
- To filter out the noise caused by delayed ACKs, we use the minimum
RTT sample observed during the last RTT to calculate the actual rate
(see the sketch after this list). This can delay the detection of
congestion by up to one RTT or so, but we have not found a better way
to filter out the huge RTT spikes caused by delayed ACKs.
-
We use microsecond-resolution time stamps rather than the
millisecond-resolution time stamps used by the x-kernel or the
10ms-resolution time stamps used by Linux/i386. You need at least
millisecond-resolution time stamps even for WAN paths; 10ms resolution
doesn't work well for paths with RTTs of 100ms or smaller.
-
When the sender re-starts from idle, it waits until it has received ACKs
for an entire flight of new data before making a cwnd adjustment
decision. The original Vegas implementation assumed senders never went
idle.
-
Our implementation currently does not deal with route changes. I'm working
on this...
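To tie a few of these points together, here is a hypothetical sketch of the per-ACK bookkeeping: tracking baseRTT, filtering delayed-ACK noise with the minimum RTT seen during the current RTT, and computing the "actual" rate from the data ACKed over one RTT. The names and structure are illustrative, not taken from the patch.

/*
 * Hypothetical sketch of the Vegas RTT bookkeeping described above.
 * baseRTT is the minimum RTT ever seen; within each RTT we keep the
 * minimum RTT sample so one delayed ACK can't masquerade as congestion.
 */
#include <limits.h>

struct vegas_state {
    unsigned int base_rtt_us; /* min RTT over the whole connection (us) */
    unsigned int min_rtt_us;  /* min RTT within the current RTT (us) */
};

/* Called for every ACK that yields a fresh microsecond RTT sample. */
void vegas_rtt_sample(struct vegas_state *v, unsigned int rtt_us)
{
    if (rtt_us < v->base_rtt_us)
        v->base_rtt_us = rtt_us;  /* update baseRTT */
    if (rtt_us < v->min_rtt_us)
        v->min_rtt_us = rtt_us;   /* filter delayed-ACK RTT spikes */
}

/*
 * Called once per RTT, after ACKs for an entire flight have arrived
 * (so a sender restarting from idle waits for a full flight first).
 * Returns the "actual" rate: data ACKed over the filtered RTT.
 */
unsigned long long vegas_actual_rate(struct vegas_state *v,
                                     unsigned int acked_bytes)
{
    unsigned long long rate =
        (unsigned long long)acked_bytes * 1000000ULL / v->min_rtt_us;

    v->min_rtt_us = UINT_MAX;     /* reset the filter for the next RTT */
    return rate;                  /* bytes per second */
}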
Here are some old slides (powerpoint, ps.gz) of a talk about our experiences
cooking up this implementation. This talk was given at the Detour
retreat, 6/15/1999.
Enabling Vegas
Kernels with this patch still use the original Linux congestion control
(a traditional Jacobson-style, RFC 2581 algorithm) until you enable
Vegas using:
echo 1 > /proc/sys/net/ipv4/tcp_vegas_cong_avoid
While Vegas is enabled, all new TCP connections use Vegas. Existing
connections continue to use whatever algorithm they were using before.
To disable Vegas:
echo 0 > /proc/sys/net/ipv4/tcp_vegas_cong_avoid
Any new connections created after you execute this command will use
the default algorithm. Any existing connections will keep using whatever
algorithm they were using before.
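For completeness, here is a small sketch (not part of the patch) of how a program could flip the same /proc switch from C. Like the echo commands above, it requires root privileges.

/*
 * Sketch: toggling Vegas from C via the /proc entry described above.
 * Illustrative only; requires root, just like the echo commands.
 */
#include <fcntl.h>
#include <unistd.h>

int set_vegas(int enable)
{
    int fd = open("/proc/sys/net/ipv4/tcp_vegas_cong_avoid", O_WRONLY);
    int ok;

    if (fd < 0)
        return -1;

    /* "1" enables Vegas for new connections; "0" disables it. */
    ok = (write(fd, enable ? "1" : "0", 1) == 1);
    close(fd);
    return ok ? 0 : -1;
}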
Tuning Vegas
There are three entries in the /proc file system that control
Vegas parameters:
- The /proc/sys/net/ipv4/tcp_vegas_alpha and
/proc/sys/net/ipv4/tcp_vegas_beta entries control the number
of packets that Vegas attempts to keep queued in the network in steady
state. They are expressed in half-packet units. The default
configuration is that recommended by the original Vegas papers:
alpha=1 packet and beta=3 packets. In this configuration Vegas tries
to keep between 1 and 3 packets queued in the network and usually
succeeds in stabilizing cwnd at a value that satisfies this
constraint. If you change these parameters, you can restore the
defaults later with:
echo 2 > /proc/sys/net/ipv4/tcp_vegas_alpha
echo 6 > /proc/sys/net/ipv4/tcp_vegas_beta
Another configuration that seems to work well is alpha=beta=1.5 packets:
echo 3 > /proc/sys/net/ipv4/tcp_vegas_alpha
echo 3 > /proc/sys/net/ipv4/tcp_vegas_beta
In this configuration Vegas oscillates, keeping around 0-2 packets in
the network. This is nice because there will often be zero queuing
delay, so that new Vegas flows will get an accurate notion of baseRTT;
this will improve fairness between Vegas flows. For more details on
this, see "Fairness and Stability of Congestion Control Mechanism of
TCP," by Go Hasegawa, Masayuki Murata, and Hideo Miyahara, INFOCOM '99.
- The /proc/sys/net/ipv4/tcp_vegas_gamma entry controls
the number of packets that Vegas will allow to be queued in the network
during slow start before it exits slow start. Again, this is expressed
in units of half-packets. The default is
gamma=1 packet, which you can also get with:
echo 2 > /proc/sys/net/ipv4/tcp_vegas_gamma
I haven't had much luck with anything but gamma=1 packet.
See the Vegas JSAC paper for more details on these three parameters.
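To make the half-packet units concrete, here is an illustrative fragment showing how values written to the /proc entries above would compare against a diff estimate measured in packets. The variable names merely stand in for the /proc values; this is not the kernel code.

/*
 * Illustrative only: how the half-packet /proc units relate to the
 * Vegas thresholds. The variables stand in for the values written
 * to the /proc entries above.
 */
int tcp_vegas_alpha = 2;   /* half-packets: 2 -> alpha = 1 packet  */
int tcp_vegas_beta  = 6;   /* half-packets: 6 -> beta  = 3 packets */

/* diff_pkts: estimated number of packets queued in the network. */
int vegas_cwnd_direction(double diff_pkts)
{
    double diff_half_pkts = 2.0 * diff_pkts;  /* packets -> half-packets */

    if (diff_half_pkts < tcp_vegas_alpha)
        return +1;   /* queue too short: grow cwnd */
    if (diff_half_pkts > tcp_vegas_beta)
        return -1;   /* queue too long: shrink cwnd */
    return 0;        /* within [alpha, beta]: hold cwnd steady */
}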
Instrumentation
If you want to see what is going on inside, turn on kernel logging
with klogd -c 7 and then turn on tracing of the socket you're
interested in with setsockopt(... SOL_SOCKET, SO_DEBUG...).
Then, if the connection is using Vegas, it will write detailed trace
output to /var/log/messages. Note that for high-speed
connections, the log will often be missing many entries due to buffer
wrap-around.
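For example, here is a sketch of turning on per-socket tracing; the setsockopt() call itself is standard, and on Linux SO_DEBUG requires root privileges.

/*
 * Sketch: enable per-socket tracing with SO_DEBUG, as described
 * above. With klogd -c 7 running, a Vegas connection on this socket
 * then logs its trace output to /var/log/messages.
 */
#include <stdio.h>
#include <sys/socket.h>

int enable_sock_debug(int sockfd)
{
    int one = 1;

    if (setsockopt(sockfd, SOL_SOCKET, SO_DEBUG, &one, sizeof(one)) < 0) {
        perror("setsockopt(SO_DEBUG)");
        return -1;
    }
    return 0;
}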
Example Traces
Some PostScript plots of example traces:
Note that in each of these traces, the throughput is about the same
for FACK and Vegas+FACK; the main difference is that Vegas+FACK is far
less bursty than FACK and is much nicer to the queues, typically
keeping only a few packets queued.
Measuring Performance
For measuring TCP performance, I recommend wget or netperf.
Preliminary Performance Numbers
To get a feel for the performance of this Vegas implementation, I
performed 256Kbyte and 10MByte transfers from a Linux 2.3.10 sender at
the University of Washington using this implementation to a few dozen
hosts in the US and Europe. To each site I performed 6 256Kbyte
transfers with Vegas and 6 without, and 4 10MByte transfers with Vegas
and 4 without. Below are the cumulative distributions of bandwidth,
retransmitted bytes, and the RTT experienced by TCP during the
transfer, as determined by tcpdump packet trace analysis. All Vegas
trials used alpha = beta = 1.5 packets; anecdotally, results seem
similar for alpha = 1 packet, beta = 3 packets.
[There were a bunch of links to graphs here, but all the graphs were on a machine that died long ago.]
The basic result is that the Vegas implementation achieved bandwidths
that were comparable in most cases, and slightly higher in a number of
cases. The Vegas implementation usually retransmitted significantly
fewer bytes and maintained smaller queues, as represented by the
smaller RTTs. Vegas provides bigger improvements when there is a
single flow going over a medium-bandwidth link, like
my DSL link. The bandwidth gains for single Vegas transfers
were typically small in these trials for a number of reasons:
- The 256Kbyte transfers often experienced no packet loss, so they
spent their entire lifetimes in slow start, which is no different in
our Vegas implementation.
- Many of the longer transfers were limited by receiver windows
rather than by congestion and congestion control behavior.
- The loss rates were low, and the NewReno loss recovery
algorithm in Linux is fairly successful in preventing timeouts.
- Over highly congested paths, competing Reno flows will steal
bandwidth from Vegas flows, since Reno flows tend to drive the queue
to saturation.
Preliminary Netperf Results
With SACK
Because the Linux FACK implementation usually does a very good job of
keeping a path fully utilized even in the face of losses, turning on
Vegas usually doesn't improve steady-state throughput much above FACK
in the cases that I've been able to look at so far.
With 60-second netperf transfers
from UW to Princeton, both FACK and Vegas+FACK got about
32-35Mbit/s. This is pretty good for a cross-country 95ms RTT path
that presumably has a 100Mbit/s bottleneck. To take another example,
with netperf running over an emulated path (10Mbps, 100ms RTT,
100-packet queue, using dummynet), I
clocked both FACK and Vegas+FACK at 9.59Mbps. There is still a huge
difference in use of the queue: in a 120-second netperf trial over
this emulated network, Vegas usually drops no packets and keeps
between alpha and beta packets queued, whereas FACK drops about 100
packets (the entire queue's worth) during slow start, suffers 6 more
losses over the rest of the transfer, and keeps the queue about half
full on average (about 50 slots out of 100).
Without SACK
If you look at the same paths but with a receiver that doesn't do
SACKs, adding Vegas does help performance.
With 60-second netperf transfers from UW to Princeton, disabling SACKs
and thus forcing the sender to use NewReno-style loss recovery,
performance is highly variable, and Vegas achieves significantly
higher bandwidths.
In the emulated network, NewReno gets 8.59Mbps, while adding Vegas
yields 9.59Mbps.
One factor that may be contributing to the low bandwidths and high
variability is the very poor performance NewReno sees when it loses
many packets in a single window during slow start, and must retransmit
each packet at a rate of one packet per RTT (an example
NewReno nightmare scenario with an 8-second "fast" recovery
period).
Benchmarking Summary
Altogether, I'd say the benchmark results indicate Vegas can help for
low-bandwidth-delay paths like DSL where the sender is constantly
over-running buffers, or high-bandwidth-delay WAN paths where the
receiver isn't sending SACKs and the sender is wasting a lot of time
recovering from losses. The shorter queues Vegas maintains should also
help the performance of other flows going through the same
bottlenecks. In particular, short flows (with small cwnds and thus
vulnerable to costly timeouts from packet loss) should see better
performance going over Vegas-dominated bottlenecks because they should
suffer fewer packet losses.
Other Vegas Implementations
- The x-kernel: The original TCP Vegas implementation was done in the x-kernel,
a protocol framework from U. Arizona.
- USC: Researchers at USC implemented TCP Vegas in NetBSD 1.0 and
SunOS 4.1.3.
- ns: The ns network simulator
has an implementation of TCP Vegas.
- The Linux 2.1.x Vegas Implementation:
I'm not sure who wrote this implementation (if you know, please send me
mail), but here it is (see tcp_cong_avoid_vegas() in here).
Our implementation is not related to this 2.1 implementation. Apparently
the 2.1 implementation had performance problems; from looking at the
code I'd guess that some of these problems resulted from
coarse-grained (10ms on i386) time stamps and delayed ACKs.
Acknowledgments
Thanks to Lawrence Brakmo, Larry Peterson, Tom Anderson, Stefan
Savage, Neil Spring, and Eric Hoffman for helping us sift through the
subtleties of Vegas, and thanks to David Miller for lightning-fast
responses to bug reports and questions while we were learning the
ropes with Linux 2.1.x TCP.
Neal Cardwell