Transport Layer: TCP Congestion Control

The Internet is a shared resource, and without traffic regulation it would quickly descend into gridlock. TCP congestion control is the mechanism that prevents this collapse. It dynamically adjusts a sender's transmission rate based on inferred network conditions, aiming for roughly fair bandwidth sharing and stable operation across competing flows. Understanding its mechanisms is essential for understanding how reliable data transfer scales to a global network.

The Core Algorithm: Slow Start and Congestion Avoidance

TCP congestion control is fundamentally governed by the congestion window (cwnd), a sender-side limit on the amount of unacknowledged data that can be in flight. The core challenge is to increase cwnd to utilize available bandwidth without overfilling router queues, which causes packet loss.

The algorithm begins with Slow Start. When a connection is established, or after a retransmission timeout, cwnd is set to a small value: classically 1 segment, although modern stacks typically start a new connection at up to 10 segments. For every acknowledgment (ACK) that covers new data, cwnd increases by one segment, so the window doubles every round trip. Starting from one segment: one segment is sent and one ACK returns (cwnd=2); two segments are sent and two ACKs return (cwnd=4), and so on. This rapid probing phase continues until cwnd reaches a threshold called the slow start threshold (ssthresh) or a loss is detected.
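The doubling behaviour can be sketched in a few lines of Python. This is a minimal illustration, not a real TCP stack: it assumes one ACK per segment (no delayed ACKs), no loss, and windows counted in whole segments. The function name `slow_start_rounds` is hypothetical.

```python
def slow_start_rounds(initial_cwnd: int, ssthresh: int) -> list[int]:
    """Return cwnd (in segments) at the start of each RTT during slow start.

    Assumptions: one ACK per segment sent, no loss, no delayed ACKs.
    """
    cwnd = initial_cwnd
    history = [cwnd]
    while cwnd < ssthresh:
        acks_this_round = cwnd          # every segment in flight is ACKed
        for _ in range(acks_this_round):
            if cwnd >= ssthresh:        # stop growing once the threshold is hit
                break
            cwnd += 1                   # +1 segment per ACK
        history.append(cwnd)
    return history


print(slow_start_rounds(1, 16))   # [1, 2, 4, 8, 16]
print(slow_start_rounds(10, 64))  # [10, 20, 40, 64]
```

Each list entry is one round trip, so the window reaches a threshold of 16 segments in four RTTs when starting from one segment.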

Once cwnd reaches or surpasses ssthresh, TCP transitions to the Congestion Avoidance phase. The goal here is cautious, linear growth to avoid inducing loss. Instead of increasing cwnd by a full segment per ACK, the window is increased by roughly one segment per round-trip time (RTT). A common implementation is to increment cwnd by 1/cwnd segments for each incoming ACK, so that a full window's worth of ACKs adds about one segment. This Additive Increase policy allows TCP to gradually probe for additional available bandwidth.
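To see why the per-ACK increment of 1/cwnd amounts to about one segment per RTT, the sketch below applies it once for each ACK in a window. It assumes cwnd is tracked as a fractional number of segments and that one ACK arrives per segment in flight; `congestion_avoidance_rtt` is a hypothetical helper name.

```python
def congestion_avoidance_rtt(cwnd: float) -> float:
    """Apply cwnd += 1/cwnd once per ACK for one RTT's worth of ACKs.

    Assumption: one ACK per full segment currently in flight.
    """
    acks = int(cwnd)
    for _ in range(acks):
        cwnd += 1.0 / cwnd   # small additive step per ACK
    return cwnd


after_one_rtt = congestion_avoidance_rtt(10.0)
print(round(after_one_rtt, 2))  # slightly under 11
```

The result is slightly under cwnd + 1 because each step divides by a window that has already grown a little.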

Detecting and Reacting to Congestion

TCP infers congestion primarily through packet loss, signaled in two ways: a retransmission timeout (RTO) or the receipt of duplicate ACKs.

A retransmission timeout is the most severe signal: no ACK has arrived for the outstanding data within the timeout, so the path may be heavily congested or broken. The reaction is drastic: ssthresh is set to half the current cwnd (with a floor of two segments), cwnd is reset to one segment, and Slow Start is re-initiated. This restarts the probing process from a conservative base.
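The timeout reaction is small enough to write down directly. The sketch below counts windows in segments and assumes a restart window of one segment and an ssthresh floor of two segments, in the spirit of RFC 5681; the function name and the `loss_window` parameter are hypothetical, and `loss_window` can be changed to model a different restart value.

```python
def on_retransmission_timeout(cwnd: int, loss_window: int = 1) -> tuple[int, int]:
    """Return (new_cwnd, new_ssthresh) in segments after an RTO.

    Assumptions: windows counted in whole segments; restart window of
    one segment by default; ssthresh never drops below two segments.
    """
    new_ssthresh = max(cwnd // 2, 2)   # remember half the old window
    new_cwnd = loss_window             # collapse the window, then slow start again
    return new_cwnd, new_ssthresh


cwnd, ssthresh = on_retransmission_timeout(cwnd=40)
print(cwnd, ssthresh)  # 1 20
```

After this reset, Slow Start doubles the window each RTT until it reaches the new ssthresh, at which point linear growth takes over.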

A fast retransmit mechanism provides a quicker, less severe response. Duplicate ACKs are generated when segments arrive at the receiver out of order, so they show that later data is still getting through. If a sender receives three duplicate ACKs for the same sequence number, it infers that a single segment was lost (not a complete path failure). Instead of waiting for a timeout, the sender immediately retransmits the missing segment. This triggers the fast recovery phase.

In fast recovery, the sender sets ssthresh to half the current cwnd, but instead of dropping to a tiny window, it sets cwnd to ssthresh plus 3 segments, accounting for the three segments that have left the network and produced the duplicate ACKs. Each further duplicate ACK temporarily inflates cwnd by one more segment. When an ACK for new data arrives, cwnd is deflated back to ssthresh and the sender enters Congestion Avoidance, growing the window linearly from this recovered point. This allows throughput to stabilize more quickly after a single loss event.
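The duplicate-ACK counting and the window adjustments of fast retransmit and fast recovery can be sketched as a small state machine. This is a simplified, Reno-style illustration in the spirit of RFC 5681, not a faithful implementation: windows are in segments, ACK numbers name the next expected segment, window growth on ordinary ACKs is omitted, and the class name `RenoSender` and its fields are hypothetical.

```python
from dataclasses import dataclass, field

DUP_ACK_THRESHOLD = 3  # duplicate ACKs needed to trigger fast retransmit


@dataclass
class RenoSender:
    cwnd: float = 20.0
    ssthresh: float = 64.0
    last_ack: int = 0
    dup_acks: int = 0
    in_fast_recovery: bool = False
    retransmitted: list = field(default_factory=list)

    def on_ack(self, ack_no: int) -> None:
        if ack_no == self.last_ack:                 # duplicate ACK
            self.dup_acks += 1
            if self.in_fast_recovery:
                self.cwnd += 1                      # inflate: another segment left the network
            elif self.dup_acks == DUP_ACK_THRESHOLD:
                self.ssthresh = max(self.cwnd / 2, 2)
                self.cwnd = self.ssthresh + 3       # halve, plus the three dup ACKs
                self.in_fast_recovery = True
                self.retransmitted.append(ack_no)   # fast retransmit the missing segment
        else:                                       # ACK for new data
            if self.in_fast_recovery:
                self.cwnd = self.ssthresh           # deflate and resume congestion avoidance
                self.in_fast_recovery = False
            self.last_ack = ack_no
            self.dup_acks = 0


sender = RenoSender(cwnd=20.0)
sender.on_ack(5)                 # new data acknowledged up to segment 5
for _ in range(3):
    sender.on_ack(5)             # three duplicates: segment 5 is presumed lost
print(sender.cwnd, sender.ssthresh)  # 13.0 10.0
sender.on_ack(9)                 # retransmission fills the hole
print(sender.cwnd)               # 10.0
```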

Evolution of TCP Variants: Tahoe, Reno, and Beyond

The specific handling of fast retransmit and recovery defines classic TCP variants. TCP Tahoe includes fast retransmit but does not implement fast recovery. Upon three duplicate ACKs, Tahoe sets ssthresh to half the current cwnd, sets cwnd to 1 segment, and re-enters Slow Start, which is overly conservative for a single loss.

TCP Reno improves on this by adding the fast recovery algorithm described above. After fast retransmit, it enters fast recovery, avoiding the full window collapse for a single loss. However, Reno struggles with multiple packet losses within a single window; if not enough duplicate ACKs return to trigger recovery, it may still fall back to a costly timeout.
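The cost of Tahoe's full collapse can be made concrete by counting how many RTTs each variant needs to regain its pre-loss window after three duplicate ACKs. The sketch below is an idealized model (whole segments, no further loss, Reno's fast recovery treated as already completed so that it resumes at ssthresh); both function names are hypothetical.

```python
def window_after_triple_dup_ack(variant: str, cwnd: int) -> tuple[int, int]:
    """Return (new_cwnd, new_ssthresh) in segments for the given variant."""
    ssthresh = max(cwnd // 2, 2)
    if variant == "tahoe":
        return 1, ssthresh            # collapse and slow start again
    if variant == "reno":
        return ssthresh, ssthresh     # window once fast recovery has finished
    raise ValueError(f"unknown variant: {variant}")


def rtts_to_regain(variant: str, cwnd_before: int) -> int:
    """Count RTTs until cwnd is back at its pre-loss value (no further loss)."""
    cwnd, ssthresh = window_after_triple_dup_ack(variant, cwnd_before)
    rtts = 0
    while cwnd < cwnd_before:
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 per RTT
        rtts += 1
    return rtts


print(rtts_to_regain("tahoe", 32))  # 20
print(rtts_to_regain("reno", 32))   # 16
```

Under these assumptions the difference is the handful of RTTs Tahoe spends in Slow Start, during which its throughput is far below what the path was just sustaining.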

Modern networks with high bandwidth-delay products led to the development of more aggressive algorithms. TCP CUBIC is the default in Linux and many other systems. It replaces the linear increase of Congestion Avoidance with a cubic function of the time elapsed since the last congestion event. After a loss, cwnd is reduced multiplicatively; it then grows rapidly, plateaus as it approaches the previous saturation point, and finally accelerates again to probe beyond it. This allows faster utilization of high-speed links while remaining stable. The window growth is governed by $W(t) = C(t - K)^{3} + W_{max}$, where $C$ is a scaling constant, $t$ is the time since the last window reduction, $K$ is the time the function takes to reach $W_{max}$ again, and $W_{max}$ is the window size at the last congestion event.
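The shape of the cubic curve is easier to see with numbers. The sketch below evaluates the window function, computing $K$ as the cube root of $W_{max}(1 - \beta)/C$ so that the curve starts at $\beta W_{max}$ right after the reduction. The constants $C = 0.4$ and $\beta = 0.7$ are the values given in the CUBIC specification (RFC 8312) and are assumptions here, since the text above does not state them; the function names are hypothetical.

```python
def cubic_k(w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """Time (seconds) for the cubic curve to climb back to w_max."""
    return ((w_max * (1.0 - beta)) / c) ** (1.0 / 3.0)


def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """W(t) = C * (t - K)^3 + W_max, with t in seconds since the last reduction."""
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max


w_max = 100.0
k = cubic_k(w_max)
for t in (0.0, k / 2, k, 1.5 * k, 2 * k):
    # concave below W_max, flat near t = K, convex beyond it
    print(f"t={t:5.2f}s  W={cubic_window(t, w_max):7.2f}")
```

At $t = 0$ the window is approximately 70 segments ($\beta W_{max}$), at $t = K$ it equals $W_{max}$, and beyond $K$ the curve turns convex as the sender probes for new capacity.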

Common Pitfalls

Confusing Flow Control with Congestion Control. Flow control uses the receiver window (rwnd) to prevent overwhelming the receiver's buffer. Congestion control uses the congestion window (cwnd) to prevent overwhelming the network. The actual sending window is the minimum of these two: $\min(\text{cwnd}, \text{rwnd})$.
Misunderstanding the "AIMD" Principle. The core of Congestion Avoidance is Additive Increase, Multiplicative Decrease (AIMD). On success (ACKs received), cwnd grows additively (e.g., +1 per RTT). On congestion (loss detected), cwnd is cut multiplicatively (e.g., halved). This produces the classic "sawtooth" pattern of throughput and is key to TCP's fairness and stability.
Assuming Fast Retransmit Always Works. Fast retransmit relies on receiving three duplicate ACKs, which requires at least some packets to be delivered after the lost packet. If an entire window of packets is lost (e.g., in a severe congestion collapse), no duplicate ACKs are generated, forcing a long retransmission timeout and a full Slow Start restart, severely impacting performance.
Overlooking the Impact of Bufferbloat. Modern routers often have very large buffers. This can cause high latency as buffers fill, but TCP may not see packet loss until these huge buffers are completely full. This delays the congestion signal, leading to sluggish performance and high delays—a problem TCP's basic loss-based detection doesn't handle well.
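Two of the points above lend themselves to a short sketch: the effective sending window as the minimum of cwnd and rwnd, and the AIMD sawtooth. The model is deliberately crude and all names are hypothetical: windows are whole segments, and a loss is assumed to occur in any RTT where cwnd exceeds a fixed path capacity.

```python
def usable_window(cwnd: int, rwnd: int, in_flight: int) -> int:
    """Segments the sender may still transmit: min(cwnd, rwnd) minus data in flight."""
    return max(min(cwnd, rwnd) - in_flight, 0)


def aimd_trace(capacity: int, rtts: int, cwnd: int = 1) -> list[int]:
    """Record cwnd per RTT under additive increase, multiplicative decrease.

    Assumption: a loss is detected in any RTT where cwnd exceeds capacity.
    """
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(cwnd // 2, 1)   # multiplicative decrease: halve
        else:
            cwnd += 1                  # additive increase: +1 segment per RTT
    return trace


print(usable_window(cwnd=20, rwnd=8, in_flight=5))  # 3 (receiver-limited)
print(aimd_trace(capacity=8, rtts=12, cwnd=6))
# [6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4, 5]
```

The trace shows the sawtooth: the window climbs one segment per RTT until it overshoots the capacity, is halved, and starts climbing again.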

Summary

TCP congestion control is a feedback loop in which the sender infers network capacity by monitoring for packet loss and adjusts its congestion window (cwnd) in response.
The core cycle involves exponential Slow Start to initially probe bandwidth, followed by linear Congestion Avoidance (AIMD) to gently search for more, with multiplicative backoffs on loss.
Fast retransmit (on 3 duplicate ACKs) and fast recovery allow quick recovery from a single packet loss without a severe performance penalty, distinguishing variants like TCP Reno from the older TCP Tahoe.
Modern high-speed networks use advanced algorithms like TCP CUBIC, which uses a cubic growth function to be more aggressive and efficient in high bandwidth-delay product environments while maintaining fairness.
The ultimate goal is not to avoid loss entirely, but to use loss as a signal to operate near the network's capacity, ensuring stable and fair sharing of bandwidth among all competing TCP flows.
