Mar 11

Quality of Service (QoS) Basics

Mindli Team

AI-Generated Content


In modern networks, not all data is created equal. A single video call dropping packets is immediately noticeable to users, while an email downloading a few seconds later often goes unnoticed. Quality of Service (QoS) is the set of tools and techniques used to manage network resources by prioritizing specific types of traffic over others. Without QoS, all traffic is treated as equal, leading to unpredictable performance for delay-sensitive applications like voice, video, and real-time data. Mastering QoS is fundamental for any network professional, as it transforms a best-effort network into a predictable business enabler.

Understanding the QoS Models: Best Effort, IntServ, and DiffServ

Before deploying any mechanisms, you must choose a QoS model—a framework that dictates how the network provides service guarantees. There are three primary models, each with distinct philosophies.

The Best-Effort model is the default state of most networks. It makes no promises and provides no guarantees. All packets are forwarded in the order they are received, whenever resources are available. This is simple but wholly inadequate for converged networks carrying multiple traffic types.

The Integrated Services (IntServ) model takes the opposite approach. It provides explicit, guaranteed service levels for specific application flows. Before sending data, an application must request and reserve the precise bandwidth and latency it needs from the network using a protocol like RSVP (Resource Reservation Protocol). While IntServ provides strong guarantees, it is not scalable for large networks like the Internet, as it requires routers to maintain state and resource information for every single flow.

The Differentiated Services (DiffServ) model is the scalable, modern compromise and the focus of most implementations, including those for the CCNA. Instead of managing individual flows, DiffServ categorizes traffic into a small number of classes. Each class receives a defined level of service. Routers then handle packets based on their assigned class, not their specific flow. This model is highly scalable because core network devices only need to look at a simple marking in the packet header to decide how to treat it, without maintaining per-flow state.

Classification and Marking: The Foundation of QoS

The first step in any DiffServ QoS deployment is classification, the process of identifying and categorizing traffic into different classes. Classification can be done using numerous criteria, such as source/destination IP address, protocol (e.g., TCP/UDP), or application port number (e.g., port 5060 for SIP). Once identified, traffic should be marked as close to its source as possible. Marking means setting a value in the packet header so that downstream devices can quickly identify its class without re-inspecting it deeply.

The two most common markings are CoS (Class of Service) and DSCP (Differentiated Services Code Point). CoS is a 3-bit field (values 0-7) within an Ethernet frame's 802.1Q header, used for prioritization on Layer 2 switched networks. DSCP is a 6-bit field (values 0-63) in the IP header's ToS (Type of Service) byte, used for prioritization on Layer 3 routed networks. A common mapping exists; for example, Voice traffic is often marked as CoS 5 and DSCP EF (Expedited Forwarding, decimal 46). Video Conferencing might be marked as CoS 4 and DSCP AF41 (Assured Forwarding, decimal 34). Marking at the trust boundary—the point where your controlled network begins—is critical to prevent end-users from marking their own traffic as high priority.
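On Cisco IOS routers, classification and marking are typically expressed with the Modular QoS CLI (MQC). A minimal sketch of marking at the ingress trust boundary, assuming SIP signaling on UDP port 5060 (the ACL, class-map, and policy-map names here are illustrative):

```
! Identify SIP signaling by its well-known port (assumption: UDP 5060)
ip access-list extended SIP-SIGNALING
 permit udp any any eq 5060
!
class-map match-all SIGNALING
 match access-group name SIP-SIGNALING
!
policy-map MARK-INGRESS
 class SIGNALING
  set dscp cs3          ! mark call signaling
 class class-default
  set dscp default      ! re-mark everything else to best effort
!
interface GigabitEthernet0/1
 service-policy input MARK-INGRESS
```

Because the marking is applied inbound at the edge, downstream devices only need to match on DSCP (e.g., `match dscp cs3`) rather than re-inspecting the traffic.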

Queuing Algorithms: Managing the Outbound Line

When an interface is congested (more traffic wants to exit than the link can immediately handle), packets must be placed into queues. A queuing algorithm determines the order in which packets are taken from these queues and transmitted. First-In, First-Out (FIFO) queuing is simple but problematic, as a large burst of low-priority traffic can starve high-priority traffic.

Weighted Fair Queuing (WFQ) is an improvement that dynamically creates separate queues for different flows and services them in a balanced way, preventing any single flow from monopolizing bandwidth. It's fair but not necessarily intelligent about application needs.

For explicit priority, Low-Latency Queuing (LLQ) is the gold standard. LLQ combines a strict-priority queue with other defined classes using a mechanism like CBWFQ (Class-Based Weighted Fair Queuing). Traffic placed in the priority queue (like voice) is always serviced first, guaranteeing minimal delay and jitter. The bandwidth for the priority queue is strictly policed to prevent it from starving the other classes. The remaining traffic classes (like video, transactional data, and scavenger) are then serviced by CBWFQ based on their assigned bandwidth guarantees and weights.
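An LLQ policy on Cisco IOS might be sketched as follows, assuming traffic was already marked upstream with DSCP EF (voice) and AF41 (video); class and policy names are illustrative:

```
class-map match-all VOICE
 match dscp ef
class-map match-all VIDEO
 match dscp af41
!
policy-map WAN-QOS
 class VOICE
  priority percent 10    ! strict-priority queue, policed to 10% during congestion
 class VIDEO
  bandwidth percent 20   ! CBWFQ minimum-bandwidth guarantee
 class class-default
  fair-queue             ! remaining traffic shares the leftover bandwidth
!
interface Serial0/0/0
 service-policy output WAN-QOS
```

The `priority` command both prioritizes and polices the voice class, which is what keeps the priority queue from starving the CBWFQ classes below it.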

Traffic Shaping and Policing: Controlling the Rate

Both shaping and policing are mechanisms to enforce a traffic rate, but they behave differently and are used in distinct scenarios. Their purpose is to ensure traffic conforms to a specified contract, like a Committed Information Rate (CIR).

Traffic policing enforces a rate by dropping or re-marking packets that exceed it. It is a hard limit. For example, if your contract with an ISP is for 50 Mbps, policing would drop any excess traffic above that rate. This can lead to TCP global synchronization, where many flows back off and restart at the same time, causing inefficient link utilization.
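A policer for that 50 Mbps contract might look like the following MQC sketch (an ISP would typically apply this inbound on the customer-facing interface; names and burst defaults are left to the platform):

```
policy-map POLICE-50M
 class class-default
  police 50000000 conform-action transmit exceed-action drop
!
interface GigabitEthernet0/0
 service-policy input POLICE-50M
```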

Traffic shaping, in contrast, buffers excess traffic and transmits it later, smoothing out bursts to conform to the desired rate. Shaping introduces delay but avoids drops, making it preferable at the edge of your network where you are transmitting toward a slower-speed WAN link (e.g., a 100 Mbps LAN connecting to a 50 Mbps WAN circuit). You would shape your outbound traffic to 50 Mbps to prevent a queue from forming—and packets from being dropped—on the ISP's policed interface.
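On Cisco IOS, shaping to the CIR is commonly configured as a parent policy, with a child queuing policy nested inside it so LLQ/CBWFQ operate within the shaped rate (a hierarchical, or HQoS, design; `CHILD-QOS` is a hypothetical nested policy name):

```
policy-map SHAPE-TO-CIR
 class class-default
  shape average 50000000     ! shape all outbound traffic to the 50 Mbps CIR
  service-policy CHILD-QOS   ! hypothetical nested queuing policy (LLQ/CBWFQ)
!
interface GigabitEthernet0/1
 service-policy output SHAPE-TO-CIR
```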

Congestion Avoidance with WRED

Even with good queuing, if congestion is severe, queues can fill and tail-drop will occur, where all new packets are dropped regardless of their importance. Congestion avoidance techniques like WRED (Weighted Random Early Detection) try to prevent this by proactively dropping packets before the queue is full.

WRED works by monitoring the average queue depth. As the queue begins to fill, WRED starts randomly dropping packets from selected flows, causing TCP-based applications to slow their transmission rates early. The "Weighted" aspect means WRED can be configured to drop packets with lower DSCP values (e.g., best-effort traffic) more aggressively than packets with higher DSCP values (e.g., video). This intelligent dropping manages congestion more gracefully than a full-queue tail-drop scenario, which affects all traffic types equally and can cause significant network disruptions.
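In MQC, DSCP-based WRED is enabled per class on top of a CBWFQ bandwidth guarantee; a sketch, assuming a data class matching AF21 (names illustrative):

```
class-map match-all DATA
 match dscp af21
!
policy-map WAN-EDGE
 class DATA
  bandwidth percent 25
  random-detect dscp-based   ! per-DSCP thresholds: lower DSCP dropped earlier
!
interface Serial0/0/0
 service-policy output WAN-EDGE
```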

Common Pitfalls

  1. Misconfiguring the Priority Queue in LLQ: The most common error with LLQ is not policing the priority queue. If you simply put all voice traffic in a priority queue without a bandwidth limit, a malfunctioning device could flood the network with traffic marked as voice, and LLQ would send it all first, completely starving every other class of service. Always configure a strict policer for the priority queue.
  2. Marking at the Wrong Place (Trust Boundary): Trusting markings from end-user devices is a security and performance risk. A user could mark all their traffic as DSCP EF. The correct design is to classify and mark traffic at the ingress of the network (e.g., at the access switch port connecting an IP phone) or at the network edge. Internal network devices should then trust these markings and act upon them.
  3. Over-Provisioning as a Substitute for QoS: While adding more bandwidth can mask problems, it is not a QoS strategy. QoS is about managing congestion, which will always occur at some point, especially on costly WAN links. QoS ensures that when congestion happens, business-critical applications are protected.
  4. Ignoring QoS in Both Directions: A typical mistake is applying sophisticated queuing, shaping, and policing only on the outbound direction of a WAN link. Remember that traffic flows in two directions. Congestion can occur on the inbound path from the WAN to your LAN, which is often harder to control as you don't own the queuing mechanism on the ISP's router. Techniques like policy-based routing or working with your ISP may be required.
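The trust-boundary pitfall above is addressed per port at the access layer. The exact syntax varies widely by switch platform and software version; on older Catalyst switches it might be sketched as:

```
! Trust markings only if a Cisco IP phone is detected on the port
! (older Catalyst syntax; newer platforms differ or trust by default)
interface GigabitEthernet1/0/5
 mls qos trust device cisco-phone
 mls qos trust cos
```

Ports facing ordinary user workstations are left untrusted, so any DSCP or CoS values those hosts set are ignored or re-marked at ingress.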

Summary

  • QoS models define the framework: use the scalable DiffServ model to classify traffic into groups that receive different levels of service, moving beyond simple Best-Effort delivery.
  • Classification and marking using DSCP (Layer 3) and CoS (Layer 2) values are the foundational steps, allowing downstream devices to quickly identify a packet's priority class.
  • Queuing algorithms manage congestion on outbound interfaces; Low-Latency Queuing (LLQ) is critical for providing strict priority to delay-sensitive traffic like voice while fairly servicing other classes.
  • Traffic shaping (buffering) and policing (dropping) control transmission rates, with shaping used to smooth traffic to a slower link and policing used to enforce hard limits.
  • Congestion avoidance with WRED proactively manages queue depth by selectively dropping packets to signal TCP flows to slow down before severe congestion and tail-drops occur.
