Skip Lists

You need to search through a sorted list of items (user IDs, timestamps, or dictionary words, for example) as quickly as possible. A simple sorted linked list gives you linear, $O(n)$, search time, which is too slow for large datasets. Balanced binary search trees like AVL or Red-Black trees offer guaranteed $O(\log n)$ time but are complex to implement correctly. Skip lists bridge this gap: they are a probabilistic data structure that provides expected $O(\log n)$ time complexity for search, insertion, and deletion using a clever, multi-layered system of linked lists. They offer a simpler, elegant alternative to balanced trees with comparable performance, which is why they power core components of systems like the Redis database.

Core Idea and Structure

At its heart, a skip list is a probabilistic multi-level linked list. Imagine a standard sorted linked list as the ground floor. A skip list adds "express lanes" above it. The bottom level (Level 0) contains all the elements in sorted order. The level above it (Level 1) contains a random subset of the elements from Level 0. Level 2 contains a random subset of the elements from Level 1, and so on. This creates a tower of lists, each becoming sparser as you move up.
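To make "each becoming sparser" concrete, here is a short back-of-the-envelope calculation. It assumes every node is promoted from one level to the next independently with the same probability $p$ (the symbol $n$ for the number of elements is introduced here for illustration) and ignores any cap on the height:

$$
\Pr[\text{node appears on Level } i] = p^{i}, \qquad \mathbb{E}[\text{nodes on Level } i] = n\,p^{i}
$$

$$
\mathbb{E}[\text{total forward pointers}] = \sum_{i \ge 0} n\,p^{i} = \frac{n}{1 - p}, \qquad n\,p^{i} \ge 1 \iff i \le \log_{1/p} n
$$

With $p = 1/2$ this works out to roughly $2n$ pointers in total and about $\log_2 n$ levels that are expected to hold at least one node.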

The key to navigating this structure is the node. Each node contains:

A value (or key).
A tower of forward pointers, one for each level the node participates in.
A height, which is randomly determined when the node is created.

The highest level with at least one node defines the skip list's current height. A common practice is to start with a header node that has forward pointers up to the maximum possible height, acting as the entry point for all searches. The probability of a node rising to the next level is typically 50%, leading to an exponentially decaying distribution of nodes across levels. This randomness is what makes it probabilistic, as opposed to the strictly deterministic rules of a balanced tree.
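The node layout and the coin-flip height rule described above could be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the names `Node`, `random_height`, `MAX_HEIGHT`, and `P` are assumptions made for this example, and the cap of 16 levels is an arbitrary choice.

```python
import random

MAX_HEIGHT = 16  # assumed cap on tower height (arbitrary for this sketch)
P = 0.5          # probability that a node rises one more level


class Node:
    """A skip list node: a key plus one forward pointer per level it is on."""

    def __init__(self, key, height):
        self.key = key
        # forward[i] is the next node on level i (None means end of that level)
        self.forward = [None] * height

    @property
    def height(self):
        return len(self.forward)


def random_height(rng=random):
    """Flip coins: start at height 1 and keep growing while the coin says so."""
    height = 1
    while height < MAX_HEIGHT and rng.random() < P:
        height += 1
    return height


# The header holds no key; it has a pointer on every possible level and
# serves as the entry point for all searches.
header = Node(None, MAX_HEIGHT)
```

Storing the height implicitly as the length of the `forward` list is one possible design choice; an explicit `height` field works just as well.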

How Search Works: Traversing the Express Lanes

Searching for a target value leverages the multi-level structure to skip over large portions of the list. The algorithm starts at the highest level of the header node and moves right along that level as long as the next node's value is less than the target. When the next node's value would be greater than or equal to the target (or there is no next node), the search "drops down" one level and continues the process. This continues until the walk stops at Level 0; at that point the next node either holds the target or the target is not in the list.

For example, to find the value 47 in a skip list:

Start at the header's top level (e.g., Level 3). Move right until you would pass 47, then drop to Level 2.
At Level 2, move right until you would pass 47, then drop to Level 1.
At Level 1, move right until the next node is 47 or greater. If you find 47, the search is successful. If not, drop to Level 0.
At Level 0, move right one node at a time to check for 47.

This process is like looking for a city on a highway map: you first take the interstate (high level) to get to the general region, then exit to a state highway (middle level), and finally use local roads (Level 0) to find the exact street. The expected number of steps required is proportional to $\log n$, making it exponentially faster than a linear scan.
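The search procedure can be sketched in Python as shown below. To keep the example deterministic, the list is wired up by hand with fixed heights through a hypothetical `build` helper; the keys and heights are invented for illustration, and `Node`, `build`, and `search` are names assumed for this sketch.

```python
class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[i]: next node on level i


def build(keys_with_heights, max_height):
    """Link sorted (key, height) pairs into a skip list and return its header."""
    header = Node(None, max_height)
    last = [header] * max_height  # last node seen so far on each level
    for key, height in keys_with_heights:
        node = Node(key, height)
        for level in range(height):
            last[level].forward[level] = node
            last[level] = node
    return header


def search(header, target):
    """Return the node holding target, or None if it is absent."""
    node = header
    # Start on the top level; move right while the next key is still too small,
    # then drop down one level and repeat.
    for level in reversed(range(len(header.forward))):
        while node.forward[level] is not None and node.forward[level].key < target:
            node = node.forward[level]
    # On level 0 the next node is the only possible match.
    candidate = node.forward[0]
    if candidate is not None and candidate.key == target:
        return candidate
    return None


header = build(
    [(3, 1), (12, 2), (19, 1), (26, 4), (31, 1), (47, 3), (55, 1), (62, 2)],
    max_height=4,
)
found = search(header, 47)    # a node whose key is 47
missing = search(header, 40)  # None, since 40 is not in the list
```

This variant always walks down to Level 0 before comparing for equality. Stopping early when a higher level already points at the target, as in the walkthrough above, is an optional shortcut.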

Insertion and Deletion: Maintaining the Probabilistic Balance

Insertion begins with a search to find the insertion point. Specifically, it records the node at each level that would precede the new node. Then, it determines the new node's random height. A standard method is to "flip a coin": start at height 1, and repeatedly increment the height while a random function returns heads (e.g., with probability $p = 1/2$). The maximum height is often capped. Once the height $k$ is known, a new node of that height is created. Finally, the new node's forward pointers at levels 0 through $k - 1$ are linked to the corresponding successor nodes found during the search, and the predecessors' pointers at those levels are updated to point to the new node. No global rebalancing is needed.
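A minimal Python sketch of insertion follows. The class and method names (`SkipList`, `_predecessors`, `_random_height`, `keys`) and the cap `MAX_HEIGHT = 16` are assumptions for this example. For brevity the sketch does not reject duplicate keys.

```python
import random

MAX_HEIGHT = 16  # assumed cap on node height
P = 0.5          # promotion probability


class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height


class SkipList:
    def __init__(self, seed=None):
        self.header = Node(None, MAX_HEIGHT)
        self.rng = random.Random(seed)

    def _random_height(self):
        height = 1
        while height < MAX_HEIGHT and self.rng.random() < P:
            height += 1
        return height

    def _predecessors(self, key):
        """For every level, find the last node whose key is smaller than key."""
        update = [self.header] * MAX_HEIGHT
        node = self.header
        for level in reversed(range(MAX_HEIGHT)):
            while node.forward[level] is not None and node.forward[level].key < key:
                node = node.forward[level]
            update[level] = node
        return update

    def insert(self, key):
        update = self._predecessors(key)
        node = Node(key, self._random_height())
        # Splice the new node in on levels 0 .. height-1; higher levels are untouched.
        for level in range(len(node.forward)):
            node.forward[level] = update[level].forward[level]
            update[level].forward[level] = node

    def keys(self):
        """Yield all keys in sorted order by walking level 0."""
        node = self.header.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


sl = SkipList(seed=1)
for k in [31, 3, 47, 12, 26]:
    sl.insert(k)
```

Whatever heights the coin flips produce, walking Level 0 with `sl.keys()` yields the keys in sorted order, because each splice only touches the recorded predecessors.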

Deletion follows a similar pattern. A search is performed to locate the node to be deleted and to record its predecessors at every level. The forward pointers of these predecessors are then updated to "skip over" the doomed node, linking directly to its successors at each relevant level. The node is then removed from memory. Like insertion, this is a local operation with expected $O(\log n)$ time.
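Deletion could be sketched in Python along the same lines. As before, the list is built by hand with fixed, invented heights so the result is deterministic. The helpers `build` and `level_keys` are hypothetical names for this example, and keys are assumed to be unique.

```python
class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height


def build(keys_with_heights, max_height):
    """Link sorted (key, height) pairs into a skip list and return its header."""
    header = Node(None, max_height)
    last = [header] * max_height
    for key, height in keys_with_heights:
        node = Node(key, height)
        for level in range(height):
            last[level].forward[level] = node
            last[level] = node
    return header


def delete(header, key):
    """Unlink the node holding key; return True if it was present."""
    update = [header] * len(header.forward)
    node = header
    for level in reversed(range(len(header.forward))):
        while node.forward[level] is not None and node.forward[level].key < key:
            node = node.forward[level]
        update[level] = node  # predecessor of key on this level
    target = node.forward[0]
    if target is None or target.key != key:
        return False
    # Make each predecessor skip over the target on every level the target is on.
    for level in range(len(target.forward)):
        if update[level].forward[level] is target:
            update[level].forward[level] = target.forward[level]
    return True  # Python's garbage collector reclaims the unlinked node


def level_keys(header, level):
    """Collect the keys on one level, left to right."""
    keys, node = [], header.forward[level]
    while node is not None:
        keys.append(node.key)
        node = node.forward[level]
    return keys


header = build(
    [(3, 1), (12, 2), (19, 1), (26, 4), (31, 1), (47, 3), (55, 1), (62, 2)],
    max_height=4,
)
removed = delete(header, 26)  # 26 is the tallest node in this example
absent = delete(header, 40)   # 40 was never in the list
```

In a language with manual memory management, the node would be freed explicitly after the unlinking loop.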

Comparison with Balanced Search Trees

Skip lists are frequently compared to balanced binary search trees (BSTs) like AVL or Red-Black trees, as both aim for $O(\log n)$ operations on sorted data.

Simplicity: The algorithms for search, insertion, and deletion in a skip list are conceptually simpler and often require less code. Implementing a correct, self-balancing tree involves managing numerous rotation cases and intricate pointer manipulations.
Performance: In practice, skip lists can match or even exceed the performance of balanced trees for many workloads. Both structures follow pointers between separately allocated nodes, so cache behavior depends mainly on node layout and memory allocation rather than on the choice of structure itself.
Concurrency: Skip lists have a significant advantage in concurrent (multi-threaded) environments. An insertion or deletion changes only the pointers of a few neighboring nodes, so locks can be applied locally, allowing a high degree of parallelism. Rebalancing a tree can require rotations that modify several nodes along the path to the root, which makes fine-grained locking harder and can become a bottleneck.
Guarantees vs. Expectations: Balanced trees provide strict $O(\log n)$ worst-case guarantees. Skip lists provide expected $O(\log n)$ performance, with a very high probability of good performance for large $n$. The probabilistic nature means there is a tiny, non-zero chance of a poorly balanced structure, but this is negligible in practice.

This combination of simplicity and high performance is why skip lists are used in real-world systems. A prominent example is Redis, an in-memory data structure store, which uses skip lists to implement its sorted set data type.

Common Pitfalls

Misunderstanding "Probabilistic": A common mistake is to think skip lists are unpredictable or slow in the worst case. While the worst-case scenario is theoretically $O(n)$ (if every node had the same height, reducing it to a single linked list), the probability of this occurring is astronomically low for any reasonable dataset and standard $p = 1/2$ probability. The expected performance is the crucial and reliable metric.
Incorrect Random Height Generation: Implementing the random height generation incorrectly can break the logarithmic time property. The height must be generated using a geometric distribution (e.g., repeated coin flips). Simply using a uniform random number up to a fixed maximum will not create the necessary exponential distribution of nodes across levels, degrading performance.
Forgetting to Update All Level Pointers: During insertion and deletion, it's critical to locate and update the predecessor nodes at every level from 0 up to the height of the inserted/deleted node. Missing an update at a higher level will leave "dangling" pointers that break the list's integrity for future searches.
Over-Optimizing for Determinism: Trying to eliminate randomness to make the structure deterministic often leads to re-creating the complexity of a balanced tree. The elegance and simplicity of skip lists are intrinsically tied to their probabilistic design.
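The height-generation pitfall can be illustrated with a small experiment that counts how many nodes land on each level under the two schemes. This is a sketch under assumed parameters: `MAX_HEIGHT = 8`, 10,000 nodes, and a fixed seed are arbitrary choices, and the function names are invented for this example.

```python
import random

MAX_HEIGHT = 8  # assumed cap, kept small so the counts are easy to read


def geometric_height(rng, p=0.5):
    """Correct scheme: repeated coin flips give a geometric distribution."""
    height = 1
    while height < MAX_HEIGHT and rng.random() < p:
        height += 1
    return height


def uniform_height(rng):
    """Incorrect scheme: every height from 1 to MAX_HEIGHT is equally likely."""
    return rng.randint(1, MAX_HEIGHT)


def nodes_per_level(heights):
    """Count nodes on each level (a node of height h is on levels 0 .. h-1)."""
    counts = [0] * MAX_HEIGHT
    for h in heights:
        for level in range(h):
            counts[level] += 1
    return counts


rng = random.Random(42)
n = 10_000
geometric = nodes_per_level(geometric_height(rng) for _ in range(n))
uniform = nodes_per_level(uniform_height(rng) for _ in range(n))

for level in range(MAX_HEIGHT):
    print(f"level {level}: geometric={geometric[level]:>6}  uniform={uniform[level]:>6}")
```

With coin flips, each level holds roughly half as many nodes as the one below it. With uniform heights, the count shrinks only linearly, so even the top level still holds about an eighth of all nodes and the upper levels stop acting as express lanes.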

Summary

Skip lists are probabilistic data structures that use multiple levels of linked lists to enable fast search in sorted data, with an expected time complexity of $O(\log n)$ for search, insertion, and deletion.
Operations work by starting at a high, sparse level to skip over many items, then "dropping down" to lower levels to refine the search—much like using express lanes on a highway.
They are significantly simpler to implement than self-balancing binary search trees like AVL or Red-Black trees while offering comparable practical performance.
Their structure is easier to make thread-safe for concurrent access, giving them an advantage in multi-threaded applications.
Their reliability and efficiency have led to adoption in major software systems, most notably within the Redis database for implementing its sorted set data type.
