Interval Trees

Imagine you're designing a calendar app where users can add meetings. When someone tries to schedule a new meeting from 2 PM to 4 PM, your system must quickly check for conflicts with existing appointments. A brute-force scan compares the new meeting against every stored interval, which costs $O(n)$ per check and becomes slow once thousands of intervals are stored. Interval trees solve this exact problem: they are specialized data structures that store intervals, such as time slots or genomic ranges, and let you query which ones overlap a given point or interval without examining every entry. By augmenting a classic binary search tree, they answer overlap questions in computational geometry, resource scheduling, and database systems far faster than a linear scan of a list.

What is an Interval Tree?

An interval tree is an augmented binary search tree (BST) specifically designed to store a set of intervals and support efficient overlap queries. At its core, each node in the tree stores one interval, traditionally represented by a low and high endpoint, like $[\text{low}, \text{high}]$. However, the true power comes from the augmentation: each node also stores the maximum high endpoint found anywhere in its entire subtree. This single extra piece of information is the key to pruning entire branches of the tree during a search, so a query visits only the parts of the tree that can hold a match.

Think of it like a library filing system. A normal BST might file books by a single title (a point). An interval tree files books by their entire span on the shelf (e.g., "starts at Dewey code 500 and ends at 599"). The librarian's note at each section (the max endpoint) reads "the furthest-ending book in this aisle is at 750." If you ask for all books covering code 400, the librarian can instantly ignore any aisle whose books all start after 400, and any aisle whose "max endpoint" note is less than 400.

Core Structure: The Augmented Node

The standard interval tree is built upon a balanced BST (often a red-black tree) to guarantee $O(\log n)$ height. The choice of the BST's key is crucial. Typically, intervals are keyed on their low (starting) endpoint.

Each node contains three essential fields:

interval: The actual interval data, e.g., $[22, 30]$.
max: The maximum high endpoint value found in the interval stored in this node or in any node within its left and right subtrees.
The standard BST pointers: left and right.
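
The three fields above can be sketched as a small Python class. This is only an illustration: the names `Node`, `max_high`, and `recompute_max` are assumptions made for the example (`max_high` stands in for the max field so that Python's built-in `max` is not shadowed), and closed integer intervals are assumed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    low: int                              # interval start; also the BST key
    high: int                             # interval end (closed interval assumed)
    left: Optional["Node"] = None         # standard BST pointers
    right: Optional["Node"] = None
    max_high: int = field(init=False)     # largest high endpoint in this subtree

    def __post_init__(self) -> None:
        # Start from the node's own endpoint; callers that attach children
        # are expected to call recompute_max afterwards.
        self.max_high = self.high


def recompute_max(node: Node) -> None:
    """Restore max_high from the node's own interval and its children."""
    node.max_high = node.high
    for child in (node.left, node.right):
        if child is not None:
            node.max_high = max(node.max_high, child.max_high)


n = Node(22, 30)
n.left = Node(5, 40)
recompute_max(n)
print(n.max_high)  # 40
```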

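The max value is maintained during insertions and deletions. When inserting a new interval, you traverse down the tree. After finding the insertion point and adding the node, you backtrack to the root, updating the max value of each ancestor node if the new interval's high endpoint is greater than the current max. Deletions and rebalancing rotations can lower the maximum or move nodes between subtrees, so every affected node recomputes max from its own high endpoint and its children's max values. This ensures the augmentation remains correct.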
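
A minimal sketch of the insertion step, under a loud assumption: it uses a plain, unbalanced BST so that the max bookkeeping is easy to see. A production version would sit on a red-black or AVL tree and also recompute max after each rotation. The names `Node`, `insert`, and `max_high` are illustrative.

```python
from typing import Optional


class Node:
    def __init__(self, low: int, high: int) -> None:
        self.low, self.high = low, high
        self.max_high = high                  # max high endpoint in this subtree
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], low: int, high: int) -> Node:
    """Insert [low, high] keyed on low and return the (sub)tree root."""
    if root is None:
        return Node(low, high)
    if low < root.low:
        root.left = insert(root.left, low, high)
    else:
        root.right = insert(root.right, low, high)
    # Unwinding the recursion is the backtracking step: every ancestor
    # on the insertion path raises its max if the new interval ends later.
    root.max_high = max(root.max_high, high)
    return root


root = None
for low, high in [(15, 20), (10, 30), (17, 19), (5, 20), (12, 15), (30, 40)]:
    root = insert(root, low, high)

print(root.max_high)        # 40
print(root.left.max_high)   # 30
```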

The Stabbing Query: Finding Intervals Containing a Point

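The most classic operation is the stabbing query: given a query point $q$, find all intervals in the tree that contain $q$. On a balanced tree, finding a single such interval (or establishing that none exists) takes $O(\log n)$ time, and reporting all $k$ of them takes $O(\min(n, k \log n))$ time, where $n$ is the total number of intervals. Each reported interval costs at most one root-to-node path of length $O(\log n)$, and pruning with max keeps the search out of subtrees that hold no match. (The $O(\log n + k)$ bound often quoted for interval trees belongs to a different design, the centered interval tree.)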

The algorithm exploits the max field to decide which subtrees to explore:

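Start at the root.
Check if the interval in the current node contains query point $q$. If yes, report it.
Examine the left child. If the left child is not null and its max value is greater than or equal to $q$, then there could be an interval in the left subtree that contains $q$. Recurse into the left subtree. Otherwise, skip the left subtree entirely.
Examine the right child. If $q$ is greater than or equal to the current node's low endpoint, recurse into the right subtree as well. Otherwise, skip it, because every interval there starts after $q$.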

Why does this work? The tree is keyed on the low endpoint, so the left subtree holds intervals that start no later than the current node's interval. If the left child's max is less than $q$, every interval in that entire left subtree ends before $q$, so none can contain $q$ and we safely skip it. If the max is greater than or equal to $q$, we must search there. The right subtree holds intervals that start at or after the current node's low endpoint. When $q$ is smaller than that low endpoint, all of them start after $q$ and the right subtree can be skipped. Otherwise some of them may start at or before $q$ and end at or after it, so the right subtree must be searched too.
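
The query steps can be sketched in Python as follows. This is a sketch under stated assumptions, not a reference implementation: the tree is built by hand rather than through balanced insertion, intervals are closed, and the names `Node`, `stab`, and `max_high` are illustrative. The max check is done on entry to each subtree, so it also prunes right subtrees, a small extension of the left-child check described above.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        # max high endpoint over this node and both subtrees
        self.max_high = max([high] + [c.max_high for c in (left, right) if c])


def stab(node: Optional[Node], q: int, out: List[Tuple[int, int]]) -> None:
    """Append every interval containing q to out, in order of low endpoint."""
    if node is None or node.max_high < q:
        return                      # every interval here ends before q
    stab(node.left, q, out)
    if node.low <= q <= node.high:
        out.append((node.low, node.high))
    if q >= node.low:               # otherwise all right intervals start after q
        stab(node.right, q, out)


root = Node(15, 20,
            left=Node(10, 30, left=Node(5, 20), right=Node(12, 15)),
            right=Node(17, 19, right=Node(30, 40)))

out: List[Tuple[int, int]] = []
stab(root, 18, out)
print(out)  # [(5, 20), (10, 30), (15, 20), (17, 19)]
```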

Example: Consider a node with interval $[17, 19]$ and max = 23 (from its subtree). For query point $q = 20$:

Node interval $[17, 19]$ does not contain 20.
Check left child (max = 15). Since $15 < 20$, skip the entire left subtree.
Since $20 \ge 17$, recurse into the right subtree to continue the search. The node's max of 23 must come from that subtree, so it holds at least one interval ending at or after 20.
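
The same walk-through can be traced in code. The text only gives the node's interval and the two max values, so the child intervals $[10, 15]$ and $[18, 23]$ below are hypothetical values chosen to match them. The function name `stab_traced` is likewise an assumption for this sketch.

```python
class Node:
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        self.max_high = max([high] + [c.max_high for c in (left, right) if c])


def stab_traced(node, q, out, trace):
    """Stabbing query that also records which subtrees it visits or skips."""
    if node is None:
        return
    trace.append(f"visit [{node.low}, {node.high}] (max {node.max_high})")
    if node.low <= q <= node.high:
        out.append((node.low, node.high))
    if node.left is not None:
        if node.left.max_high >= q:
            stab_traced(node.left, q, out, trace)
        else:
            trace.append(f"skip left subtree (max {node.left.max_high} < {q})")
    if node.right is not None and q >= node.low:
        stab_traced(node.right, q, out, trace)


# Hypothetical children: left subtree max is 15, right subtree max is 23.
root = Node(17, 19, left=Node(10, 15), right=Node(18, 23))

out, trace = [], []
stab_traced(root, 20, out, trace)
for line in trace:
    print(line)
print(out)  # [(18, 23)]
```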

Applications and Broader Query Types

Interval trees are fundamental in scenarios requiring fast overlap detection. Their ability to find an overlapping interval in $O(\log n)$ time, and to report all $k$ matches in $O(\min(n, k \log n))$ time, makes them ideal for:

Calendar and Scheduling Applications: Finding all events occurring at a specific time or conflicting with a new proposed meeting.
Computational Geometry: Solving interval intersection problems, such as determining which rendered lines or rectangles overlap a pixel coordinate in computer graphics.
Genomic Data Analysis: Querying which gene annotations (stored as intervals on a chromosome) cover a particular DNA base pair position.
Resource Allocation: Checking for availability of a resource (like a conference room) over a requested time period.

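While the stabbing query is most common, interval trees can be extended to find all intervals that overlap a given query interval $[q_{\text{low}}, q_{\text{high}}]$, not just a single point. The logic is similar but checks for interval overlap instead of point containment: two closed intervals overlap exactly when each one starts no later than the other ends. The cost matches the stabbing query: $O(\log n)$ to find one overlapping interval and $O(\min(n, k \log n))$ to report all $k$.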
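
A sketch of the interval-overlap variant, returning to the calendar scenario. The meeting times are invented for illustration (hours on a 24-hour clock, closed intervals), the tree is built by hand, and the names `Node` and `overlapping` are assumptions of this example.

```python
class Node:
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        self.max_high = max([high] + [c.max_high for c in (left, right) if c])


def overlapping(node, q_low, q_high, out):
    """Append every stored interval that overlaps [q_low, q_high] to out."""
    if node is None or node.max_high < q_low:
        return                          # everything here ends before the query starts
    overlapping(node.left, q_low, q_high, out)
    if node.low <= q_high and q_low <= node.high:
        out.append((node.low, node.high))
    if node.low <= q_high:              # otherwise the right subtree starts too late
        overlapping(node.right, q_low, q_high, out)


# Existing meetings (made-up data), keyed on start hour.
calendar = Node(13, 15,
                left=Node(9, 10, right=Node(11, 12)),
                right=Node(15, 17, right=Node(17, 18)))

conflicts = []
overlapping(calendar, 14, 16, conflicts)   # proposed meeting: 2 PM to 4 PM
print(conflicts)  # [(13, 15), (15, 17)]
```

With closed intervals, a meeting that ends exactly when another begins counts as a conflict. A real calendar would more likely treat intervals as half-open and use strict comparisons.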

Common Pitfalls

Forgetting to Update the max Field: After insertion or deletion, failing to correctly propagate updates to the max values up the tree corrupts the data structure. Future queries will incorrectly prune subtrees, leading to missed results. Always recalculate max as max(current.high, left.max, right.max) during updates.
Misunderstanding the Query Logic: A common mistake is to only go left or only go right. When reporting all matches, the algorithm must often explore both branches. You go left if the left child's max is greater than or equal to the query point. You go right if the query point is greater than or equal to the current node's low endpoint, since intervals in the right subtree may still start at or before the point and contain it.
Assuming Unbalanced Trees are Efficient: Building an interval tree on a standard, unbalanced BST can degenerate into a linked list in the worst case (e.g., inserting sorted intervals). Insertions and even single-match searches would then cost $O(n)$ instead of $O(\log n)$. Always base your implementation on a self-balancing BST like a red-black tree or AVL tree to guarantee logarithmic height.
Confusing with Segment Trees: Interval trees and segment trees both handle intervals but differ. Interval trees are designed for stabbing queries on a set of intervals, while segment trees are often used for answering aggregate queries (like sum or minimum) over a fixed, contiguous range that is known upfront. Choosing the wrong structure leads to clumsy and inefficient implementations.
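
One way to catch the first pitfall during testing is a small checker that verifies the max invariant at every node. This is a hypothetical debugging helper, not part of any standard interval tree API. The `Node` class takes max as an explicit argument here so that a stale value can be simulated.

```python
class Node:
    def __init__(self, low, high, max_high, left=None, right=None):
        self.low, self.high, self.max_high = low, high, max_high
        self.left, self.right = left, right


def max_is_consistent(node) -> bool:
    """True if every node's max equals max(own high, children's max)."""
    if node is None:
        return True
    expected = node.high
    for child in (node.left, node.right):
        if child is not None:
            expected = max(expected, child.max_high)
    return (node.max_high == expected
            and max_is_consistent(node.left)
            and max_is_consistent(node.right))


good = Node(15, 20, 40, left=Node(10, 30, 30), right=Node(30, 40, 40))
# Root max was never raised after [10, 30] was attached on the left.
stale = Node(15, 20, 20, left=Node(10, 30, 30))

print(max_is_consistent(good))   # True
print(max_is_consistent(stale))  # False
```

In the stale tree, a query for the point 25 that prunes on the root's max of 20 would stop immediately and miss $[10, 30]$.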

Summary

An interval tree is an augmented binary search tree where each node stores an interval and the maximum endpoint in its subtree, enabling highly efficient overlap queries.
Its primary operation is the stabbing query, which finds one interval containing a given point in $O(\log n)$ time and reports all of them in $O(\min(n, k \log n))$ time, where $n$ is the total number of intervals and $k$ is the number of results.
The query algorithm uses the stored max value to intelligently prune entire subtrees that cannot possibly contain the query point, avoiding a full tree traversal.
Maintaining the max augmentation correctly during insertions and deletions is critical for the structure's correctness.
Interval trees are directly applicable to real-world problems like calendar scheduling, resource management, and solving intersection problems in computational geometry.
