DS: Cartesian Trees

A Cartesian tree is a unique binary tree data structure built from an array of numbers that cleverly encodes two different ordering relationships simultaneously. While many data structures excel at either searching by key or querying minimum values, Cartesian trees blend these capabilities, making them powerful tools for efficiently solving problems like range minimum queries. Understanding their construction and properties unlocks a deeper grasp of algorithmic design, connecting fundamental concepts like binary search trees and heaps to advanced applications in competitive programming and software engineering.

Formal Definition and Properties

A Cartesian tree is built from an array of comparable values (typically numbers). For any given array, there is exactly one Cartesian tree, making it a canonical representation. It is defined by two invariant rules that govern every node's relationship with its children.

First, the tree obeys binary search tree (BST) ordering with respect to the indices of the array. This means for any node representing array element A[i], all nodes in its left subtree must correspond to indices less than i, and all nodes in its right subtree must correspond to indices greater than i. The tree is thus an inorder traversal of the original array sequence.

Second, the tree obeys heap ordering with respect to the values of the array. This can be either a min-heap or a max-heap. In a more common min-heap Cartesian tree, for any node with value A[i], its children must have values greater than or equal to A[i]. Conversely, in a max-heap variant, parent values are greater than or equal to child values. The root of a min-heap Cartesian tree always holds the minimum value from the entire array.

These dual constraints—BST order on indices and heap order on values—uniquely determine the tree's structure. Consider the array [9, 3, 7, 1, 8, 12, 10, 20, 15, 18, 5]. Its min-heap Cartesian tree would have the global minimum 1 at the root. All elements before 1 (indices 0-2, values [9, 3, 7]) are in the left subtree, and all elements after it are in the right subtree, with the heap property maintained recursively.

Linear-Time Construction Using a Stack

A naive construction method would require $O (n^{2})$ time in the worst case. However, Cartesian trees can be built in $O (n)$ time using a monotonic stack. This algorithm processes the array from left to right, using a stack to maintain the right spine of the partial tree built so far—that is, the path from the root to the rightmost node.

The algorithm proceeds as follows for a min-heap Cartesian tree:

Initialize an empty stack.
For each new element A[i] in the array:

a. Create a new node for A[i]. b. While the stack is not empty and the value at the stack's top node is greater than A[i], pop nodes from the stack. c. After popping, the last popped node (if any) becomes the left child of the new node A[i]. d. If the stack is not empty after popping, the current top node's right child becomes the new node A[i]. e. Push the new node A[i] onto the stack.

Let's walk through the first few steps with the example array [3, 7, 1].

Process 3: Stack is empty. Push node(3). Stack: [3].
Process 7: 3 < 7, so no pop. Set node(7) as the right child of node(3). Push node(7). Stack: [3, 7]. The right spine is 3->7.
Process 1: 7 > 1, so pop node(7). 3 > 1, so pop node(3). Stack is now empty. Set the last popped node (node(7)) as the left child of node(1). Push node(1). Stack: [1]. The new root is node(1) with a left subtree containing nodes 3 and 7 (which maintain the BST-on-index property).

This process works because the stack maintains nodes in increasing order of value (min-heap property) along the rightmost path. By always attaching nodes to the right of smaller ancestors or as the left child of a larger ancestor, we preserve both the heap and BST invariants.

Connection to Treaps

The Cartesian tree is a deterministic cousin of the Treap (Tree + Heap). A Treap is a randomized binary search tree where each node is assigned a random priority in addition to its key. The Treap structure is then defined by the rule that it must be a BST with respect to the keys and a (max-)heap with respect to the priorities.

If you take an array and treat the array indices as the BST keys and the array values as the heap priorities, the resulting Treap is exactly the Cartesian tree for that array. This perspective is incredibly useful: it means any algorithm or property related to Treaps can often be applied to Cartesian trees, and vice versa. For instance, the expected depth of a node in a randomly prioritized Treap is $O (lo g n)$ , which informs us about the behavior of Cartesian trees built from randomly ordered data.

Application to Range Minimum Queries (RMQ)

One of the most powerful applications of Cartesian trees is in solving the Range Minimum Query (RMQ) problem. Given a static array, we want to repeatedly ask: "What is the index of the minimum element between indices i and j?" A Cartesian tree provides an elegant solution by reducing RMQ to a Lowest Common Ancestor (LCA) problem.

Here’s the reduction: First, build the min-heap Cartesian tree for the array. The key theorem states that for any query range [i, j], the minimum element in that range corresponds to the LCA of the nodes representing A[i] and A[j] in the Cartesian tree.

Why does this work? Due to the BST-on-index property, the LCA of nodes i and j is a node k whose index lies between i and j. Because of the min-heap property, the value at node k is smaller than all values in both subtrees, meaning it is the smallest value on the path between i and j in the tree—which, due to the BST indexing, corresponds to the smallest value in the array range [i, j].

This reduction is powerful because LCA on a static tree can be answered in $O (1)$ time per query after an $O (n)$ time preprocessing step using techniques like Euler Tour traversal combined with a Sparse Table. Therefore, by building a Cartesian tree in $O (n)$ time and preprocessing it for LCA, we create an optimal RMQ data structure that answers queries in constant time.

Common Pitfalls

Confusing the Ordering Constraints: A frequent conceptual error is mixing up which property applies to indices versus values. Remember: BST order is on array indices, and heap order is on array values. Trying to enforce a BST on values will not yield a correct Cartesian tree. Always verify that an inorder traversal of the tree yields the original array sequence (by index).

Incorrect Stack Algorithm Implementation: When implementing the $O (n)$ construction, a common mistake is incorrectly managing child pointers after popping from the stack. The rule is: the last node popped becomes the left child of the new node. The new node then becomes the right child of the node now at the top of the stack. Failing to link these correctly breaks the BST indexing property.

Assuming Balanced Trees: A Cartesian tree built from an arbitrary array is not guaranteed to be balanced. Its shape is determined by the input. For example, an array sorted in decreasing order produces a min-heap Cartesian tree that degenerates into a left-skewed chain. While this doesn't affect the correctness of the RMQ reduction, it's important for understanding time complexities in other potential uses.

Misapplying the RMQ Reduction: The RMQ-to-LCA reduction works seamlessly for min-heap Cartesian trees. If you mistakenly build a max-heap Cartesian tree, the LCA will correspond to the maximum element in the range, not the minimum. Always ensure the heap type (min/max) matches your query goal.

Summary

A Cartesian tree is a unique binary tree derived from an array that combines binary search tree ordering on the original indices with heap ordering (min or max) on the array values.
It can be constructed in optimal $O (n)$ time using a monotonic stack algorithm that maintains the tree's right spine, cleverly establishing parent-child relationships in a single pass.
Cartesian trees are directly related to Treaps, where the array index acts as the key and the array value acts as the random priority, providing a bridge between deterministic and randomized data structure analysis.
Their most celebrated application is providing an elegant reduction from the Range Minimum Query (RMQ) problem to the Lowest Common Ancestor (LCA) problem on trees, enabling $O (n)$ preprocessing and $O (1)$ query time for static RMQ.
When working with Cartesian trees, carefully distinguish the two ordering constraints and be mindful that the tree's shape is input-dependent, which is acceptable for its core applications like RMQ preprocessing.

DS: Cartesian Trees

DS: Cartesian Trees

Formal Definition and Properties

Linear-Time Construction Using a Stack

Connection to Treaps

Application to Range Minimum Queries (RMQ)

Common Pitfalls

Summary

Write better notes with AI