Mar 5

Algo: Centroid Decomposition of Trees

Mindli Team

AI-Generated Content

Centroid decomposition transforms a static tree into a balanced recursive structure, enabling efficient solutions to complex path and distance queries. Unlike brute-force methods that scale poorly, this technique allows you to process all possible paths in a tree in a manageable time, making it indispensable for competitive programming and advanced algorithm design. Mastering it unlocks a powerful divide-and-conquer strategy for problems where paths are defined by a specific property, such as a sum, length, or node label constraint.

Understanding the Centroid

The first step is understanding the centroid of a tree. A centroid is a node whose removal splits the tree into connected components, each containing at most ⌊n/2⌋ nodes, where n is the number of nodes in the current subtree. A crucial property is that every tree has at least one centroid, and it can be found with a single Depth-First Search (DFS).

To find a centroid, you perform a DFS from an arbitrary root to compute subtree sizes. Then, for each node, you check that its largest "remaining component" (the part of the tree not in its subtree) and the sizes of all its child subtrees are all at most ⌊n/2⌋. The first node satisfying this condition is a centroid. This process is linear, O(n), for the subtree being processed.

Example: Consider a star tree with one central node connected to many leaves. The central node is the centroid because removing it leaves components of size 1. In a path graph, the middle node (or one of the two middle nodes) is the centroid.
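As a concrete sketch, the two passes described above might look like this in C++ (the function names `computeSize` and `findCentroid` and the global containers are illustrative choices, not from any particular library):

```cpp
#include <vector>
using namespace std;

// Adjacency list of the tree; removed[] marks nodes already taken as
// centroids in earlier decomposition rounds (all false for the whole tree).
vector<vector<int>> adj;
vector<bool> removed;
vector<int> subtree;

// One DFS computes subtree sizes, ignoring removed nodes.
int computeSize(int v, int parent) {
    subtree[v] = 1;
    for (int u : adj[v])
        if (u != parent && !removed[u])
            subtree[v] += computeSize(u, v);
    return subtree[v];
}

// Walk toward the heavy side: while some neighbor's subtree exceeds
// total/2, the centroid must lie in that part.
int findCentroid(int v, int parent, int total) {
    for (int u : adj[v])
        if (u != parent && !removed[u] && subtree[u] > total / 2)
            return findCentroid(u, v, total);
    return v;
}
```

Typical usage is `int total = computeSize(root, -1);` followed by `findCentroid(root, -1, total)`. The "upper" component never needs an explicit check here: the walk only enters a child whose subtree exceeds total/2, so the part left behind is always below the threshold.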

The Recursive Decomposition Process

Centroid decomposition is the recursive application of finding and removing centroids. The algorithm works as follows:

  1. Find a centroid c of the current tree.
  2. Remove c from the tree (conceptually, not physically), splitting it into several disconnected subtrees.
  3. Recursively decompose each of these resulting subtrees.

The result is not a modified original tree, but a new, hierarchical tree known as the centroid decomposition tree (CDT). In the CDT, the centroid c becomes the parent of the centroids found in the next recursive step within its child subtrees. This process guarantees that the height of the CDT is O(log n), because each recursion step at least halves the size of the subtree being processed. This logarithmic depth is the foundation of the algorithm's efficiency.
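The three steps above can be sketched as a small self-contained struct (all names here are my own; `cdtParent[c]` records the CDT parent of centroid c, with -1 for the CDT root):

```cpp
#include <vector>
using namespace std;

struct CentroidDecomposition {
    vector<vector<int>> adj;
    vector<bool> removed;
    vector<int> subtree, cdtParent;

    CentroidDecomposition(int n)
        : adj(n), removed(n, false), subtree(n, 0), cdtParent(n, -1) {}

    void addEdge(int a, int b) { adj[a].push_back(b); adj[b].push_back(a); }

    int computeSize(int v, int p) {
        subtree[v] = 1;
        for (int u : adj[v])
            if (u != p && !removed[u]) subtree[v] += computeSize(u, v);
        return subtree[v];
    }

    int findCentroid(int v, int p, int total) {
        for (int u : adj[v])
            if (u != p && !removed[u] && subtree[u] > total / 2)
                return findCentroid(u, v, total);
        return v;
    }

    // Decompose the component containing v; parent is the centroid
    // found one level up in the CDT (-1 for the whole tree).
    void decompose(int v, int parent) {
        int total = computeSize(v, -1);
        int c = findCentroid(v, -1, total);
        cdtParent[c] = parent;
        removed[c] = true;  // conceptual removal
        for (int u : adj[c])
            if (!removed[u]) decompose(u, c);  // recurse into every component
    }

    void build(int root) { decompose(root, -1); }
};
```

Note that the recursion loops over all unremoved neighbors of c, so every component of the forest left after removing c is decomposed, including the "upper" one (see the pitfalls section below).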

Analyzing Time Complexity: O(n log n) Preprocessing

The standard implementation has a preprocessing time complexity of O(n log n). Why? Finding a centroid for a subtree of size s takes O(s) time. In the recursive decomposition, each node in the original tree will be processed as part of a subtree at every level of the centroid tree where it remains.

Since the depth is O(log n), each node is processed O(log n) times across all recursive calls. Summing this over all n nodes gives the O(n log n) total preprocessing cost. This cost typically covers building the centroid decomposition tree and pre-computing any necessary auxiliary data, such as distances from nodes to their centroid ancestors.
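In symbols, the tally is simply O(n) total work per level of the CDT, summed over its depth:

```latex
T(n) \;=\; \underbrace{O(n) + O(n) + \cdots + O(n)}_{O(\log n)\ \text{levels}} \;=\; O(n \log n)
```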

Solving Problems: Distance Queries and Path Counting

The power of centroid decomposition lies in solving path-based problems through a "solve at centroid, then recurse" pattern. The core idea is to enumerate paths by considering those that pass through the current centroid.

For Distance Queries (e.g., count pairs of nodes with distance exactly k):

  1. At centroid c, compute distances from c to all nodes in its current component.
  2. Use a data structure (like an array or map) to count how many nodes have a particular distance. For each subtree, you first count paths that might incorrectly pair nodes within the same subtree (which don't actually pass through c), subtract this overcount, and then update your main counter.
  3. The answer is aggregated from all centroids. This avoids double-counting because every path is uniquely considered at the highest centroid it passes through in the CDT.

For Path Counting with Constraints (e.g., number of paths where the sum of node values equals a target): The process is analogous. At each centroid, you compute the sum from the centroid to other nodes. You then efficiently count complementary sums that would form the target when combined, again carefully subtracting the contribution from nodes within the same child subtree to ensure the path genuinely crosses the centroid.

This divide-and-conquer approach reduces a potentially O(n²) enumeration of all paths to a more manageable process where, at each of the O(log n) levels, you perform a linear or log-linear scan of the nodes in the component.
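Putting the pieces together, here is a hedged sketch of the distance-k pair-counting problem. Rather than the explicit subtract step described above, it uses the equivalent ordering of counting each subtree against previously recorded subtrees (and the centroid) before inserting it, which excludes same-subtree pairs by construction. All names are illustrative, and k < n is assumed:

```cpp
#include <vector>
using namespace std;

struct DistanceKCounter {
    int k;
    long long answer = 0;
    vector<vector<int>> adj;
    vector<bool> removed;
    vector<int> subtree;
    vector<long long> cnt;  // cnt[d] = nodes seen so far at distance d from centroid

    DistanceKCounter(int n, int k)
        : k(k), adj(n), removed(n, false), subtree(n, 0), cnt(k + 1, 0) {}

    void addEdge(int a, int b) { adj[a].push_back(b); adj[b].push_back(a); }

    int computeSize(int v, int p) {
        subtree[v] = 1;
        for (int u : adj[v])
            if (u != p && !removed[u]) subtree[v] += computeSize(u, v);
        return subtree[v];
    }

    int findCentroid(int v, int p, int total) {
        for (int u : adj[v])
            if (u != p && !removed[u] && subtree[u] > total / 2)
                return findCentroid(u, v, total);
        return v;
    }

    // Pair nodes of the current subtree with already-recorded distances.
    void countPairs(int v, int p, int depth) {
        if (depth > k) return;
        answer += cnt[k - depth];
        for (int u : adj[v])
            if (u != p && !removed[u]) countPairs(u, v, depth + 1);
    }

    // Record this subtree's distances, remembering the touched entries
    // so the reset below costs only as much as the work already done.
    void addDistances(int v, int p, int depth, vector<int>& touched) {
        if (depth > k) return;
        cnt[depth]++;
        touched.push_back(depth);
        for (int u : adj[v])
            if (u != p && !removed[u]) addDistances(u, v, depth + 1, touched);
    }

    void decompose(int v) {
        int total = computeSize(v, -1);
        int c = findCentroid(v, -1, total);
        removed[c] = true;

        vector<int> touched = {0};
        cnt[0] = 1;  // the centroid itself, at distance 0
        for (int u : adj[c]) {
            if (removed[u]) continue;
            countPairs(u, c, 1);   // pair against earlier subtrees + centroid
            addDistances(u, c, 1, touched);
        }
        for (int d : touched) cnt[d] = 0;  // cheap, targeted reset

        for (int u : adj[c])
            if (!removed[u]) decompose(u);
    }

    long long solve() { decompose(0); return answer; }
};
```

On a path 0-1-2-3-4 with k = 2, this counts the three pairs (0,2), (1,3), (2,4), each exactly once, at the highest centroid its path crosses.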

Common Pitfalls

Forgetting to Decompose the Entire Tree: A common implementation error is to recursively decompose only the subtrees formed by the centroid's children. You must decompose the entire forest created after removing the centroid, which includes the component containing the centroid's parent (the "upper" part of the tree). Failing to do this will leave parts of the tree unprocessed.

Incorrectly Handling "Same Subtree" Subtraction: When counting paths through a centroid, you must prevent pairing nodes that lie in the same child subtree, as the path between them does not actually pass through the centroid. The correct pattern is: for each child subtree, compute contributions, subtract them from your running total to correct for this overcount, and then add the subtree's nodes to your main data structure. Doing these steps out of order leads to wrong answers.

Assuming O(n log n) is Always Optimal: While O(n log n) is excellent for many problems, it is not the only tool. For specific queries like the shortest distance between two arbitrary nodes, Lowest Common Ancestor (LCA) with O(n log n) preprocessing yields O(1) queries (via an Euler tour plus sparse table) or O(log n) queries (via binary lifting). Centroid decomposition shines for problems requiring you to consider all paths or aggregate information based on a path property.

Inefficient Data Structure Reset: At each centroid, you often need a fresh data structure (e.g., a frequency array). Resetting it by clearing all entries can take O(n) time per centroid, leading to an O(n²) blow-up. Instead, keep a list of modified indices/keys and only reset those, or use a dictionary with a versioning technique.
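The versioning technique mentioned above can be sketched as follows (the struct name is mine): "clearing" just bumps a version counter in O(1), and stale entries are lazily reinitialized the next time they are written.

```cpp
#include <vector>
using namespace std;

struct VersionedCounter {
    vector<long long> count;
    vector<int> stamp;   // version at which each entry was last written
    int version = 0;

    VersionedCounter(int n) : count(n, 0), stamp(n, 0) {}

    void clear() { ++version; }  // O(1) "reset" of the whole array

    void add(int key) {
        if (stamp[key] != version) { count[key] = 0; stamp[key] = version; }
        count[key]++;
    }

    long long get(int key) const {
        return stamp[key] == version ? count[key] : 0;  // stale => 0
    }
};
```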

Summary

  • Centroid decomposition recursively partitions a tree by its centroid, building a balanced centroid decomposition tree (CDT) of O(log n) depth.
  • The O(n log n) preprocessing complexity arises because each node is processed once per level in the CDT.
  • It enables efficient solutions to path problems by enumerating all paths at the centroid they pass through highest in the CDT, using a divide-and-conquer strategy.
  • Key applications include solving distance queries and path counting problems with constraints, effectively reducing an O(n²) path enumeration to an O(n log n) or O(n log² n) algorithm.
  • Careful implementation is required to correctly handle subtree subtraction and reset auxiliary data structures to maintain efficiency.
