Feb 25

Algo: Heavy-Light Decomposition

Mindli Team

AI-Generated Content


Heavy-light decomposition transforms cumbersome tree path operations into efficient sequences of array queries, bridging graph theory with powerful data structures. If you've ever struggled to answer queries like "what is the sum along this path?" faster than linear time, this algorithm is your solution. It is a cornerstone technique in competitive programming, in network analysis, and in any domain where hierarchical data requires rapid updates and retrievals.

The Path Query Problem and the Need for Decomposition

Consider a weighted tree where you need to repeatedly find the sum, maximum, or any aggregate value along the path between two nodes. A naive approach traverses the path for each query, resulting in O(n) time per operation, which is prohibitively slow for large trees and many queries. The core challenge is that trees lack the inherent linear order of arrays, which allow for fast range queries via structures like segment trees. Heavy-light decomposition (HLD) solves this by cleverly restructuring the tree into a collection of linear paths, or chains, upon which efficient data structures can be applied. The fundamental guarantee is that any path from the root to a leaf will intersect only O(log n) different chains, leading to efficient query processing.

Defining Heavy and Light Edges

The decomposition begins by rooting the tree at an arbitrary node, which defines parent-child relationships. For each node, you examine its children. The heavy edge is the edge connecting the node to the child whose subtree has the largest number of nodes (its size is maximal). All other edges from the node to its children are classified as light edges. If two children's subtrees are of equal size, you may choose one arbitrarily as heavy—the algorithm's correctness does not depend on this tie-breaking.

This simple classification has a powerful consequence: any path from the root to a leaf will traverse at most O(log n) light edges. Why? Because every time you ascend from a child to its parent via a light edge, the size of the subtree containing you at least doubles: the parent's subtree includes a heavy sibling subtree at least as large as yours. Since subtree sizes are bounded by n, you can only double O(log n) times. Heavy edges, in contrast, form long chains that can be traversed efficiently in bulk. This property is the engine of HLD's logarithmic performance.
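This bound can be sanity-checked empirically. The sketch below (all names, such as count_light_edges, are illustrative) builds a random rooted tree, marks the heavy edge at each node, and verifies that no root-to-leaf path crosses more than floor(log2 n) light edges:

```python
import math
import random

def count_light_edges(n, parent, children):
    """Mark the edge to each node's largest-subtree child as heavy, then
    return the maximum number of light edges on any root-to-node path."""
    # Subtree sizes: node labels here satisfy parent[u] < u, so a reverse
    # sweep processes every child before its parent.
    size = [1] * n
    for u in range(n - 1, 0, -1):
        size[parent[u]] += size[u]
    heavy = [None] * n
    for u in range(n):
        if children[u]:
            heavy[u] = max(children[u], key=lambda c: size[c])
    # Accumulate light-edge counts from the root down.
    light = [0] * n
    worst = 0
    for u in range(1, n):
        light[u] = light[parent[u]] + (0 if heavy[parent[u]] == u else 1)
        worst = max(worst, light[u])
    return worst

random.seed(1)
n = 1000
parent = [0] * n                       # node 0 is the root; parent[0] unused
children = [[] for _ in range(n)]
for u in range(1, n):
    parent[u] = random.randrange(u)    # random recursive tree
    children[parent[u]].append(u)

assert count_light_edges(n, parent, children) <= math.floor(math.log2(n))
```

The assertion holds for any tree, not just random ones: crossing a light edge upward more than doubles the subtree size, and sizes cannot exceed n.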

Implementing the Decomposition Algorithm

Implementing HLD requires two depth-first search (DFS) passes over the tree. The first DFS computes the size of each node's subtree and identifies the heavy child for every node. You store values like subtree_size[u] and heavy_child[u].

The second DFS builds the chains. It traverses the tree, always visiting the heavy child first, which ensures all nodes connected by heavy edges end up consecutively in the same chain. Each chain is assigned a unique ID, and every node records its chain_id and its position within that chain's array, often called position_in_chain. This process flattens each chain into a linear array. Crucially, the DFS order guarantees that the positions in a chain's array are contiguous, making them perfect for indexing into a segment tree or similar structure.

Here is a conceptual step-by-step outline:

  1. Run DFS to compute subtree sizes and identify heavy children.
  2. Initialize a second DFS that starts at the root.
  3. For each node, if it has a heavy child, extend the current chain by adding that child. Start a new chain when encountering a light edge to a child.
  4. Assign array indices to nodes as they are added to chains.
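The outline above might be implemented roughly as follows; this is one iterative sketch (array names like heavy, head, and pos are conventions, not mandated by the algorithm):

```python
def hld(n, adj, root=0):
    """Heavy-light decomposition in two passes (iterative, to avoid
    recursion limits). Returns parent, depth, heavy child, chain head,
    and each node's position in the flattened array."""
    parent = [-1] * n
    depth = [0] * n
    order = []                          # DFS preorder
    visited = [False] * n
    stack = [root]
    while stack:
        u = stack.pop()
        visited[u] = True
        order.append(u)
        for v in adj[u]:
            if not visited[v]:
                parent[v] = u
                depth[v] = depth[u] + 1
                stack.append(v)
    # Pass 1: subtree sizes and heavy children, children before parents.
    size = [1] * n
    heavy = [-1] * n
    for u in reversed(order):
        if parent[u] != -1:
            size[parent[u]] += size[u]
            if heavy[parent[u]] == -1 or size[u] > size[heavy[parent[u]]]:
                heavy[parent[u]] = u
    # Pass 2: assign chain heads and contiguous positions, heavy child first.
    head = [root] * n
    pos = [0] * n
    cur = 0
    stack = [root]
    while stack:
        u = stack.pop()
        pos[u] = cur
        cur += 1
        for v in adj[u]:                # push light children first, so the
            if v != parent[u] and v != heavy[u]:
                head[v] = v             # a light edge starts a new chain
                stack.append(v)
        if heavy[u] != -1:              # heavy child is popped immediately
            head[heavy[u]] = head[u]    # and extends the current chain
            stack.append(heavy[u])
    return parent, depth, heavy, head, pos

# Example: a 7-node tree given as adjacency lists.
adj = [[1, 2], [0, 3, 4], [0], [1], [1, 5, 6], [4], [4]]
parent, depth, heavy, head, pos = hld(7, adj)
for u in range(7):
    if heavy[u] != -1:
        assert pos[heavy[u]] == pos[u] + 1   # heavy child sits next in the array
        assert head[heavy[u]] == head[u]     # and shares the chain head
```

The assertions confirm the key invariant: a heavy child always occupies the array slot immediately after its parent, so every chain is a contiguous range.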

Combining HLD with Segment Trees for Queries

Once the tree is decomposed into chains, the chains are laid out back-to-back in a single array, with each node placed at its position within its chain, and a segment tree (or a Fenwick tree) is built over that array. Because each chain occupies a contiguous range of indices, you can perform range queries (like sum or maximum) or range updates on an entire chain in O(log n) time.

The magic happens when answering a query between two arbitrary nodes, u and v. The process is analogous to moving two pointers up the tree towards their lowest common ancestor (LCA):

  1. While u and v are in different chains, you lift the node whose chain head is deeper in the tree (the head is the node in the chain closest to the root). Lifting means querying or updating the segment tree over the range from the chain head's position to the node's position, then jumping to the parent of the chain head.
  2. Once u and v are in the same chain, they are on a linear path. You then perform one final query on the segment tree for the range between their positions in that chain.
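The whole procedure can be sketched end to end. The example below uses illustrative names throughout; a Fenwick tree stands in for the segment tree, which suffices for sums. It decomposes a small tree and answers path-sum queries with the lifting loop described above:

```python
import sys

def build_hld(n, adj, values, root=0):
    """Decompose the tree and store node values at their chain positions
    in a Fenwick tree, so a chain segment is a contiguous range query."""
    parent, depth, size, heavy = [-1]*n, [0]*n, [1]*n, [-1]*n
    head, pos, counter = [root]*n, [0]*n, [0]

    def dfs_size(u):
        for v in adj[u]:
            if v != parent[u]:
                parent[v], depth[v] = u, depth[u] + 1
                dfs_size(v)
                size[u] += size[v]
                if heavy[u] == -1 or size[v] > size[heavy[u]]:
                    heavy[u] = v

    def dfs_chain(u, h):
        head[u], pos[u] = h, counter[0]
        counter[0] += 1
        if heavy[u] != -1:
            dfs_chain(heavy[u], h)        # heavy child continues the chain
        for v in adj[u]:
            if v != parent[u] and v != heavy[u]:
                dfs_chain(v, v)           # light edge starts a new chain

    sys.setrecursionlimit(max(1000, 2 * n))
    dfs_size(root)
    dfs_chain(root, root)

    bit = [0] * (n + 1)                   # Fenwick tree over flattened positions
    def bit_add(i, d):
        i += 1
        while i <= n:
            bit[i] += d
            i += i & (-i)
    def bit_prefix(i):                    # sum of positions 0..i
        i += 1
        s = 0
        while i > 0:
            s += bit[i]
            i -= i & (-i)
        return s
    def range_sum(l, r):
        return bit_prefix(r) - (bit_prefix(l - 1) if l > 0 else 0)

    for u in range(n):
        bit_add(pos[u], values[u])

    def path_sum(u, v):
        """Sum of node values on the path u..v, one chain segment at a time."""
        total = 0
        while head[u] != head[v]:
            if depth[head[u]] < depth[head[v]]:
                u, v = v, u               # lift the node with the deeper head
            total += range_sum(pos[head[u]], pos[u])
            u = parent[head[u]]
        l, r = sorted((pos[u], pos[v]))   # final segment within one chain
        total += range_sum(l, r)
        return total

    return path_sum

# Tiny example: a 7-node tree whose node values equal the node labels.
adj = [[1, 2], [0, 3, 4], [0], [1], [1, 5, 6], [4], [4]]
path_sum = build_hld(7, adj, values=list(range(7)))
assert path_sum(5, 6) == 5 + 4 + 6        # path 5 -> 4 -> 6
assert path_sum(3, 2) == 3 + 1 + 0 + 2    # path 3 -> 1 -> 0 -> 2
```

For range updates or non-invertible aggregates like maximum, you would swap the Fenwick tree for a segment tree, optionally with lazy propagation.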

Because each lift operation moves from a chain to its parent via a light edge, and there are only O(log n) light edges on any path, the total number of segment tree operations is O(log n). Each segment tree operation is O(log n), yielding an overall time complexity of O(log² n) per path query or update.

Applications: Path Sum, Maximum, and Updates

The true power of HLD is realized in solving concrete problems. For path sum queries, you store node values (or edge weights, by associating them with the child node) in the segment tree configured for range-sum queries. The query process described above aggregates values along the path. For path maximum queries, you configure the segment tree for range-maximum queries instead.

Path updates work identically. Suppose you need to add a value to every node on the path from u to v. You follow the same lifting procedure, but instead of querying the segment tree, you issue a range update (e.g., with lazy propagation) for each chain segment you traverse. This also runs in O(log² n) time. A common scenario is maintaining dynamic trees in network routing or game object hierarchies, where attributes along connection paths need frequent modification.
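A minimal sketch of the update side, under the same decomposition: here a Fenwick tree in range-add/point-read mode (the classic difference trick) stands in for a lazily propagated segment tree, and all helper names are illustrative:

```python
import sys

def build_path_update(n, adj, root=0):
    """Support 'add d to every node on path u..v' plus single-node reads,
    via a Fenwick tree over a difference array on the flattened positions."""
    parent, depth, size, heavy = [-1]*n, [0]*n, [1]*n, [-1]*n
    head, pos, counter = [root]*n, [0]*n, [0]

    def dfs_size(u):
        for v in adj[u]:
            if v != parent[u]:
                parent[v], depth[v] = u, depth[u] + 1
                dfs_size(v)
                size[u] += size[v]
                if heavy[u] == -1 or size[v] > size[heavy[u]]:
                    heavy[u] = v

    def dfs_chain(u, h):
        head[u], pos[u] = h, counter[0]
        counter[0] += 1
        if heavy[u] != -1:
            dfs_chain(heavy[u], h)
        for v in adj[u]:
            if v != parent[u] and v != heavy[u]:
                dfs_chain(v, v)

    sys.setrecursionlimit(max(1000, 2 * n))
    dfs_size(root)
    dfs_chain(root, root)

    bit = [0] * (n + 2)                   # Fenwick over a difference array
    def _add(i, d):
        i += 1
        while i <= n + 1:
            bit[i] += d
            i += i & (-i)
    def range_add(l, r, d):               # add d to positions l..r
        _add(l, d)
        _add(r + 1, -d)
    def point_read(i):                    # value at position i = prefix sum
        i += 1
        s = 0
        while i > 0:
            s += bit[i]
            i -= i & (-i)
        return s

    def add_on_path(u, v, d):
        """Same chain-lifting loop as a query, but issuing range updates."""
        while head[u] != head[v]:
            if depth[head[u]] < depth[head[v]]:
                u, v = v, u
            range_add(pos[head[u]], pos[u], d)
            u = parent[head[u]]
        l, r = sorted((pos[u], pos[v]))
        range_add(l, r, d)

    def value_at(u):
        return point_read(pos[u])

    return add_on_path, value_at

adj = [[1, 2], [0, 3, 4], [0], [1], [1, 5, 6], [4], [4]]
add_on_path, value_at = build_path_update(7, adj)
add_on_path(5, 6, 10)                     # path 5 -> 4 -> 6
assert value_at(4) == 10 and value_at(6) == 10
assert value_at(0) == 0                   # node 0 is off the path
```

The difference trick only supports point reads after range adds; to combine range updates with range-sum or range-max queries, use a segment tree with lazy propagation instead.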

Common Pitfalls

  1. Incorrect Subtree Size Calculation: A frequent error is to compute subtree sizes without a proper DFS post-order traversal. You must process all children before a node to know their sizes. Correction: Ensure your first DFS recursively calls for children first, then computes subtree_size[u] as 1 (for itself) plus the sum of the children's subtree sizes.
  2. Mishandling the LCA during Queries: When performing a path query between u and v, it's easy to lift the wrong node. You must always lift the node whose chain head is deeper in the tree, not simply whichever node is deeper. Correction: Compare the depths of the chain heads directly. The rule is: while the chains differ, if the head of u's chain is deeper than the head of v's chain, lift u; otherwise, lift v.
  3. Forgetting to Map Edge Weights to Nodes: Many problems involve values on edges (like network bandwidth), but segment trees manage values on array indices corresponding to nodes. Correction: Standard practice is to assign the weight of an edge to the child node of that edge during preprocessing. Then, queries between u and v must exclude the LCA's value, as it represents the edge above the LCA, which is not part of the path.
  4. Inefficient Chain Jump Logic: Implementing the query loop with verbose or repeated chain head lookups can clutter code and introduce bugs. Correction: Write a helper function query_chain(u, v) that performs the segment tree operation on the chain segment from u to v (assuming they are in the same chain). Keep the main query loop clean, focusing only on moving u and v until they align.

Summary

  • Heavy-light decomposition restructures a rooted tree into a set of vertex-disjoint paths (chains) by labeling each edge as heavy (to the child with the largest subtree) or light, guaranteeing O(log n) light edges on any root-to-leaf path.
  • By mapping each chain contiguously onto a segment tree, path queries (like sum or maximum) and path updates can be performed in O(log² n) time by sequentially processing chain segments.
  • Implementation requires two DFS passes: one to compute subtree sizes and identify heavy children, and another to assign nodes to chains and their positions.
  • The core query algorithm repeatedly "lifts" the deeper of two nodes in their respective chains to the chain head, performing segment tree operations on each lifted segment until both nodes are in the same chain for a final query.
  • This technique is essential for solving complex dynamic problems on tree paths efficiently, forming a foundational tool for algorithm designers and software engineers working with hierarchical data.
