Algo: Tarjan's Offline LCA Algorithm
AI-Generated Content
Algo: Tarjan's Offline LCA Algorithm
Finding the lowest common ancestor (LCA) of two nodes is a fundamental tree operation, crucial for solving problems in networks, genealogy, and compilers. When you have a tree and a large batch of LCA queries to answer, processing each one independently can be inefficient. Tarjan's offline LCA algorithm solves this elegantly by answering all queries in a single, coordinated depth-first search (DFS) using a Union-Find (Disjoint Set Union) data structure to manage visited subtrees. This approach yields an impressive amortized time complexity, making it the algorithm of choice for scenarios where all queries are known in advance.
The Core Idea: Offline Processing with Union-Find
At its heart, Tarjan's algorithm is an offline algorithm, meaning it requires all queries to be specified before execution begins. This allows it to organize work intelligently during a single DFS traversal of the tree. The central insight is that the LCA of two nodes and is the deepest node that is an ancestor of while is in its subtree, or vice-versa. The algorithm uses a Union-Find structure to group visited nodes into sets, where the representative of each set points to the oldest (highest) ancestor of all nodes in that set that has been fully processed.
As the DFS backtracks from a node, it unions that node's set with its parent's set. The representative of the new set becomes the parent. Crucially, whenever the DFS visits a node , it checks all queries involving . If the other node in a query has already been visited, the LCA is simply the current representative (or "leader") of the set containing . This works because 's set has been unioned upwards only as far as the deepest fully processed ancestor, which is precisely the LCA.
The Algorithm: A Step-by-Step Walkthrough
Let's walk through a concrete implementation. Assume we have a tree rooted at node 1 and a list of query pairs (u, v).
- Initialization: Create a Union-Find structure for nodes. For each node, store a list of its associated queries (e.g.,
queries[u].push_back(v)andqueries[v].push_back(u)). Also, create an arrayvisitedset to false and an arraylcato store answers. - Perform a DFS: Start a depth-first search from the root.
- Process Node: When the DFS enters a node , mark
visited[u] = true. Make its own set in the Union-Find structure (makeSet(u)). For every query partner of :
- If
visited[v] == true, then the answer for query(u, v)isfindSet(v)(the representative of 's set). Storelca[u][v] = findSet(v).
- Process Children: Recursively DFS all children of .
- Backtrack and Union: After processing all children, union the set of with the set of its parent (if is not the root). In a typical implementation, you call
unionSet(u, p), which merges the sets and makes the parent's representative the leader of the new combined set. - Completion: After the DFS completes, all queries have been answered.
Consider a tree: 1 (root) with children 2 and 3. Query: LCA(2, 3). DFS visits node 2, sees query partner 3 is not visited yet. DFS backtracks from 2 and unions set {2} with set {1}. DFS then visits node 3. It checks its query partner 2, which is visited. The findSet(2) now returns 1, because set {2} was unioned into set {1}. Therefore, LCA(2, 3) = 1.
Analyzing the Complexity: O(n α(n)) Amortized Per Query
The time complexity analysis is key to understanding the algorithm's efficiency. The DFS visits each of the nodes once. For each node, we perform two kinds of operations:
- Query Checks: We iterate over all queries associated with the node. If we store queries adjacency-style, each query
(u, v)is checked twice (once when is visited, once when is visited). For queries, this is work total. - Union-Find Operations: We perform one
makeSet, oneunionSet, and severalfindSetoperations. With path compression and union by rank, the amortized time per operation is , where is the extremely slow-growing inverse Ackermann function, effectively a constant (less than 5 for any conceivable input size).
Therefore, the total runtime is . When amortized over queries, the cost per query is , which is essentially constant. This makes it exceptionally efficient for batch processing.
Tarjan vs. Online LCA Methods
It's essential to contrast Tarjan's offline approach with popular online LCA algorithms like binary lifting. Binary lifting preprocesses the tree in time and then can answer each LCA query in time, without needing all queries upfront.
- When to use Tarjan's Algorithm: When all queries are known in advance (offline). Its amortized constant-time-per-query and relatively simple implementation (once Union-Find is understood) make it ideal for competitive programming problems or system pre-processing where the query set is fixed.
- When to use Binary Lifting: When queries arrive interactively or one-by-one (online). Its query time is fast and predictable, and it supports other useful operations like jumping levels up the tree. The preprocessing is also straightforward.
The choice hinges on the problem's constraints. Tarjan's is asymptotically superior for batched offline queries, while binary lifting offers greater flexibility for dynamic online scenarios.
Applications: From LCA to Tree Distance
The primary application of LCA is computing the distance between two nodes in a weighted tree. The distance can be derived using the formula:
where depth[x] is the sum of edge weights from the root to . After an DFS to compute node depths and an offline batch of LCA queries via Tarjan's algorithm, you can compute the distance for all query pairs in nearly constant time each. This pattern is powerful in network routing, phylogenetic tree analysis, and solving graph problems reduced to trees (e.g., on a minimum spanning tree).
Common Pitfalls
- Incorrect Union-Find Merging Order: A frequent mistake is unioning a node with its parent before processing all its queries. You must process all children and, crucially, answer all queries for the current node before performing the union on backtrack. Unioning too early makes the node's set representative point to an ancestor that is not the true LCA.
- Correction: Strictly follow the DFS post-order: Visit node, process its immediate queries, recursively process children, then union with parent.
- Mishandling Duplicate or Self-Queries: Not properly handling a query like
LCA(u, u)or the same query appearing multiple times in the list can cause errors or incorrect output formatting.
- Correction: During query setup, ensure you store query partners in a way that allows easy retrieval of the answer. For
LCA(u, u), the answer is trivially , and this can be handled as a special case during initialization.
- Assuming an Undirected Graph: The algorithm requires a rooted tree. Providing it with a general undirected graph without first defining a root and conducting a tree-forming DFS will lead to incorrect results or infinite recursion.
- Correction: Always preprocess your graph input to build a rooted tree (or forest) using a DFS, ignoring back edges if the graph is not a perfect tree.
Summary
- Tarjan's Offline LCA Algorithm answers a batch of LCA queries efficiently by performing a single DFS and using a Union-Find structure to track the deepest processed ancestor of visited nodes.
- The algorithm achieves an amortized time complexity of per query, effectively constant, after setup, making it optimal for offline batch processing.
- Its key differentiator from online methods like binary lifting is the requirement that all queries are known beforehand; for interactive queries, an online method is necessary.
- The primary use case is computing tree node distances using the formula .
- Successful implementation hinges on correctly ordering operations: answer queries for the current node before unioning it with its parent during DFS backtracking.