Algo: Tarjan's Offline LCA Algorithm

Finding the lowest common ancestor (LCA) of two nodes is a fundamental tree operation, crucial for solving problems in networks, genealogy, and compilers. When you have a tree and a large batch of LCA queries to answer, processing each one independently can be inefficient. Tarjan's offline LCA algorithm solves this elegantly by answering all queries in a single, coordinated depth-first search (DFS) using a Union-Find (Disjoint Set Union) data structure to manage visited subtrees. This approach yields an impressive amortized time complexity, making it the algorithm of choice for scenarios where all queries are known in advance.

The Core Idea: Offline Processing with Union-Find

At its heart, Tarjan's algorithm is an offline algorithm, meaning it requires all queries to be specified before execution begins. This allows it to organize work intelligently during a single DFS traversal of the tree. The central insight is that the LCA of two nodes $u$ and $v$ is the deepest node that is an ancestor of $u$ while $v$ is in its subtree, or vice-versa. The algorithm uses a Union-Find structure to group visited nodes into sets, where the representative of each set points to the oldest (highest) ancestor of all nodes in that set that has been fully processed.

As the DFS backtracks from a node, it unions that node's set with its parent's set. The representative of the new set becomes the parent. Crucially, whenever the DFS visits a node $u$ , it checks all queries involving $u$ . If the other node $v$ in a query has already been visited, the LCA is simply the current representative (or "leader") of the set containing $v$ . This works because $v$ 's set has been unioned upwards only as far as the deepest fully processed ancestor, which is precisely the LCA.

The Algorithm: A Step-by-Step Walkthrough

Let's walk through a concrete implementation. Assume we have a tree rooted at node 1 and a list of query pairs (u, v).

Initialization: Create a Union-Find structure for $n$ nodes. For each node, store a list of its associated queries (e.g., queries[u].push_back(v) and queries[v].push_back(u)). Also, create an array visited set to false and an array lca to store answers.
Perform a DFS: Start a depth-first search from the root.
Process Node: When the DFS enters a node $u$ , mark visited[u] = true. Make $u$ its own set in the Union-Find structure (makeSet(u)). For every query partner $v$ of $u$ :

If visited[v] == true, then the answer for query (u, v) is findSet(v) (the representative of $v$ 's set). Store lca[u][v] = findSet(v).

Process Children: Recursively DFS all children of $u$ .
Backtrack and Union: After processing all children, union the set of $u$ with the set of its parent $p$ (if $u$ is not the root). In a typical implementation, you call unionSet(u, p), which merges the sets and makes the parent's representative the leader of the new combined set.
Completion: After the DFS completes, all queries have been answered.

Consider a tree: 1 (root) with children 2 and 3. Query: LCA(2, 3). DFS visits node 2, sees query partner 3 is not visited yet. DFS backtracks from 2 and unions set {2} with set {1}. DFS then visits node 3. It checks its query partner 2, which is visited. The findSet(2) now returns 1, because set {2} was unioned into set {1}. Therefore, LCA(2, 3) = 1.

Analyzing the Complexity: O(n α(n)) Amortized Per Query

The time complexity analysis is key to understanding the algorithm's efficiency. The DFS visits each of the $n$ nodes once. For each node, we perform two kinds of operations:

Query Checks: We iterate over all queries associated with the node. If we store queries adjacency-style, each query (u, v) is checked twice (once when $u$ is visited, once when $v$ is visited). For $Q$ queries, this is $O (Q)$ work total.
Union-Find Operations: We perform one makeSet, one unionSet, and several findSet operations. With path compression and union by rank, the amortized time per operation is $O (α (n))$ , where $α (n)$ is the extremely slow-growing inverse Ackermann function, effectively a constant (less than 5 for any conceivable input size).

Therefore, the total runtime is $O (n α (n) + Q)$ . When amortized over $Q$ queries, the cost per query is $O (α (n))$ , which is essentially constant. This makes it exceptionally efficient for batch processing.

Tarjan vs. Online LCA Methods

It's essential to contrast Tarjan's offline approach with popular online LCA algorithms like binary lifting. Binary lifting preprocesses the tree in $O (n lo g n)$ time and then can answer each LCA query in $O (lo g n)$ time, without needing all queries upfront.

When to use Tarjan's Algorithm: When all queries are known in advance (offline). Its amortized constant-time-per-query and relatively simple implementation (once Union-Find is understood) make it ideal for competitive programming problems or system pre-processing where the query set is fixed.
When to use Binary Lifting: When queries arrive interactively or one-by-one (online). Its $O (lo g n)$ query time is fast and predictable, and it supports other useful operations like jumping $k$ levels up the tree. The preprocessing is also straightforward.

The choice hinges on the problem's constraints. Tarjan's is asymptotically superior for batched offline queries, while binary lifting offers greater flexibility for dynamic online scenarios.

Applications: From LCA to Tree Distance

The primary application of LCA is computing the distance between two nodes in a weighted tree. The distance $d i s t (u, v)$ can be derived using the formula: $d i s t (u, v) = d e pt h [u] + d e pt h [v] - 2 \times d e pt h [L C A (u, v)]$ where depth[x] is the sum of edge weights from the root to $x$ . After an $O (n)$ DFS to compute node depths and an offline batch of LCA queries via Tarjan's algorithm, you can compute the distance for all query pairs in nearly constant time each. This pattern is powerful in network routing, phylogenetic tree analysis, and solving graph problems reduced to trees (e.g., on a minimum spanning tree).

Common Pitfalls

Incorrect Union-Find Merging Order: A frequent mistake is unioning a node with its parent before processing all its queries. You must process all children and, crucially, answer all queries for the current node before performing the union on backtrack. Unioning too early makes the node's set representative point to an ancestor that is not the true LCA.

Correction: Strictly follow the DFS post-order: Visit node, process its immediate queries, recursively process children, then union with parent.

Mishandling Duplicate or Self-Queries: Not properly handling a query like LCA(u, u) or the same query appearing multiple times in the list can cause errors or incorrect output formatting.

Correction: During query setup, ensure you store query partners in a way that allows easy retrieval of the answer. For LCA(u, u), the answer is trivially $u$ , and this can be handled as a special case during initialization.

Assuming an Undirected Graph: The algorithm requires a rooted tree. Providing it with a general undirected graph without first defining a root and conducting a tree-forming DFS will lead to incorrect results or infinite recursion.

Correction: Always preprocess your graph input to build a rooted tree (or forest) using a DFS, ignoring back edges if the graph is not a perfect tree.

Summary

Tarjan's Offline LCA Algorithm answers a batch of LCA queries efficiently by performing a single DFS and using a Union-Find structure to track the deepest processed ancestor of visited nodes.
The algorithm achieves an amortized time complexity of $O (α (n))$ per query, effectively constant, after $O (n + Q)$ setup, making it optimal for offline batch processing.
Its key differentiator from online methods like binary lifting is the requirement that all queries are known beforehand; for interactive queries, an online method is necessary.
The primary use case is computing tree node distances using the formula $d i s t (u, v) = d e pt h [u] + d e pt h [v] - 2 \times d e pt h [L C A (u, v)]$ .
Successful implementation hinges on correctly ordering operations: answer queries for the current node before unioning it with its parent during DFS backtracking.

Algo: Tarjan's Offline LCA Algorithm

Algo: Tarjan's Offline LCA Algorithm

The Core Idea: Offline Processing with Union-Find

The Algorithm: A Step-by-Step Walkthrough

Analyzing the Complexity: O(n α(n)) Amortized Per Query

Tarjan vs. Online LCA Methods

Applications: From LCA to Tree Distance

Common Pitfalls

Summary

Write better notes with AI