Greedy: Activity Selection and Interval Scheduling

Efficiently scheduling limited resources is a cornerstone of engineering, from allocating CPU time to booking conference rooms. The activity selection problem provides a foundational and elegant algorithmic strategy for choosing the maximum number of non-conflicting tasks. By understanding its greedy solution, you gain a powerful tool for optimization and a clear window into how simple, local choices can lead to globally optimal results.

Problem Statement and Greedy Intuition

Formally, you are given a set of $n$ activities. Each activity $i$ has a start time $s_{i}$ and a finish time $f_{i}$, defining its interval $[s_{i}, f_{i})$ on a timeline. Two activities $i$ and $j$ are compatible (non-overlapping) if their intervals do not intersect, that is, if $f_{i} \leq s_{j}$ or $f_{j} \leq s_{i}$. The goal is to select a subset of mutually compatible activities of maximum size (i.e., the greatest number of activities).

A naive approach would be to check all possible subsets, an exponential-time strategy. The efficient solution is a greedy algorithm, which builds a solution piece by piece, making the locally optimal choice at each step with the hope of finding a global optimum. The key insight is to always select the activity that finishes the earliest among the remaining compatible options. This frees up the resource as soon as possible, maximizing the time available for subsequent activities.
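To make the cost of the naive approach concrete, here is a minimal exhaustive-search sketch in Python, assuming activities are given as `(start, finish)` tuples for half-open intervals. The names `compatible` and `brute_force_max` are illustrative, not from the original text. It tries subset sizes from largest to smallest, so it may examine up to $2^{n}$ subsets and is only practical for very small inputs, though it is handy as a correctness oracle when testing a faster algorithm.

```python
from itertools import combinations

def compatible(a, b):
    """Half-open intervals [s, f) are compatible if one ends before the other starts."""
    return a[1] <= b[0] or b[1] <= a[0]

def brute_force_max(intervals):
    """Exhaustive search over subsets, largest size first.

    Examines up to 2^n subsets, so it is only usable for small n.
    """
    n = len(intervals)
    for size in range(n, 0, -1):
        for subset in combinations(intervals, size):
            # A subset is feasible if every pair of its activities is compatible.
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                return list(subset)
    return []

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 9)]
print(brute_force_max(acts))  # [(1, 4), (5, 7), (8, 9)]
```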

The Greedy Choice Property and Algorithm

The optimal greedy strategy for the unweighted, maximum-count problem is to repeatedly select the activity with the earliest finish time that does not conflict with previously chosen activities. The fact that this locally optimal choice is always part of some optimal solution is known as the greedy choice property. The algorithm proceeds as follows:

1. Sort all activities by their finish time in ascending order, so that $f_{1} \leq f_{2} \leq \dots \leq f_{n}$.
2. Initialize the solution set $A$ with the first activity (the one that finishes earliest, activity 1).
3. Set a variable last_finish = $f_{1}$.
4. Iterate through the remaining sorted activities (from $i = 2$ to $n$). If the start time $s_{i}$ of the current activity is greater than or equal to last_finish, it is compatible, so:
    a. Add activity $i$ to the solution set $A$.
    b. Update last_finish = $f_{i}$.

This process yields a maximum-size set of non-overlapping activities. The sorting step ensures we always consider the next earliest-finishing candidate, which is the core of the greedy strategy.
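The steps above can be sketched in Python as follows, assuming activities arrive as `(start, finish)` tuples; `select_activities` is an illustrative name. One small deviation from the numbered steps: instead of seeding the solution with activity 1, this sketch initializes `last_finish` to negative infinity, so the first activity in sorted order is always accepted by the loop itself and an empty input needs no special case.

```python
def select_activities(intervals):
    """Earliest-finish-time greedy for unweighted activity selection.

    intervals: list of (start, finish) pairs, each the half-open interval [start, finish).
    Returns a maximum-size list of mutually compatible intervals in finish-time order.
    """
    ordered = sorted(intervals, key=lambda iv: iv[1])  # step 1: sort by finish time
    selected = []
    last_finish = float("-inf")  # nothing chosen yet, so the first activity always fits
    for start, finish in ordered:
        if start >= last_finish:  # compatible with the most recently chosen activity
            selected.append((start, finish))
            last_finish = finish
    return selected

s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16]
print(select_activities(list(zip(s, f))))  # [(1, 4), (5, 7), (8, 11), (12, 16)]
```

Because Python's `sorted` is stable, activities with equal finish times keep their input order; any tie-breaking rule yields a maximum-size schedule.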

Proof of Optimality

It's not enough for an algorithm to seem correct; you must be able to prove it. The proof for this greedy algorithm typically uses an exchange argument. The goal is to show that any optimal solution $O$ can be modified to include the greedy choice (the earliest-finishing activity, call it $g$) without reducing the number of activities, so some optimal solution contains $g$.

1. Let $g$ be the first activity chosen by the greedy algorithm (the earliest-finishing activity overall).
2. Let $O$ be any optimal solution. If $O$ already contains $g$, we are done.
3. If $O$ does not contain $g$, let $k$ be the activity in $O$ with the earliest finish time. Since $g$ finishes earliest among all activities, $f_{g} \leq f_{k}$.
4. Construct a new solution $O' = (O \setminus \{k\}) \cup \{g\}$. Every other activity $j \in O$ finishes no earlier than $k$ and does not overlap it, so it must start at or after $k$ ends: $s_{j} \geq f_{k} \geq f_{g}$. Hence $g$ conflicts with none of them, and $O'$ is also a set of compatible activities.
5. $O'$ has the same number of activities as $O$, so it is also optimal. Thus, an optimal solution exists that includes the greedy choice $g$.

You can then apply this argument recursively to the remaining subproblem (all activities that start at or after $f_{g}$). By induction, every greedy choice extends to an optimal schedule, so the set the algorithm returns is globally optimal.
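An exchange argument is a proof, and no amount of testing replaces it, but a randomized cross-check is a cheap way to catch an implementation that does not match the argument. The sketch below, with hypothetical helper names `greedy_count`, `brute_force_count`, and `cross_check`, compares the greedy count against exhaustive search on small random inputs of positive-length intervals.

```python
import random
from itertools import combinations

def greedy_count(intervals):
    """Number of activities chosen by the earliest-finish greedy."""
    last_finish = float("-inf")
    count = 0
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            count += 1
            last_finish = finish
    return count

def brute_force_count(intervals):
    """Largest size of a pairwise-compatible subset, found exhaustively."""
    best = 0
    for size in range(1, len(intervals) + 1):
        for subset in combinations(intervals, size):
            if all(a[1] <= b[0] or b[1] <= a[0] for a, b in combinations(subset, 2)):
                best = size
                break  # one compatible subset of this size is enough
    return best

def cross_check(trials=500, max_n=8, seed=0):
    """Compare greedy and brute force on random small instances."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, max_n)
        intervals = []
        for _ in range(n):
            start = rng.randint(0, 20)
            intervals.append((start, start + rng.randint(1, 6)))
        assert greedy_count(intervals) == brute_force_count(intervals), intervals
    return True

print(cross_check())  # True
```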

Implementation and Complexity Analysis

The algorithm runs in $O(n \log n)$ time. The dominant cost is the initial sort; the subsequent single pass through the sorted list runs in $O(n)$ time.

Here is a step-by-step breakdown in pseudocode:

ActivitySelection(s[], f[], n):
    1. If n == 0, return an empty list.
    2. Create an array A of n activities, where A[i] holds (s[i], f[i]).
    3. Sort A by finish time in ascending order.
    4. Initialize solution_list = [A[0]]      // the earliest-finishing activity
    5. last_finish = A[0].finish
    6. For i = 1 to n-1:                      // 0-based indices into the sorted A
         a. If A[i].start >= last_finish:
                solution_list.append(A[i])
                last_finish = A[i].finish
    7. Return solution_list

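The space complexity is $O(n)$ for storing the input and the solution. The selection scan itself needs only $O(1)$ additional space (the single last_finish variable); reaching $O(1)$ auxiliary space overall also requires an in-place sort such as heapsort and marking or counting selections instead of building a separate list.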
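A direct Python translation of the pseudocode might look like the sketch below. It keeps the parallel-array inputs `s` and `f`, but, as a design choice not taken from the pseudocode, it sorts a list of indices instead of copying activity records and returns the original (pre-sort) indices of the selected activities, which is often what a caller needs to map results back to its own data.

```python
def activity_selection(s, f):
    """Translation of the ActivitySelection pseudocode for parallel lists s and f.

    Returns the original (pre-sort) indices of the selected activities.
    """
    if len(s) != len(f):
        raise ValueError("s and f must have the same length")
    n = len(s)
    if n == 0:  # step 1: nothing to schedule
        return []
    # Steps 2-3: sort activity indices by finish time instead of copying records.
    order = sorted(range(n), key=lambda i: f[i])
    # Steps 4-5: the earliest-finishing activity is always selected.
    solution = [order[0]]
    last_finish = f[order[0]]
    # Step 6: a single linear pass over the remaining activities.
    for i in order[1:]:
        if s[i] >= last_finish:
            solution.append(i)
            last_finish = f[i]
    return solution

print(activity_selection([3, 1, 0, 5, 8], [5, 4, 6, 7, 9]))  # [1, 3, 4]
```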

Extension to Weighted Interval Scheduling

The classic greedy algorithm fails when activities have different weights or values (e.g., profit, priority). The problem becomes: select a set of non-overlapping activities that maximizes the total sum of weights. Selecting the earliest-finishing, low-value activity might block a later, extremely high-value one.

This weighted interval scheduling problem requires a more powerful technique: dynamic programming (DP). The DP approach breaks the problem into overlapping subproblems. You sort activities by finish time and define $dp[i]$ as the maximum total weight achievable using only the first $i$ activities in that order, with $dp[0] = 0$.

The recurrence relation is $dp[i] = \max(dp[i-1],\ w_{i} + dp[p(i)])$, where $w_{i}$ is the weight of activity $i$ and $p(i)$ is the index of the last activity (in sorted order) that finishes at or before activity $i$ starts, or $0$ if there is none (its "compatible predecessor"). This formula represents the core choice: either you skip activity $i$ (taking $dp[i-1]$), or you take it and add its weight to the best solution over the activities compatible with it ($dp[p(i)]$). The answer is $dp[n]$, computed in $O(n \log n)$ time: sorting and the $n$ binary searches for $p(i)$ each cost $O(n \log n)$, and filling the table costs $O(n)$.
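The recurrence can be sketched in Python as below, assuming each job is a `(start, finish, weight)` tuple with a half-open interval and positive length. The compatible predecessor $p(i)$ is found with the standard-library `bisect.bisect_right` over the sorted finish times; the function name `weighted_interval_scheduling` and the traceback step that recovers the chosen jobs are illustrative additions, not part of the original description.

```python
from bisect import bisect_right

def weighted_interval_scheduling(jobs):
    """jobs: list of (start, finish, weight) with half-open intervals [start, finish).

    Returns (best_total_weight, chosen_jobs) using dp[i] = max(dp[i-1], w_i + dp[p(i)]).
    """
    jobs = sorted(jobs, key=lambda j: j[1])  # sort by finish time
    finishes = [j[1] for j in jobs]
    n = len(jobs)
    # p[i] for 1-based job i: how many of jobs 1..i-1 finish at or before job i starts,
    # which equals the 1-based index of its last compatible predecessor (0 if none).
    p = [0] * (n + 1)
    for i in range(1, n + 1):
        start = jobs[i - 1][0]
        p[i] = bisect_right(finishes, start, 0, i - 1)
    dp = [0] * (n + 1)  # dp[0] = 0: no activities, no weight
    for i in range(1, n + 1):
        w = jobs[i - 1][2]
        dp[i] = max(dp[i - 1], w + dp[p[i]])
    # Trace back through the table to recover one optimal set of jobs.
    chosen = []
    i = n
    while i > 0:
        w = jobs[i - 1][2]
        if w + dp[p[i]] >= dp[i - 1]:  # taking job i achieves dp[i]
            chosen.append(jobs[i - 1])
            i = p[i]
        else:
            i -= 1
    chosen.reverse()
    return dp[n], chosen

jobs = [(0, 3, 5), (1, 4, 1), (3, 5, 8), (4, 7, 4), (5, 9, 6)]
print(weighted_interval_scheduling(jobs))  # (19, [(0, 3, 5), (3, 5, 8), (5, 9, 6)])
```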

Common Pitfalls

Choosing the Earliest-Starting Activity: A common intuitive mistake is to select the activity that starts the earliest. Imagine a very long activity that starts first but blocks all others. The earliest-finish strategy is provably correct for maximizing count; earliest-start is not.

Correction: Always base the greedy choice on the finish time, not the start time, for the maximum-count problem.
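A small counterexample makes this pitfall concrete. The sketch below uses a hypothetical helper, `greedy_by`, that runs the same accept-if-compatible loop under any sort key, and compares earliest-start with earliest-finish on one long activity that starts first and three short ones nested inside it.

```python
def greedy_by(intervals, key):
    """Sort by `key`, then keep each interval compatible with the last kept one."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(intervals, key=key):
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen

intervals = [(0, 10), (1, 2), (3, 4), (5, 6)]
print(greedy_by(intervals, key=lambda iv: iv[0]))  # earliest start: [(0, 10)]
print(greedy_by(intervals, key=lambda iv: iv[1]))  # earliest finish: [(1, 2), (3, 4), (5, 6)]
```

The long activity `(0, 10)` is chosen first under earliest-start and blocks everything else, while earliest-finish schedules three activities.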

Misapplying the Greedy Algorithm to Weighted Problems: Attempting to use the earliest-finish greedy heuristic on weighted intervals will often yield a suboptimal solution. A modified greedy choice (e.g., highest weight per unit time) also fails in general cases.

Correction: Recognize that weighted optimization requires a different paradigm. When values are involved, immediately consider dynamic programming as the correct approach.
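The following sketch shows the earliest-finish heuristic losing on a weighted instance, assuming `(start, finish, weight)` tuples; `earliest_finish_weight` and `dp_weight` are illustrative names, and the second function is a compact version of the $dp[i] = \max(dp[i-1],\ w_{i} + dp[p(i)])$ recurrence using `bisect.bisect_right` to find $p(i)$.

```python
from bisect import bisect_right

def earliest_finish_weight(jobs):
    """Total weight picked by the (unsuitable) earliest-finish greedy."""
    total, last_finish = 0, float("-inf")
    for start, finish, weight in sorted(jobs, key=lambda j: j[1]):
        if start >= last_finish:
            total += weight
            last_finish = finish
    return total

def dp_weight(jobs):
    """Optimal total weight via dp[i] = max(dp[i-1], w_i + dp[p(i)])."""
    jobs = sorted(jobs, key=lambda j: j[1])
    finishes = [j[1] for j in jobs]
    dp = [0] * (len(jobs) + 1)
    for i, (start, finish, weight) in enumerate(jobs, start=1):
        p = bisect_right(finishes, start, 0, i - 1)  # last compatible predecessor
        dp[i] = max(dp[i - 1], weight + dp[p])
    return dp[-1]

jobs = [(0, 2, 1), (1, 10, 100), (3, 4, 1)]
print(earliest_finish_weight(jobs))  # 2
print(dp_weight(jobs))               # 100
```

The two cheap, early-finishing jobs are picked greedily and together block the single job worth 100.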

Incorrectly Checking for Compatibility: When iterating, you must compare the start time of the current candidate activity with the finish time of the last chosen activity. Comparing with the immediately previous activity in the sorted list, regardless of whether it was selected, is an error.

Correction: Maintain a dedicated variable (e.g., last_finish) that tracks the finish time of the most recently added activity for comparison.
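The difference between the two checks is easiest to see side by side. In the sketch below, `buggy_select` (deliberately wrong) compares each candidate with its neighbour in sorted order, while `correct_select` tracks `last_finish`; both names are hypothetical. On this input the faulty version rejects a compatible activity and returns a smaller schedule.

```python
def buggy_select(intervals):
    """WRONG: compares each candidate with the previous interval in sorted order,
    whether or not that interval was selected."""
    ordered = sorted(intervals, key=lambda iv: iv[1])
    if not ordered:
        return []
    chosen = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] >= prev[1]:
            chosen.append(cur)
    return chosen

def correct_select(intervals):
    """Compares each candidate with the finish time of the last *selected* interval."""
    chosen, last_finish = [], float("-inf")
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen

acts = [(0, 3), (2, 5), (4, 6)]
print(buggy_select(acts))    # [(0, 3)]
print(correct_select(acts))  # [(0, 3), (4, 6)]
```

Here `(4, 6)` is compared against the unselected `(2, 5)` and wrongly rejected, even though it does not overlap the selected `(0, 3)`.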

Overlooking the Sort Cost: Stating that the algorithm runs in $O(n)$ time is incorrect unless the input is already sorted by finish time. The greedy selection loop is linear, but the prerequisite sort is $O(n \log n)$, which defines the overall asymptotic complexity.

Correction: Always account for the sorting step in your time analysis: $O(n \log n)$ for sorting plus $O(n)$ for selection, giving $O(n \log n)$ overall.

Summary

The activity selection problem aims to schedule the maximum number of non-overlapping activities. The optimal greedy strategy is to always select the compatible activity with the earliest finish time.
The algorithm's $O(n \log n)$ complexity stems from an initial sort by finish time, followed by a single linear scan to build the schedule.
Its optimality can be rigorously proven using an exchange argument, demonstrating that the greedy choice can be part of an optimal solution.
This greedy approach does not work for the weighted interval scheduling problem, where activities have different values. This variant requires a dynamic programming solution with a recurrence relation based on compatible predecessors.
Key implementation pitfalls include choosing the wrong selection criterion (start time vs. finish time), misapplying the algorithm to weighted scenarios, and making errors in the compatibility check during iteration.
