Algorithms: Sorting and Searching
Sorting and searching sit at the core of practical computing. Whether you are preparing records for reporting, deduplicating data, building an autocomplete feature, or optimizing a database query plan, the choice of algorithm shapes both performance and reliability. The best approach is rarely “the fastest in theory.” It depends on data size, how the data is distributed, memory constraints, whether you need stability, and how often you will repeat the operation.
This article covers classic sorting and searching techniques, explains their time and space complexity, and offers selection criteria grounded in real use cases.
Why complexity analysis matters
Algorithmic complexity helps you reason about how runtime grows as input size increases.
- Time complexity estimates the number of operations. Common categories include O(n), O(n log n), and O(n²).
- Space complexity estimates additional memory required beyond the input.
For sorting, lower bounds also matter. Any comparison-based sort that relies solely on comparisons between elements has a worst-case lower bound of Ω(n log n). To do better than that, you need assumptions about the data (for example, integer keys in a bounded range), enabling linear-time sorts.
Comparison-based sorting algorithms
Comparison sorts work on any data type with a comparator, which is why they dominate general-purpose libraries.
Quicksort
Idea: Choose a pivot, partition the array into elements less than and greater than the pivot, then recursively sort partitions.
- Average time: O(n log n)
- Worst-case time: O(n²) (avoidable in practice with randomized pivots or introspective strategies)
- Extra space: Often O(log n) due to recursion (in-place partitioning is typical)
- Stability: Not stable in its common in-place form
When to use: Large in-memory arrays where constant factors matter and typical performance is excellent. Many real systems use variants that mitigate worst cases (randomization, median-of-three, or switching to heapsort when recursion goes too deep).
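As a minimal sketch (not a tuned library implementation), an in-place quicksort with a randomized pivot and Lomuto partitioning might look like this:

```python
import random

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with a randomized pivot (a minimal sketch)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # A random pivot makes the O(n^2) worst case unlikely on any input.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    i = lo  # boundary of the "less than pivot" region
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]  # place the pivot in its final position
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```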
Mergesort
Idea: Divide the array into halves, sort each half, then merge the sorted halves.
- Time: O(n log n) in best, average, and worst case
- Extra space: Typically O(n) for merging
- Stability: Stable
When to use: When stability matters (for example, sorting by last name then by first name using multi-pass sorts), or when working with linked lists. Mergesort also underpins external sorting where data does not fit in memory, because merging sorted runs from disk is efficient.
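A top-down mergesort sketch in Python, returning a new list; using <= in the merge step is what preserves stability:

```python
def mergesort(a):
    """Stable top-down mergesort returning a new list (a sketch)."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # <= keeps equal elements in their original order (stability).
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```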
Heapsort
Idea: Build a heap, repeatedly extract the maximum (or minimum) to produce a sorted sequence.
- Time: O(n log n) worst case
- Extra space: O(1) auxiliary (in-place)
- Stability: Not stable
When to use: When you need a guaranteed O(n log n) worst case and must keep extra memory low. In practice, heapsort’s constant factors and cache behavior can be less favorable than quicksort, but its predictability is useful.
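A sketch of in-place heapsort with a hand-rolled sift-down (Python’s standard heapq builds min-heaps, so this version implements a max-heap directly):

```python
def heapsort(a):
    """In-place heapsort using an array-backed max-heap (a sketch)."""
    n = len(a)

    def sift_down(root, end):
        # Push a[root] down until the max-heap property holds in a[:end].
        while (child := 2 * root + 1) < end:
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1  # pick the larger child
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    for i in range(n // 2 - 1, -1, -1):  # build the heap bottom-up
        sift_down(i, n)
    for end in range(n - 1, 0, -1):      # repeatedly extract the max
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```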
Insertion sort (and why it still matters)
Idea: Build a sorted prefix one item at a time by inserting the next element into its correct position.
- Best case: O(n) for already-sorted data
- Average/worst: O(n²)
- Extra space: O(1)
- Stability: Stable
When to use: Small arrays or nearly-sorted data. Many high-performance sorts switch to insertion sort for small partitions because it is simple and fast when n is small.
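A minimal insertion sort sketch; note the O(n) best case, since the inner loop never runs on sorted input:

```python
def insertion_sort(a):
    """In-place, stable insertion sort (a sketch); O(n) on sorted input."""
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        # Shift larger elements right to open a slot for `current`.
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
```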
Linear-time sorting algorithms (non-comparison sorts)
Linear-time sorts achieve O(n) running time under specific assumptions about keys.
Counting sort
Idea: Count occurrences of each key in a known range, then reconstruct output.
- Time: O(n + k), where k is the key range size
- Extra space: O(n + k) in typical stable implementations
- Stability: Can be stable
When to use: Integer keys with a small, known range, such as ages (0–120), ASCII bytes (0–255), or categorical codes. It is common in preprocessing pipelines and as a building block for radix sort.
Selection criterion: Counting sort is attractive when k is not much larger than n. If the key range is huge (for example, 32-bit IDs with sparse values), the memory cost becomes impractical.
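A stable counting sort sketch for integer keys in range(k); the left-to-right output pass is what makes it stable:

```python
def counting_sort(keys, k):
    """Stable counting sort for integer keys in range(k) (a sketch)."""
    count = [0] * k
    for key in keys:
        count[key] += 1
    # Exclusive prefix sums turn counts into starting output positions.
    total = 0
    for key in range(k):
        count[key], total = total, total + count[key]
    out = [None] * len(keys)
    for key in keys:  # left-to-right scan preserves input order (stable)
        out[count[key]] = key
        count[key] += 1
    return out
```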
Radix sort
Idea: Sort keys digit-by-digit (least significant digit or most significant digit) using a stable sub-sort such as counting sort.
- Time: O(d · (n + b)), where d is the number of digits/passes and b is the base (range per digit)
- Extra space: Often O(n + b) per pass
- Stability: Typically stable (depends on implementation)
When to use: Fixed-width integers, timestamps, or strings with bounded length where you can process characters or bytes. In systems sorting large volumes of 32-bit or 64-bit integers, radix sort can outperform comparison sorts because it avoids repeated comparisons.
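A least-significant-digit radix sort sketch that processes one byte per pass; the 4-byte width is an illustrative assumption for non-negative 32-bit integers:

```python
def radix_sort(nums, num_bytes=4):
    """LSD radix sort for non-negative integers, one byte per pass (a sketch)."""
    for pass_no in range(num_bytes):
        shift = 8 * pass_no
        # Stable counting sort on the current byte (base 256).
        count = [0] * 256
        for x in nums:
            count[(x >> shift) & 0xFF] += 1
        total = 0
        for d in range(256):
            count[d], total = total, total + count[d]
        out = [0] * len(nums)
        for x in nums:  # stable left-to-right placement
            d = (x >> shift) & 0xFF
            out[count[d]] = x
            count[d] += 1
        nums = out
    return nums
```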
Bucket sort (distribution-based)
Idea: Distribute elements into buckets based on value ranges, sort each bucket, then concatenate.
- Average time: Often near O(n) under uniform distribution assumptions
- Worst-case: Can degrade to O(n²) (if everything lands in one bucket)
- Extra space: Depends on bucket structure
- Stability: Possible, depends on per-bucket sorting
When to use: Real-valued data that is roughly uniformly distributed over a known interval, such as normalized scores. Bucket sort is as much about data distribution as it is about the algorithm.
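A bucket sort sketch assuming floats uniformly distributed in [0, 1); the bucket count of 10 is an arbitrary illustrative choice:

```python
def bucket_sort(values, num_buckets=10):
    """Bucket sort for floats uniformly distributed in [0, 1) (a sketch)."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[int(v * num_buckets)].append(v)  # value range picks the bucket
    out = []
    for bucket in buckets:
        out.extend(sorted(bucket))  # per-bucket sort; Python's sort is stable
    return out
```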
Searching algorithms and how sorting changes the problem
Searching is tightly linked to sorting. If you only search once, a linear scan may be best. If you search many times, sorting once and using faster queries often wins.
Linear search
- Time: O(n)
- Works on: Unsorted data
- Best when: One-off lookups, small datasets, or when data changes constantly
Linear search is also common when the search condition is complex (multiple fields, fuzzy matching) and indexing is not available.
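For example, a linear search driven by an arbitrary predicate (the users records here are hypothetical):

```python
def linear_search(records, predicate):
    """Return the first record matching an arbitrary predicate, else None."""
    for record in records:
        if predicate(record):
            return record
    return None

# Hypothetical usage: a multi-field condition no simple index covers.
users = [{"name": "Ada", "active": True}, {"name": "Alan", "active": False}]
match = linear_search(users, lambda u: u["active"] and u["name"].startswith("A"))
```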
Binary search
Idea: On a sorted array, compare to the middle element and eliminate half the search space each step.
- Time: O(log n)
- Works on: Sorted, indexable sequences (arrays)
- Variants: Find exact match, lower bound (first position with value ≥ key), upper bound (first position with value > key)
Binary search is deceptively subtle in edge cases, especially with duplicates. In practice, “lower bound” and “upper bound” are more useful than “find any match” because they support range queries and counting occurrences.
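In Python, the standard bisect module provides exactly these bounds, which makes range queries and counting occurrences straightforward:

```python
from bisect import bisect_left, bisect_right

data = [1, 3, 3, 3, 7, 9]   # must already be sorted

lo = bisect_left(data, 3)   # lower bound: first index with data[i] >= 3 -> 1
hi = bisect_right(data, 3)  # upper bound: first index with data[i] > 3  -> 4

count = hi - lo             # occurrences of 3 -> 3
present = lo < len(data) and data[lo] == 3  # exact-match test
```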
Trade-off: Sorting costs time. If sorting takes O(n log n) and you perform q searches, total cost is roughly O((n + q) log n). Compare that to O(q · n) for repeated linear scans. Once q is more than a handful for non-trivial n, sorting plus binary search usually wins.
Selection: finding the k-th element without fully sorting
Sometimes you do not need a fully sorted list. You need the median, the top 10 values, or the 95th percentile.
Selection via partitioning (Quickselect)
Idea: Similar to quicksort partitioning, but recurse into only the side containing the k-th element.
- Average time: O(n)
- Worst-case time: O(n²) (rare with good pivot strategies)
- Extra space: Often O(1) auxiliary with iterative/in-place approaches
When to use: Computing medians, percentiles, or top-k thresholds efficiently. For example, if you are building a dashboard that shows the 100 highest latencies out of millions, selection avoids the cost of sorting everything.
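An iterative quickselect sketch with a randomized pivot, returning the k-th smallest element (0-indexed); it copies the input rather than mutating the caller’s list:

```python
import random

def quickselect(values, k):
    """Return the k-th smallest element, 0-indexed (a sketch). Average O(n)."""
    a = list(values)
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        p = random.randint(lo, hi)   # randomized pivot
        a[p], a[hi] = a[hi], a[p]
        pivot, i = a[hi], lo
        for j in range(lo, hi):      # Lomuto partition
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        if k == i:
            return a[i]
        elif k < i:                  # recurse into only one side
            hi = i - 1
        else:
            lo = i + 1
```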
Heap-based selection for top-k
Maintaining a min-heap of size k while scanning n elements yields:
- Time: O(n log k)
- Space: O(k)
This is often a strong choice when k ≪ n and you want the top-k elements, not just the k-th cutoff.
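A sketch using Python’s heapq; the standard library’s heapq.nlargest implements the same idea:

```python
import heapq

def top_k(values, k):
    """Keep a min-heap of size k holding the largest values seen (a sketch)."""
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:               # v beats the smallest of the top k
            heapq.heapreplace(heap, v)  # pop the min, push v: O(log k)
    return sorted(heap, reverse=True)   # largest first
```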
Practical criteria for choosing the right algorithm
Start with the nature of your data
- Arbitrary objects with a comparator: Prefer comparison sorts.
- Integers with small bounded ranges: Counting sort or radix sort can be superior.
- Nearly sorted data: Insertion sort shines, and hybrid algorithms benefit from it.
Consider stability and correctness needs
- If equal keys must preserve input order (common in multi-key sorting), choose a stable sort such as mergesort or stable variants provided by libraries (see the sketch after this list).
- If worst-case guarantees matter (latency-sensitive systems), ensure your sort cannot degrade to O(n²) without protection.
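For example, because Python’s built-in sorted is stable (Timsort), a two-pass sort by secondary key then primary key yields records ordered by (last, first); the people records here are hypothetical:

```python
# Hypothetical (last, first) name pairs.
people = [("Smith", "Jane"), ("Doe", "John"), ("Smith", "Adam")]

# Multi-pass: sort by the secondary key first, then by the primary key.
# Stability guarantees ties on last name keep their first-name order.
by_first = sorted(people, key=lambda p: p[1])
by_last_then_first = sorted(by_first, key=lambda p: p[0])
# -> [("Doe", "John"), ("Smith", "Adam"), ("Smith", "Jane")]
```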
Account for memory and locality
- Mergesort’s extra space can be a deal-breaker in constrained environments.
- In-place algorithms reduce memory but may be less stable and sometimes less cache-friendly.
Optimize for the full workflow, not a single step
- If you will run many searches, sorting once and using binary search is often the right architecture.
- If you only need a percentile or top-k, use selection rather than full sorting.
Closing perspective
Sorting and searching are not “solved problems” so much as a set of trade-offs. The right choice follows from your data’s size and distribution, your memory budget, your stability and worst-case requirements, and how often you will repeat the operation.