Algorithms: Sorting and Searching
Sorting and searching sit at the core of practical computing. Whether you are preparing records for reporting, deduplicating data, building an autocomplete feature, or optimizing a database query plan, the choice of algorithm shapes both performance and reliability. The best approach is rarely “the fastest in theory.” It depends on data size, how the data is distributed, memory constraints, whether you need stability, and how often you will repeat the operation.
This article covers classic sorting and searching techniques, explains their time and space complexity, and offers selection criteria grounded in real use cases.
Why complexity analysis matters
Algorithmic complexity helps you reason about how runtime grows as input size increases.
- Time complexity estimates the number of operations. Common categories include O(n), O(n log n), and O(n²).
- Space complexity estimates additional memory required beyond the input.
For sorting, lower bounds also matter. Any comparison-based sort that relies solely on comparisons between elements has a worst-case lower bound of Ω(n log n). To do better than that, you need assumptions about the data (for example, integer keys in a bounded range), enabling linear-time sorts.
Comparison-based sorting algorithms
Comparison sorts work on any data type with a comparator, which is why they dominate general-purpose libraries.
Quicksort
Idea: Choose a pivot, partition the array into elements less than and greater than the pivot, then recursively sort partitions.
- Average time: O(n log n)
- Worst-case time: O(n²) (avoidable in practice with randomized pivots or introspective strategies)
- Extra space: Often O(log n) due to recursion (in-place partitioning is typical)
- Stability: Not stable in its common in-place form
When to use: Large in-memory arrays where constant factors matter and typical performance is excellent. Many real systems use variants that mitigate worst cases (randomization, median-of-three, or switching to heapsort when recursion goes too deep).
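As a minimal sketch (not a tuned library implementation), an in-place quicksort with a randomized pivot and Lomuto partitioning might look like this:

```python
import random

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with a randomized pivot (a minimal sketch)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # A random pivot makes the O(n^2) worst case unlikely on any input.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    i = lo  # boundary of the "less than pivot" region
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]  # place the pivot in its final position
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```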
Mergesort
Idea: Divide the array into halves, sort each half, then merge the sorted halves.
- Time: O(n log n) in best, average, and worst case
- Extra space: Typically O(n) for merging
- Stability: Stable
When to use: When stability matters (for example, sorting by last name then by first name using multi-pass sorts), or when working with linked lists. Mergesort also underpins external sorting where data does not fit in memory, because merging sorted runs from disk is efficient.
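A top-down mergesort sketch in Python, returning a new list; using <= in the merge step is what preserves stability:

```python
def mergesort(a):
    """Stable top-down mergesort returning a new list (a sketch)."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # <= keeps equal elements in their original order (stability).
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```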
Heapsort
Idea: Build a heap, repeatedly extract the maximum (or minimum) to produce a sorted sequence.
- Time: O(n log n) worst case
- Extra space: O(1) auxiliary (in-place)
- Stability: Not stable
When to use: When you need a guaranteed O(n log n) worst case and must keep extra memory low. In practice, heapsort’s constant factors and cache behavior can be less favorable than quicksort, but its predictability is useful.
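A sketch of in-place heapsort with a hand-rolled sift-down (Python’s standard heapq builds min-heaps, so this version implements a max-heap directly):

```python
def heapsort(a):
    """In-place heapsort using an array-backed max-heap (a sketch)."""
    n = len(a)

    def sift_down(root, end):
        # Push a[root] down until the max-heap property holds in a[:end].
        while (child := 2 * root + 1) < end:
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1  # pick the larger child
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    for i in range(n // 2 - 1, -1, -1):  # build the heap bottom-up
        sift_down(i, n)
    for end in range(n - 1, 0, -1):      # repeatedly extract the max
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```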
Insertion sort (and why it still matters)
Idea: Build a sorted prefix one item at a time by inserting the next element into its correct position.
- Best case: O(n) for already-sorted data
- Average/worst: O(n²)
- Extra space: O(1)
- Stability: Stable
When to use: Small arrays or nearly-sorted data. Many high-performance sorts switch to insertion sort for small partitions because it is simple and fast when n is small.
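A minimal insertion sort sketch; note the O(n) best case, since the inner loop never runs on sorted input:

```python
def insertion_sort(a):
    """In-place, stable insertion sort (a sketch); O(n) on sorted input."""
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        # Shift larger elements right to open a slot for `current`.
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
```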
Linear-time sorting algorithms (non-comparison sorts)
Linear-time sorts achieve O(n) running time under specific assumptions about keys.
Counting sort
Idea: Count occurrences of each key in a known range, then reconstruct output.
- Time: O(n + k), where k is the key range size
- Extra space: O(n + k) in typical stable implementations
- Stability: Can be stable
When to use: Integer keys with a small, known range, such as ages (0–120), ASCII bytes (0–255), or categorical codes. It is common in preprocessing pipelines and as a building block for radix sort.
Selection criterion: Counting sort is attractive when k is not much larger than n. If the key range is huge (for example, 32-bit IDs with sparse values), the memory cost becomes impractical.
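A stable counting sort sketch for integer keys in range(k); the left-to-right output pass is what makes it stable:

```python
def counting_sort(keys, k):
    """Stable counting sort for integer keys in range(k) (a sketch)."""
    count = [0] * k
    for key in keys:
        count[key] += 1
    # Exclusive prefix sums turn counts into starting output positions.
    total = 0
    for key in range(k):
        count[key], total = total, total + count[key]
    out = [None] * len(keys)
    for key in keys:  # left-to-right scan preserves input order (stable)
        out[count[key]] = key
        count[key] += 1
    return out
```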
Radix sort
Idea: Sort keys digit-by-digit (least significant digit or most significant digit) using a stable sub-sort such as counting sort.
- Time: O(d · (n + b)), where d is the number of digits/passes and b is the base (range per digit)
- Extra space: Often O(n + b) per pass
- Stability: Typically stable (depends on implementation)
When to use: Fixed-width integers, timestamps, or strings with bounded length where you can process characters or bytes. In systems sorting large volumes of 32-bit or 64-bit integers, radix sort can outperform comparison sorts because it avoids repeated comparisons.
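A least-significant-digit radix sort sketch that processes one byte per pass; the 4-byte width is an illustrative assumption for non-negative 32-bit integers:

```python
def radix_sort(nums, num_bytes=4):
    """LSD radix sort for non-negative integers, one byte per pass (a sketch)."""
    for pass_no in range(num_bytes):
        shift = 8 * pass_no
        # Stable counting sort on the current byte (base 256).
        count = [0] * 256
        for x in nums:
            count[(x >> shift) & 0xFF] += 1
        total = 0
        for d in range(256):
            count[d], total = total, total + count[d]
        out = [0] * len(nums)
        for x in nums:  # stable left-to-right placement
            d = (x >> shift) & 0xFF
            out[count[d]] = x
            count[d] += 1
        nums = out
    return nums
```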
Bucket sort (distribution-based)
Idea: Distribute elements into buckets based on value ranges, sort each bucket, then concatenate.
- Average time: Often near O(n) under uniform distribution assumptions
- Worst-case: Can degrade to O(n²) (if everything lands in one bucket)
- Extra space: Depends on bucket structure
- Stability: Possible, depends on per-bucket sorting
When to use: Real-valued data that is roughly uniformly distributed over a known interval, such as normalized scores. Bucket sort is as much about data distribution as it is about the algorithm.
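A bucket sort sketch assuming floats uniformly distributed in [0, 1); the bucket count of 10 is an arbitrary illustrative choice:

```python
def bucket_sort(values, num_buckets=10):
    """Bucket sort for floats uniformly distributed in [0, 1) (a sketch)."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[int(v * num_buckets)].append(v)  # value range picks the bucket
    out = []
    for bucket in buckets:
        out.extend(sorted(bucket))  # per-bucket sort; Python's sort is stable
    return out
```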
Searching algorithms and how sorting changes the problem
Searching is tightly linked to sorting. If you only search once, a linear scan may be best. If you search many times, sorting once and using faster queries often wins.
Linear search
- Time: O(n)
- Works on: Unsorted data
- Best when: One-off lookups, small datasets, or when data changes constantly
Linear search is also common when the search condition is complex (multiple fields, fuzzy matching) and indexing is not available.
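For example, a linear search driven by an arbitrary predicate (the users records here are hypothetical):

```python
def linear_search(records, predicate):
    """Return the first record matching an arbitrary predicate, else None."""
    for record in records:
        if predicate(record):
            return record
    return None

# Hypothetical usage: a multi-field condition no simple index covers.
users = [{"name": "Ada", "active": True}, {"name": "Alan", "active": False}]
match = linear_search(users, lambda u: u["active"] and u["name"].startswith("A"))
```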
Binary search
Idea: On a sorted array, compare to the middle element and eliminate half the search space each step.
- Time: O(log n)
- Works on: Sorted, indexable sequences (arrays)
- Variants: Find exact match, lower bound (first position with value ≥ key), upper bound (first position with value > key)
Binary search is deceptively subtle in edge cases, especially with duplicates. In practice, “lower bound” and “upper bound” are more useful than “find any match” because they support range queries and counting occurrences.
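In Python, the standard bisect module provides exactly these bounds, which makes range queries and counting occurrences straightforward:

```python
from bisect import bisect_left, bisect_right

data = [1, 3, 3, 3, 7, 9]   # must already be sorted

lo = bisect_left(data, 3)   # lower bound: first index with data[i] >= 3 -> 1
hi = bisect_right(data, 3)  # upper bound: first index with data[i] > 3  -> 4

count = hi - lo             # occurrences of 3 -> 3
present = lo < len(data) and data[lo] == 3  # exact-match test
```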
Trade-off: Sorting costs time. If sorting takes O(n log n) and you perform q searches, total cost is roughly O((n + q) log n). Compare that to O(q · n) for repeated linear scans. Once q is more than a handful for non-trivial n, sorting plus binary search usually wins.
Selection: finding the k-th element without fully sorting
Sometimes you do not need a fully sorted list. You need the median, the top 10 values, or the 95th percentile.
Selection via partitioning (Quickselect)
Idea: Similar to quicksort partitioning, but recurse into only the side containing the k-th element.
- Average time: O(n)
- Worst-case time: O(n²) (rare with good pivot strategies)
- Extra space: Often O(1) auxiliary with iterative/in-place approaches
When to use: Computing medians, percentiles, or top-k thresholds efficiently. For example, if you are building a dashboard that shows the 100 highest latencies out of millions, selection avoids the cost of sorting everything.
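An iterative quickselect sketch with a randomized pivot, returning the k-th smallest element (0-indexed); it copies the input rather than mutating the caller’s list:

```python
import random

def quickselect(values, k):
    """Return the k-th smallest element, 0-indexed (a sketch). Average O(n)."""
    a = list(values)
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        p = random.randint(lo, hi)   # randomized pivot
        a[p], a[hi] = a[hi], a[p]
        pivot, i = a[hi], lo
        for j in range(lo, hi):      # Lomuto partition
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        if k == i:
            return a[i]
        elif k < i:                  # recurse into only one side
            hi = i - 1
        else:
            lo = i + 1
```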
Heap-based selection for top-k
Maintaining a min-heap of size k while scanning n elements yields:
- Time: O(n log k)
- Space: O(k)
This is often a strong choice when k ≪ n and you want the top-k elements, not just the k-th cutoff.
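A sketch using Python’s heapq; the standard library’s heapq.nlargest implements the same idea:

```python
import heapq

def top_k(values, k):
    """Keep a min-heap of size k holding the largest values seen (a sketch)."""
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:               # v beats the smallest of the top k
            heapq.heapreplace(heap, v)  # pop the min, push v: O(log k)
    return sorted(heap, reverse=True)   # largest first
```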
Practical criteria for choosing the right algorithm
Start with the nature of your data
- Arbitrary objects with a comparator: Prefer comparison sorts.
- Integers with small bounded ranges: Counting sort or radix sort can be superior.
- Nearly sorted data: Insertion sort shines, and hybrid algorithms benefit from it.
Consider stability and correctness needs
- If equal keys must preserve input order (common in multi-key sorting), choose a stable sort such as mergesort or stable variants provided by libraries (see the sketch after this list).
- If worst-case guarantees matter (latency-sensitive systems), ensure your sort cannot degrade to O(n²) without protection.
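For example, because Python’s built-in sorted is stable (Timsort), a two-pass sort by secondary key then primary key yields records ordered by (last, first); the people records here are hypothetical:

```python
# Hypothetical (last, first) name pairs.
people = [("Smith", "Jane"), ("Doe", "John"), ("Smith", "Adam")]

# Multi-pass: sort by the secondary key first, then by the primary key.
# Stability guarantees ties on last name keep their first-name order.
by_first = sorted(people, key=lambda p: p[1])
by_last_then_first = sorted(by_first, key=lambda p: p[0])
# -> [("Doe", "John"), ("Smith", "Adam"), ("Smith", "Jane")]
```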
Account for memory and locality
- Mergesort’s extra space can be a deal-breaker in constrained environments.
- In-place algorithms reduce memory but may be less stable and sometimes less cache-friendly.
Optimize for the full workflow, not a single step
- If you will run many searches, sorting once and using binary search is often the right architecture.
- If you only need a percentile or top-k, use selection rather than full sorting.
Closing perspective
Sorting and searching are not “solved problems” so much as a set of trade-offs. The right choice follows from your data’s size and distribution, your memory budget, your stability and worst-case requirements, and how often you will repeat the operation.