SQL Window Function SUM and AVG OVER

Mastering SQL window functions transforms you from someone who merely queries data into someone who performs sophisticated, context-aware analysis directly within the database. At the core of this power are the SUM and AVG functions used with the OVER() clause, which allow you to calculate running totals, moving averages, and partition-level summaries without collapsing your result set. This capability is indispensable for financial analytics, operational reporting, and any task requiring calculations relative to a row's position in a sorted set.

The Foundation: From Plain Aggregates to Window Aggregates

Before window functions, calculating something like a running total was cumbersome, often requiring self-joins or correlated subqueries. A standard SUM or AVG used with GROUP BY is an aggregate function that collapses multiple rows into a single summary row per group. A window function, by contrast, performs a calculation across a set of rows related to the current row (its window) while still returning every individual row.

The key is the OVER() clause, which defines this "window" of rows. The simplest form uses PARTITION BY to create independent groups for calculation. For example, SUM(sales) OVER(PARTITION BY region) would add a column to your result showing the total sales for each region on every row belonging to that region, preserving the detail.
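Here is a minimal sketch of the region example, run through Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or later, the release that added window functions. The sales table and its rows are hypothetical. The sketch contrasts GROUP BY, which collapses rows, with the PARTITION BY window, which keeps every row:

```python
import sqlite3

# Hypothetical sales data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, salesperson TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Ana", 100), ("East", "Ben", 50), ("West", "Cy", 70)],
)

# Plain aggregate: one summary row per region, so the detail is lost.
grouped = conn.execute(
    "SELECT region, SUM(sales) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window aggregate: every detail row survives and carries its region total.
windowed = conn.execute(
    """
    SELECT region, salesperson, sales,
           SUM(sales) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, salesperson
    """
).fetchall()

print(grouped)   # [('East', 150), ('West', 70)]
for row in windowed:
    print(row)   # e.g. ('East', 'Ana', 100, 150)
```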

Calculating Cumulative Totals with ORDER BY

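Introducing ORDER BY within the OVER() clause is what enables ordered calculations like running totals or cumulative sums. The syntax SUM(column) OVER(ORDER BY sort_column) tells the database: "For the current row, sum the values from the first row in the sorted set up to and including this row." Strictly speaking, the default frame is RANGE-based, so any rows that tie with the current row on sort_column are included as well. The frame section below explains why.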

Consider a daily sales table. A standard query might show daily revenue. To understand the running monthly total, you would write:

SELECT
  sale_date,
  daily_revenue,
  SUM(daily_revenue) OVER(ORDER BY sale_date) AS running_total
FROM sales
WHERE sale_date BETWEEN '2023-10-01' AND '2023-10-31'
ORDER BY sale_date;

For each row, the running_total column is the sum of daily_revenue for all rows with a sale_date less than or equal to the current row's date. This is a cumulative sum. You can combine PARTITION BY and ORDER BY to get running totals per group, such as a cumulative sum of sales per salesperson.
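The following minimal sketch shows the combined form. It uses Python's standard sqlite3 module (SQLite 3.25 or later assumed) and a hypothetical sales table with a salesperson column. Dates are stored as ISO-8601 text, which sorts chronologically. Note that the running total restarts for each salesperson:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-salesperson daily revenue.
conn.execute(
    "CREATE TABLE sales (salesperson TEXT, sale_date TEXT, daily_revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Ana", "2023-10-01", 100),
        ("Ana", "2023-10-02", 50),
        ("Ana", "2023-10-03", 25),
        ("Ben", "2023-10-01", 10),
        ("Ben", "2023-10-02", 20),
    ],
)

# PARTITION BY restarts the sum per salesperson; ORDER BY makes it cumulative.
rows = conn.execute(
    """
    SELECT salesperson, sale_date, daily_revenue,
           SUM(daily_revenue) OVER (
             PARTITION BY salesperson
             ORDER BY sale_date
           ) AS running_total
    FROM sales
    ORDER BY salesperson, sale_date
    """
).fetchall()

for row in rows:
    print(row)
# ('Ana', '2023-10-01', 100, 100)
# ('Ana', '2023-10-02', 50, 150)
# ('Ana', '2023-10-03', 25, 175)
# ('Ben', '2023-10-01', 10, 10)
# ('Ben', '2023-10-02', 20, 30)
```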

Mastering Frame Specification: ROWS BETWEEN

The default behavior when using ORDER BY in a window function is to use a frame of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. However, for precise control, especially with moving averages, you must explicitly define the frame specification using ROWS BETWEEN or RANGE BETWEEN.

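The ROWS clause defines the window frame in terms of physical row offsets, which makes it well suited to moving averages. The query below computes a centered 3-row moving average: its frame includes the previous row, the current row, and the next row. This equals a 3-day average only if the table holds exactly one row per day with no gaps. At the first and last rows the frame is truncated, so those averages cover just two rows. A trailing average, which uses only past data, would use ROWS BETWEEN 2 PRECEDING AND CURRENT ROW instead: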

SELECT
  sale_date,
  daily_revenue,
  AVG(daily_revenue) OVER(
    ORDER BY sale_date
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) AS three_day_moving_avg
FROM sales;
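To see how that frame behaves at the edges of the data, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ assumed). It uses four days of hypothetical revenue, one row per day. The query reports three things: the centered average, the number of rows actually in each centered frame, and a trailing 3-row average for comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: exactly one row per day, no gaps.
conn.execute("CREATE TABLE sales (sale_date TEXT, daily_revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2023-10-01", 10), ("2023-10-02", 20), ("2023-10-03", 30), ("2023-10-04", 40)],
)

rows = conn.execute(
    """
    SELECT sale_date,
           -- Centered frame: previous, current, and next row.
           AVG(daily_revenue) OVER (
             ORDER BY sale_date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
           ) AS centered_avg,
           -- How many rows the centered frame really contains.
           COUNT(*) OVER (
             ORDER BY sale_date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
           ) AS frame_rows,
           -- Trailing frame: current row and the two before it.
           AVG(daily_revenue) OVER (
             ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS trailing_avg
    FROM sales
    ORDER BY sale_date
    """
).fetchall()

for row in rows:
    print(row)
# ('2023-10-01', 15.0, 2, 10.0)
# ('2023-10-02', 20.0, 3, 15.0)
# ('2023-10-03', 30.0, 3, 20.0)
# ('2023-10-04', 35.0, 2, 30.0)
```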

The keywords UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING refer to the first and last row in the partition, respectively. A common frame for a running total from the very start is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This is not merely a more explicit spelling of the default. Unlike the default RANGE frame, it advances one row at a time even when the ORDER BY column contains duplicates.

ROWS vs. RANGE and Practical Implications

While ROWS works with physical row offsets, RANGE works with logical value offsets based on the ORDER BY column. This is a critical distinction. If you have duplicate values in your ORDER BY column, RANGE will include all peers (rows with the same value) in the calculation.

For example, suppose two rows share the same sale date. AVG(...) OVER(ORDER BY sale_date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) treats them as peers and averages over an identical cumulative set for both. The ROWS version instead produces a row-by-row progressing average. However, the order among tied rows is arbitrary unless you add a tiebreaker column (such as a unique ID) to the ORDER BY. In practice, ROWS BETWEEN with a unique ordering is more predictable for time-series data. Some engines also evaluate it more cheaply, because it counts rows instead of locating peer groups. Use RANGE only when you intentionally need to include peers.
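Here is a minimal sketch of the difference, using Python's sqlite3 module (SQLite 3.25+ assumed) and a hypothetical table in which two rows share a date. It uses SUM rather than AVG so the numbers stay exact integers. AVG over the same two frames diverges at exactly the same row. The id column is a hypothetical tiebreaker that makes the ROWS ordering deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two rows on 2023-10-02; id breaks the tie.
conn.execute("CREATE TABLE sales (id INTEGER, sale_date TEXT, daily_revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2023-10-01", 10),
        (2, "2023-10-02", 20),
        (3, "2023-10-02", 40),
        (4, "2023-10-03", 30),
    ],
)

rows = conn.execute(
    """
    SELECT id, sale_date,
           -- RANGE: rows tied on sale_date are peers and share one frame.
           SUM(daily_revenue) OVER (
             ORDER BY sale_date
             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS range_total,
           -- ROWS: advances one physical row at a time (id fixes the tie order).
           SUM(daily_revenue) OVER (
             ORDER BY sale_date, id
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS rows_total
    FROM sales
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
# (1, '2023-10-01', 10, 10)
# (2, '2023-10-02', 70, 30)
# (3, '2023-10-02', 70, 70)
# (4, '2023-10-03', 100, 100)
```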

Applied Analytics: Financial and Operational Scenarios

These functions shine in real-world analytics. In finance, you can calculate a running account balance from a ledger of credits and debits, or analyze a 50-day moving average of a stock price for trend identification. In operations, you might track the cumulative units produced against a daily target or compute a 7-day moving average of website visitors to smooth out weekly patterns.
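The running-balance case, for instance, is simply a cumulative SUM over signed amounts. Below is a minimal sketch with Python's sqlite3 module (SQLite 3.25+ assumed). The ledger table, its columns, and the convention that credits are positive and debits negative are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ledger: credits positive, debits negative.
conn.execute("CREATE TABLE ledger (id INTEGER, txn_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?)",
    [
        (1, "2023-10-01", 500),
        (2, "2023-10-03", -120),
        (3, "2023-10-03", -30),
        (4, "2023-10-05", 200),
    ],
)

# Balance after each transaction; id breaks ties between same-day entries.
rows = conn.execute(
    """
    SELECT id, txn_date, amount,
           SUM(amount) OVER (
             ORDER BY txn_date, id
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS balance
    FROM ledger
    ORDER BY txn_date, id
    """
).fetchall()

for row in rows:
    print(row)
# (1, '2023-10-01', 500, 500)
# (2, '2023-10-03', -120, 380)
# (3, '2023-10-03', -30, 350)
# (4, '2023-10-05', 200, 550)
```

For a ledger that holds several accounts, adding PARTITION BY on a (hypothetical) account_id column would give each account its own balance.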

A powerful pattern is combining detail and aggregates. You can show individual transaction amounts alongside the customer's lifetime total spend (SUM(amount) OVER(PARTITION BY customer_id)) or their cumulative spend to date (SUM(amount) OVER(PARTITION BY customer_id ORDER BY transaction_date)). You can also display a daily error count next to the rolling weekly average to spot deviations. This ability to place aggregated context directly beside granular data is the key advantage of window functions.
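The two customer-level columns differ only in whether the window has an ORDER BY. Here is a minimal sketch with Python's sqlite3 module (SQLite 3.25+ assumed) and a hypothetical transactions table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transactions for two customers.
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2023-01-05", 40), (1, "2023-02-10", 60), (2, "2023-01-20", 25)],
)

rows = conn.execute(
    """
    SELECT customer_id, transaction_date, amount,
           -- ORDER BY present: cumulative spend up to this transaction.
           SUM(amount) OVER (
             PARTITION BY customer_id ORDER BY transaction_date
           ) AS spend_to_date,
           -- No ORDER BY: the frame is the whole partition (lifetime total).
           SUM(amount) OVER (PARTITION BY customer_id) AS lifetime_total
    FROM transactions
    ORDER BY customer_id, transaction_date
    """
).fetchall()

for row in rows:
    print(row)
# (1, '2023-01-05', 40, 40, 100)
# (1, '2023-02-10', 60, 100, 100)
# (2, '2023-01-20', 25, 25, 25)
```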

Common Pitfalls

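Confusing ORDER BY in OVER() with ORDER BY for the result set: The ORDER BY inside OVER() controls only the order of the window calculation. The final query may have a different ORDER BY clause at its end to sort the displayed results. Mixing these up produces running totals that accumulate in the wrong sequence. This happens when you expect the outer ORDER BY to drive the calculation, or when you put a presentation sort inside OVER().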

Correction: Always verify the ORDER BY in your OVER() clause correctly defines the sequence for the calculation (e.g., chronological order for a running sum). Use a separate ORDER BY at the query level for presentation.
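Here is a minimal sketch of the mix-up and its correction, using Python's sqlite3 module (SQLite 3.25+ assumed) and hypothetical daily revenue. Both queries display the days sorted by revenue. Only the first one accumulates in date order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical daily revenue.
conn.execute("CREATE TABLE sales (sale_date TEXT, daily_revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2023-10-01", 100), ("2023-10-02", 300), ("2023-10-03", 200)],
)

# Correct: chronological running total, presented by revenue (highest first).
correct = conn.execute(
    """
    SELECT sale_date, daily_revenue,
           SUM(daily_revenue) OVER (
             ORDER BY sale_date
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY daily_revenue DESC
    """
).fetchall()

# Mixed up: the presentation sort leaked into OVER(), so the sum
# accumulates in revenue order instead of date order.
mixed_up = conn.execute(
    """
    SELECT sale_date, daily_revenue,
           SUM(daily_revenue) OVER (
             ORDER BY daily_revenue DESC
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY daily_revenue DESC
    """
).fetchall()

print(correct)   # [('2023-10-02', 300, 400), ('2023-10-03', 200, 600), ('2023-10-01', 100, 100)]
print(mixed_up)  # [('2023-10-02', 300, 300), ('2023-10-03', 200, 500), ('2023-10-01', 100, 600)]
```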

Assuming the default frame is always ROWS BETWEEN: When ORDER BY is used without a frame clause, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. As discussed, RANGE can lead to unexpected results with duplicate values. For predictable, performant row-by-row progression, explicitly use ROWS BETWEEN.

Correction: Make your frame specification explicit. Write ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW instead of relying on the implicit default.

Omitting PARTITION BY when needed, leading to cross-contamination: Without PARTITION BY, the window spans the entire result set. Calculating a running total per employee without partitioning will sum all employees together, which is rarely the goal.

Correction: Carefully consider the grouping level. A running total per employee requires PARTITION BY employee_id before the ORDER BY clause within the OVER().

Performance neglect with large windows: Using RANGE, or frames such as UNBOUNDED FOLLOWING, on massive datasets can cause significant overhead. The database may have to buffer very large frames and possibly spill them to disk.

Correction: Use the most restrictive frame possible. Prefer ROWS over RANGE. For moving averages, use a fixed, narrow frame like ROWS BETWEEN 6 PRECEDING AND CURRENT ROW instead of an unbounded window where applicable.
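Here is a minimal sketch of the narrow 7-row frame, using Python's sqlite3 module (SQLite 3.25+ assumed) and ten days of hypothetical visitor counts, one row per day. The frame_rows column exposes the warm-up period: the first six averages cover fewer than seven days, so you may want to filter or flag them. ROWS counts rows, not calendar days. If some dates are missing, a 7-row frame therefore spans more than a week.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# Hypothetical daily visitor counts: 10, 20, ..., 100 over ten consecutive days.
conn.execute("CREATE TABLE visits (visit_date TEXT, visitors INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [((date(2023, 10, 1) + timedelta(days=i)).isoformat(), (i + 1) * 10) for i in range(10)],
)

rows = conn.execute(
    """
    SELECT visit_date,
           AVG(visitors) OVER (
             ORDER BY visit_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS avg_7d,
           COUNT(*) OVER (
             ORDER BY visit_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS frame_rows
    FROM visits
    ORDER BY visit_date
    """
).fetchall()

# Keep only days whose frame holds a full seven rows.
full_weeks = [(d, avg) for d, avg, n in rows if n == 7]
print(full_weeks)
# [('2023-10-07', 40.0), ('2023-10-08', 50.0), ('2023-10-09', 60.0), ('2023-10-10', 70.0)]
```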

Summary

The SUM() OVER() and AVG() OVER() functions compute aggregates over a defined window of rows while returning all detail rows, enabling powerful blended analysis.
Adding ORDER BY inside the OVER() clause allows for ordered calculations like cumulative sums and running totals. These are calculated from the start of the partition up to the current row, plus any tied peers under the default RANGE frame.
Explicit frame specification with ROWS BETWEEN gives you precise control, enabling calculations like moving averages (e.g., ROWS BETWEEN 6 PRECEDING AND CURRENT ROW for a 7-day average).
Understand the critical difference: ROWS defines a frame by physical row offsets, while RANGE defines it by logical value offsets in the ORDER BY column, which includes duplicate values as peers.
These techniques are fundamental for practical analytics, allowing you to compute financial running balances, operational moving averages, and partition-level aggregates directly alongside transaction-level data in a single, efficient query.
