Feb 27

SQL Window Functions: LAG, LEAD, Running Totals

Mindli Team

AI-Generated Content


Mastering SQL window functions changes how you analyze data: they let you compute values across a set of rows related to the current row without collapsing those rows into a single output row. This capability is fundamental for time-series analysis, running metrics and row-to-row comparisons. With GROUP BY alone, these tasks usually require clumsy self-joins or correlated subqueries. Understanding LAG(), LEAD() and aggregates used with OVER() is key to writing sophisticated analytical queries directly in your database.

The Foundation: The OVER() Clause

Every window function operates over a window of rows defined by its OVER() clause. This clause has three optional parts that control the window's scope:

- PARTITION BY divides the result set into groups, much like GROUP BY, but every row stays in the output.
- ORDER BY sorts the rows inside each partition. This defines the sequence for time-series and cumulative calculations; without it, the "previous" and "next" rows are undefined.
- A frame specification (ROWS BETWEEN or RANGE BETWEEN) limits which rows of the partition are included.

An empty OVER() treats the entire result set as one partition. This clause is the foundation on which all specific window functions are built.
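The sketch below makes the PARTITION BY versus GROUP BY distinction concrete. It runs the SQL through Python's built-in sqlite3 module; SQLite 3.25 or later supports window functions. The orders table and its rows are hypothetical sample data.

```python
import sqlite3

# In-memory database; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
# Hypothetical table and sample rows.
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 5)],
)

# GROUP BY collapses each region into one output row.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

# PARTITION BY computes the same per-region sum but keeps every input row.
windowed = conn.execute(
    """
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM orders
    ORDER BY region, amount
    """
).fetchall()

print(grouped)   # [('east', 40), ('west', 5)]
print(windowed)  # [('east', 10, 40), ('east', 30, 40), ('west', 5, 5)]
```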

Comparing Rows: LAG() and LEAD()

The LAG() and LEAD() functions provide direct access to data in other rows relative to the current one, perfect for calculating differences or trends over time.

LAG(column, offset) looks backward a specified number of rows from the current row. For instance, to calculate a month-over-month sales difference, you would retrieve the previous month's sales value for comparison.

SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) AS previous_month_revenue,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS revenue_change
FROM monthly_sales;
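The month-over-month query can be tried end to end with a hypothetical monthly_sales table in an in-memory SQLite database. Months are stored as 'YYYY-MM' strings so that they sort chronologically; this is an assumption of the sketch, not a requirement of LAG(). The outer ORDER BY is added because SQL does not otherwise guarantee output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; 'YYYY-MM' strings sort chronologically.
conn.execute("CREATE TABLE monthly_sales (month TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("2024-01", 100), ("2024-02", 120), ("2024-03", 90)],
)

rows = conn.execute(
    """
    SELECT
        month,
        revenue,
        LAG(revenue, 1) OVER (ORDER BY month) AS previous_month_revenue,
        revenue - LAG(revenue, 1) OVER (ORDER BY month) AS revenue_change
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)
# ('2024-01', 100, None, None)
# ('2024-02', 120, 100, 20)
# ('2024-03', 90, 120, -30)
```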

Conversely, LEAD(column, offset) looks forward. This is useful when you need to compare the current row with a later one, such as calculating the time between consecutive steps in a process log. The offset argument is optional (default 1) and lets you look further ahead or behind. Both functions return NULL when no row exists at the requested offset within the partition. You can handle this in two ways:

- Pass a default value as the optional third argument, as in LAG(revenue, 1, 0).
- Wrap the call in COALESCE().
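Here is a sketch of the process-log case, using a hypothetical process_log table with one row per step and integer timestamps in seconds. It shows three things:

- LEAD() returns NULL on each job's last step.
- The optional offset and default arguments change what LEAD() returns.
- COALESCE() is an after-the-fact alternative to the default argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical process log: one row per step, timestamps in seconds.
conn.execute("CREATE TABLE process_log (job_id INTEGER, step TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO process_log VALUES (?, ?, ?)",
    [(1, "start", 0), (1, "build", 40), (1, "deploy", 100),
     (2, "start", 10), (2, "build", 25)],
)

rows = conn.execute(
    """
    SELECT
        job_id,
        step,
        -- NULL on the last step of each job: there is no following row.
        LEAD(ts) OVER (PARTITION BY job_id ORDER BY ts) - ts AS secs_to_next,
        -- Two rows ahead; the third argument replaces the missing-row NULL.
        LEAD(step, 2, 'none') OVER (PARTITION BY job_id ORDER BY ts) AS step_after_next,
        -- COALESCE converts the NULL to 0 after the subtraction.
        COALESCE(LEAD(ts) OVER (PARTITION BY job_id ORDER BY ts) - ts, 0) AS secs_or_zero
    FROM process_log
    ORDER BY job_id, ts
    """
).fetchall()

for row in rows:
    print(row)
```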

Calculating Running Totals and Cumulative Aggregates

A running total (or cumulative sum) is a classic application of window functions. You compute one by combining an aggregate function like SUM() with an OVER() clause that includes an ORDER BY. The frame then starts at the first row of the partition (unbounded preceding) and ends at the current row.

SELECT
    transaction_date,
    amount,
    SUM(amount) OVER (ORDER BY transaction_date) AS running_total
FROM transactions;

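Here, SUM(amount) OVER (ORDER BY transaction_date) uses the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This means "sum all rows from the start of the partition up to and including the current row and its peers." Peers are rows with the same transaction_date. If several transactions fall on one date, they all show the same running total, and that total already includes every transaction on that date. For a strictly row-by-row total, add a unique tiebreaker to the ORDER BY (such as a transaction ID column, if one exists) and use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The same pattern extends to other aggregates: COUNT() gives a cumulative count, AVG() a cumulative average, and MIN() or MAX() a running minimum or maximum.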
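The effect of the default RANGE frame is easiest to see with a tie. In this sketch (hypothetical data, SQLite via Python's sqlite3), two transactions share a date. The id column is an assumed unique tiebreaker that the transactions table above may not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transactions; ids 2 and 3 share the same date.
conn.execute("CREATE TABLE transactions (id INTEGER, transaction_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10), (2, "2024-01-02", 5), (3, "2024-01-02", 12), (4, "2024-01-03", 3)],
)

rows = conn.execute(
    """
    SELECT
        id,
        -- Default RANGE frame: rows on the same date are summed together.
        SUM(amount) OVER (ORDER BY transaction_date) AS range_total,
        -- ROWS frame plus a unique tiebreaker: strictly row by row.
        SUM(amount) OVER (
            ORDER BY transaction_date, id
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS rows_total,
        -- Other aggregates follow the same pattern.
        COUNT(*) OVER (ORDER BY transaction_date, id) AS running_count,
        MAX(amount) OVER (ORDER BY transaction_date, id) AS running_max
    FROM transactions
    ORDER BY transaction_date, id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 10, 10, 1, 10)
# (2, 27, 15, 2, 10)
# (3, 27, 27, 3, 12)
# (4, 30, 30, 4, 12)
```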

Advanced Frame Control: Moving Averages and Bounded Windows

While running totals use an unbounded preceding frame, you often need a bounded or sliding window. This is controlled by the ROWS BETWEEN frame clause. It allows you to define a window relative to the current row. A common use case is calculating a moving average, which smooths out short-term fluctuations.

SELECT
    date,
    daily_sales,
    AVG(daily_sales) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS weekly_moving_avg
FROM sales_data;

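This query calculates a 7-day moving average (the current day plus the six preceding days). It assumes sales_data has exactly one row per day with no missing dates, because ROWS counts rows, not calendar days. For the first six days the frame holds fewer than seven rows, so those averages cover a shorter period. For a centered 7-day average, use ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING.

Understanding ROWS (physical rows) versus RANGE (logical values) is crucial. A RANGE frame is defined by the values of the ORDER BY column, so rows with identical values (peers) always enter or leave the frame together. Value offsets require a single numeric or date ORDER BY column; for example, PostgreSQL accepts RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW. Support for RANGE offsets varies by database.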
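This minimal sketch compares a trailing window with a centered window over eight days of hypothetical sales, with exactly one row per day. The COUNT(*) column is added for illustration. It shows how many rows each trailing frame actually contains, which is useful if you want to discard averages built from fewer than seven days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: exactly one row per day, no gaps (ROWS counts rows, not days).
conn.execute("CREATE TABLE sales_data (day INTEGER, daily_sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [(d, 10 * d) for d in range(1, 9)],  # days 1..8, sales 10..80
)

rows = conn.execute(
    """
    SELECT
        day,
        -- Trailing window: current row plus up to six rows before it.
        AVG(daily_sales) OVER (
            ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS trailing_avg,
        -- How many rows the trailing frame really holds (fewer than 7 at the start).
        COUNT(*) OVER (
            ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS trailing_rows,
        -- Centered window: up to three rows on each side of the current row.
        AVG(daily_sales) OVER (
            ORDER BY day ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
        ) AS centered_avg
    FROM sales_data
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```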

Complex Analytical Queries: Percent of Total and YoY Growth

Window functions excel at complex, multi-layered analysis in a single query pass. Two powerful patterns are percent of total and year-over-year (YoY) comparisons.

To calculate each row's contribution as a percent of its partition's total, use SUM() with PARTITION BY but no ORDER BY. The frame then covers the entire partition.

SELECT
    department,
    employee_salary,
    employee_salary * 100.0 / SUM(employee_salary) OVER (PARTITION BY department) AS pct_of_dept_total
FROM employees;

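The SUM() here is calculated over the entire partition (each department), giving you the denominator for the percentage. Multiplying by 100.0 rather than 100 avoids integer division when employee_salary is an integer column.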
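Here is the percent-of-total query run against a small hypothetical employees table in SQLite. Salaries are integers, so the 100.0 literal matters. The extra dept_total column shows that an ORDER BY inside OVER() needs an explicit full frame for the denominator to stay equal to the whole department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employees table with integer salaries.
conn.execute("CREATE TABLE employees (department TEXT, employee_salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("eng", 100), ("eng", 200), ("ops", 150)],
)

rows = conn.execute(
    """
    SELECT
        department,
        employee_salary,
        -- 100.0 forces real division; 100 would truncate with integer columns.
        ROUND(employee_salary * 100.0
              / SUM(employee_salary) OVER (PARTITION BY department), 2) AS pct_of_dept_total,
        -- With ORDER BY present, an explicit full frame keeps the whole-partition total.
        SUM(employee_salary) OVER (
            PARTITION BY department
            ORDER BY employee_salary
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS dept_total
    FROM employees
    ORDER BY department, employee_salary
    """
).fetchall()

for row in rows:
    print(row)
```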

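For year-over-year growth, combine LAG() with one partition per calendar month, ordered by year.

SELECT
    year,
    month,
    revenue,
    LAG(revenue, 1) OVER (PARTITION BY month ORDER BY year) AS revenue_same_month_last_year,
    (revenue - LAG(revenue, 1) OVER (PARTITION BY month ORDER BY year)) * 100.0
    / NULLIF(LAG(revenue, 1) OVER (PARTITION BY month ORDER BY year), 0) AS yoy_growth_pct
FROM monthly_revenue;

Partitioning by month and ordering by year gives each partition one row per year for that month. LAG(..., 1) therefore looks exactly one year back, enabling a clean January-to-January comparison.

An offset of 12 is correct only without the month partition, as in LAG(revenue, 12) OVER (ORDER BY year, month), and only if no month is missing. Multiplying by 100.0 avoids integer division. NULLIF() yields NULL instead of a division-by-zero error when last year's revenue was 0.

The query also assumes one row per (year, month). If a year is missing, LAG() silently compares against the most recent earlier year present.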
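This sketch checks the offset reasoning. It builds a hypothetical monthly_revenue table with two months over three years and includes a zero to exercise the NULLIF() guard. Within a PARTITION BY month window ordered by year, the prior year is one row back. LAG(revenue, 1) finds it, while LAG(revenue, 12) runs past the start of every three-row partition and returns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: January and February for 2022-2024, one row per (year, month).
conn.execute("CREATE TABLE monthly_revenue (year INTEGER, month INTEGER, revenue INTEGER)")
conn.executemany(
    "INSERT INTO monthly_revenue VALUES (?, ?, ?)",
    [(2022, 1, 100), (2022, 2, 80),
     (2023, 1, 120), (2023, 2, 0),
     (2024, 1, 90), (2024, 2, 50)],
)

rows = conn.execute(
    """
    SELECT
        year,
        month,
        revenue,
        LAG(revenue, 1) OVER (PARTITION BY month ORDER BY year) AS last_year,
        (revenue - LAG(revenue, 1) OVER (PARTITION BY month ORDER BY year)) * 100.0
        / NULLIF(LAG(revenue, 1) OVER (PARTITION BY month ORDER BY year), 0) AS yoy_growth_pct,
        -- Each partition has only 3 rows, so an offset of 12 never finds a row.
        LAG(revenue, 12) OVER (PARTITION BY month ORDER BY year) AS lag_12_in_partition
    FROM monthly_revenue
    ORDER BY year, month
    """
).fetchall()

for row in rows:
    print(row)
```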

Common Pitfalls

  1. Omitting ORDER BY for Sequential Functions: Without an ORDER BY inside OVER(), the database has no defined sequence, so "previous" and "next" are meaningless. LAG() and LEAD() return non-deterministic results, and some databases, such as SQL Server, reject the query outright. SUM() silently returns the full-partition total on every row instead of a running total. Correction: Always specify a logical ORDER BY (e.g., by date or ID) inside the OVER() clause for these operations, and add a unique tiebreaker column when the sort key can repeat.
  2. Misunderstanding the Default Frame: For aggregate window functions, the default frame depends on whether ORDER BY is present. Without ORDER BY, the frame is the entire partition. With ORDER BY, it is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Adding an ORDER BY to a full-partition calculation therefore turns it into an unexpected running total. Correction: For a full-partition calculation (like percent of total) with an ORDER BY present, explicitly set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
  3. Confusing ROWS and RANGE: Using RANGE with a non-unique ORDER BY column can cause surprising behavior, because it groups all peer rows (rows with the same ORDER BY value). This often produces identical aggregated values for all peers instead of the intended sliding window. Correction: For precise control over the number of rows in the frame, use ROWS BETWEEN. Use RANGE only when you intend to include all peer rows.
  4. Ignoring NULLs from LAG/LEAD: When LAG() or LEAD() references a non-existent row (e.g., the first row has no previous row), it returns NULL, and any arithmetic on that NULL also yields NULL. Correction: Pass a default as the third argument, as in LAG(column, 1, 0) OVER (...), or wrap the call in COALESCE(LAG(column, 1) OVER (...), 0). The two differ: the third argument applies only when the offset row does not exist, while COALESCE also replaces NULLs stored in the column itself. Alternatively, handle the NULL explicitly in your calculation logic.
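Two of these pitfalls can be reproduced in a few lines. This sketch uses a hypothetical readings table whose second row stores a NULL. It shows how LAG()'s default argument differs from COALESCE(), and how an ORDER BY without an explicit frame turns a total into a running total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical readings; row 2 stores a NULL value.
conn.execute("CREATE TABLE readings (id INTEGER, val INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 10), (2, None), (3, 30)])

rows = conn.execute(
    """
    SELECT
        id,
        -- Default applies only when the previous row does not exist.
        LAG(val, 1, 0) OVER (ORDER BY id) AS lag_default,
        -- COALESCE also replaces the NULL stored in row 2.
        COALESCE(LAG(val, 1) OVER (ORDER BY id), 0) AS lag_coalesced,
        -- ORDER BY alone gives the default RANGE frame: a running total.
        SUM(val) OVER (ORDER BY id) AS implicit_running,
        -- Explicit full frame restores the whole-partition total.
        SUM(val) OVER (
            ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS full_total
    FROM readings
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 0, 0, 10, 40)
# (2, 10, 10, 10, 40)
# (3, None, 0, 40, 40)
```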

Summary

  • Window functions, defined by the OVER() clause, enable powerful row-relative and aggregate calculations without collapsing your result set. The PARTITION BY and ORDER BY sub-clauses are essential for controlling the window's scope and sequence.
  • LAG() and LEAD() are your primary tools for accessing values in preceding or following rows, forming the basis for calculating differences, deltas, and period-over-period comparisons.
  • Running totals and cumulative aggregates are computed using aggregates like SUM() with ORDER BY in the OVER() clause, which creates a default frame from the partition start to the current row (including its peers).
  • For moving averages and bounded calculations, you must explicitly define the window frame with ROWS BETWEEN, for example ROWS BETWEEN [N] PRECEDING AND CURRENT ROW or ROWS BETWEEN [N] PRECEDING AND [M] FOLLOWING, to create a sliding window of a specific size.
  • By combining these tools, you can efficiently solve advanced analytical problems like percent-of-total and year-over-year growth directly within a single, elegant SQL query.
