Mar 10

SQL Window Frames: ROWS vs RANGE vs GROUPS

Mindli Team

AI-Generated Content

Precise window functions are what separate basic SQL queries from powerful analytical ones. While the OVER() clause defines your dataset partition and ordering, the window frame specifies exactly which rows within that partition are used for each calculation. Misunderstanding the subtle but critical differences between ROWS, RANGE, and GROUPS can lead to calculations that are subtly wrong or catastrophically misleading, especially with duplicate values.

Understanding the Window Frame Specification

A window frame is a movable subset of rows within a partition, defined relative to the current row being processed. It is specified with a frame clause inside the OVER() clause, written after any PARTITION BY and ORDER BY. The most common syntax uses BETWEEN to set explicit start and end boundaries.

The key boundary keywords are:

  • UNBOUNDED PRECEDING: The first row of the partition.
  • n PRECEDING: A specified number of rows, values, or groups before the current row.
  • CURRENT ROW: The current row being evaluated.
  • n FOLLOWING: A specified number of rows, values, or groups after the current row.
  • UNBOUNDED FOLLOWING: The last row of the partition.

The choice of ROWS, RANGE, or GROUPS determines how the database interprets these boundaries, fundamentally changing the calculation's result.
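To make the difference concrete, here is a minimal sketch that applies the same boundary, 1 PRECEDING AND CURRENT ROW, under all three frame types. It assumes SQLite 3.28 or newer (needed for GROUPS) driven through Python's sqlite3 module; the readings table, its integer day column, and the sample amounts are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, day INTEGER, amount INTEGER)")
# Hypothetical data: two rows share day 1, and there is a gap between day 2 and day 5.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 150), (3, 2, 200), (4, 5, 300)],
)

rows = conn.execute("""
    SELECT
        id,
        -- one physical row back (id added as a tiebreaker so the row order is fixed)
        SUM(amount) OVER (ORDER BY day, id ROWS   BETWEEN 1 PRECEDING AND CURRENT ROW) AS by_rows,
        -- every row whose day lies in [day - 1, day]
        SUM(amount) OVER (ORDER BY day     RANGE  BETWEEN 1 PRECEDING AND CURRENT ROW) AS by_range,
        -- the previous peer group plus the current peer group
        SUM(amount) OVER (ORDER BY day     GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW) AS by_groups
    FROM readings
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, 100, 250, 250)
# (2, 250, 250, 250)
# (3, 350, 450, 450)
# (4, 500, 300, 500)
```

The last row shows all three interpretations diverging: ROWS reaches back one row, RANGE finds nothing on day 4, and GROUPS reaches back to the day 2 group regardless of the gap.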

ROWS: Frame by Physical Offset

The ROWS frame type operates on physical row offsets. It counts rows based purely on their position in the ordered partition, ignoring the actual values in the ORDER BY column. This makes it straightforward to reason about, and fully deterministic as long as the ORDER BY produces a unique ordering.

Consider calculating a running total of daily sales. With ROWS, the frame "slides" one physical row at a time.

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total_rows
FROM sales;

If your data has unique sale_date values, this works perfectly. The frame for the third row includes only rows 1, 2, and 3. ROWS is ideal for operations where position matters more than value, like a moving average over a fixed number of preceding rows. For example, ROWS BETWEEN 6 PRECEDING AND CURRENT ROW covers 7 rows, which is a 7-day rolling average only if there is exactly one row per day with no gaps. Duplicate values never change how many rows the frame contains, but the order among tied rows is unspecified, so add a tiebreaker column to ORDER BY if you need reproducible results.
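The following sketch runs the running-total query above, plus a 3-row moving sum, against a small in-memory table. It assumes SQLite through Python's sqlite3 module; the sample rows and the id column are made up for illustration, and the dates are unique so the ordering is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount INTEGER)")
# Hypothetical data: one sale per day, so sale_date alone gives a unique ordering.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2023-10-01", 100),
        (2, "2023-10-02", 150),
        (3, "2023-10-03", 200),
        (4, "2023-10-04", 50),
    ],
)

rows = conn.execute("""
    SELECT
        sale_date,
        amount,
        -- everything from the start of the partition up to this physical row
        SUM(amount) OVER (
            ORDER BY sale_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total_rows,
        -- this row plus the two physical rows before it
        SUM(amount) OVER (
            ORDER BY sale_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS moving_sum_3_rows
    FROM sales
    ORDER BY sale_date
""").fetchall()

for row in rows:
    print(row)
# ('2023-10-01', 100, 100, 100)
# ('2023-10-02', 150, 250, 250)
# ('2023-10-03', 200, 450, 450)
# ('2023-10-04', 50, 500, 400)
```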

RANGE: Frame by Logical Value Range

The RANGE frame type operates on logical value ranges based on the ORDER BY column. Instead of counting rows, it includes all rows whose ordering column's value falls within a specified numeric or date range relative to the current row's value.

This is powerful but leads to the most common pitfall. With RANGE, CURRENT ROW does not mean the single physical row; it means the current row together with all of its peers (rows with the same value in the ORDER BY column). A running total written with RANGE therefore advances one distinct value at a time, not one row at a time.

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total_range
FROM sales;

Imagine two transactions on 2023-10-01 for $100 and $150, and one on 2023-10-02 for $200. Because the two 2023-10-01 rows are peers, both receive the same running total of $250 ($100 + $150). The calculation for 2023-10-02 will then add its amount to the total of all prior values, resulting in $450. RANGE is most useful with intervals, like summing all sales in the last 30 days: RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW (the exact interval literal syntax varies by database).
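The duplicate-date scenario can be reproduced with a short sketch. It assumes SQLite through Python's sqlite3 module and a hypothetical sales table holding $100 and $150 on 2023-10-01 and $200 on 2023-10-02; the ROWS column is included only for contrast, with id as a tiebreaker so its result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount INTEGER)")
# Hypothetical data: the first two rows share a sale_date, making them peers.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2023-10-01", 100), (2, "2023-10-01", 150), (3, "2023-10-02", 200)],
)

rows = conn.execute("""
    SELECT
        id,
        sale_date,
        amount,
        -- RANGE: CURRENT ROW extends to every peer of the current row
        SUM(amount) OVER (
            ORDER BY sale_date
            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total_range,
        -- ROWS: CURRENT ROW is exactly one physical row
        SUM(amount) OVER (
            ORDER BY sale_date, id
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total_rows
    FROM sales
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, '2023-10-01', 100, 250, 100)
# (2, '2023-10-01', 150, 250, 250)
# (3, '2023-10-02', 200, 450, 450)
```

Both 2023-10-01 rows report 250 under RANGE, while ROWS steps through 100 and then 250.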

GROUPS: Frame by Peer Groups

Introduced in SQL:2011 and supported by modern databases like PostgreSQL, the GROUPS frame type is a hybrid. It counts groups of peers (rows with identical ORDER BY values) rather than individual rows (ROWS) or value ranges (RANGE).

This makes it exceptionally useful for handling duplicates while maintaining a clear, countable progression. It treats each set of duplicate values as a single unit.

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total_groups
FROM sales;

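Using the same duplicate date example, GROUPS treats the two rows for 2023-10-01 as a single peer group. For any row within that first date's group, the frame includes the entire first group, so the running total for both rows on 2023-10-01 will be $250, and the row for 2023-10-02 will show $450. With UNBOUNDED PRECEDING AND CURRENT ROW this matches RANGE exactly; the two diverge once you use numeric offsets, where n PRECEDING under GROUPS counts n peer groups rather than n units of value. It effectively provides a clean "running total by distinct ordered value."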
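A small sketch of GROUPS on the duplicate-date data, extended with one later sale so that a numeric offset has something to reach back to. It assumes SQLite 3.28 or newer (the first release with GROUPS) through Python's sqlite3 module; the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount INTEGER)")
# Hypothetical data: three distinct dates, the first of which has two rows.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2023-10-01", 100),
        (2, "2023-10-01", 150),
        (3, "2023-10-02", 200),
        (4, "2023-10-05", 300),
    ],
)

rows = conn.execute("""
    SELECT
        id,
        sale_date,
        -- all peer groups up to and including the current one
        SUM(amount) OVER (
            ORDER BY sale_date
            GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total_groups,
        -- the current peer group plus the one distinct date before it
        SUM(amount) OVER (
            ORDER BY sale_date
            GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW
        ) AS last_two_dates
    FROM sales
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, '2023-10-01', 250, 250)
# (2, '2023-10-01', 250, 250)
# (3, '2023-10-02', 450, 450)
# (4, '2023-10-05', 750, 500)
```

Note that the offset counts distinct dates, not calendar days: the 2023-10-05 row reaches back to 2023-10-02 even though three days separate them.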

Default Frame Behavior and Choosing the Right Type

You cannot make an informed choice without understanding the default. When you use an ORDER BY clause inside OVER() without an explicit frame clause, the SQL standard defines a default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. (Without ORDER BY, all rows in the partition are peers, so the frame is the whole partition.) This default is a major source of errors, as developers often assume the intuitive row-by-row behavior of ROWS.
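The default can be checked directly. This sketch, assuming SQLite through Python's sqlite3 module and the same hypothetical duplicate-date data, compares an OVER (ORDER BY ...) with no frame clause against the explicit RANGE frame, and also shows what happens when ORDER BY is left out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount INTEGER)")
# Hypothetical data with a duplicate date.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2023-10-01", 100), (2, "2023-10-01", 150), (3, "2023-10-02", 200)],
)

rows = conn.execute("""
    SELECT
        id,
        -- no frame clause: the default frame applies
        SUM(amount) OVER (ORDER BY sale_date) AS implicit_frame,
        -- the default frame written out explicitly
        SUM(amount) OVER (
            ORDER BY sale_date
            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS explicit_range,
        -- no ORDER BY at all: every row sees the whole partition
        SUM(amount) OVER () AS whole_partition
    FROM sales
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, 250, 250, 450)
# (2, 250, 250, 450)
# (3, 450, 450, 450)
```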

Choosing the correct frame type depends on your analytical question:

  • Use ROWS when you need a strict physical count of rows, as in any N-row moving window (e.g., "the last 3 transactions").
  • Use RANGE when you need to include all rows within a specific value boundary relative to the current row. This is best for time-series intervals (e.g., "all rows within the last 30 minutes") where exact row count is less important than the logical range.
  • Use GROUPS when you want to handle duplicate ORDER BY values as single units. It's perfect for problems like "calculate a running total per distinct date" or "find the average salary per grade, including all employees in the same grade."
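As a sketch of the time-interval case from the list above, the following sums sales over the trailing 30 days. SQLite has no INTERVAL literal, so this example assumes its julianday() function as a stand-in that turns each date into a number of days, which lets a plain numeric offset act as the interval. The table and dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount INTEGER)")
# Hypothetical data with irregular gaps between sales.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2023-08-15", 50),
        (2, "2023-10-01", 100),
        (3, "2023-10-15", 150),
        (4, "2023-11-20", 200),
    ],
)

rows = conn.execute("""
    SELECT
        sale_date,
        amount,
        -- every row dated within the 30 days up to and including this row's date
        SUM(amount) OVER (
            ORDER BY julianday(sale_date)
            RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
        ) AS sales_last_30_days
    FROM sales
    ORDER BY sale_date
""").fetchall()

for row in rows:
    print(row)
# ('2023-08-15', 50, 50)
# ('2023-10-01', 100, 100)
# ('2023-10-15', 150, 250)
# ('2023-11-20', 200, 200)
```

A ROWS frame could not express this, because the number of rows falling inside 30 days differs from row to row.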

For aggregate functions like SUM() and AVG(), the frame is crucial. Ranking functions like ROW_NUMBER() and RANK() do not use the frame at all: they are computed from each row's position within the ordered partition, so a frame clause on them is either ignored or rejected, depending on the database.
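A quick way to see this is to attach a frame clause to ranking functions and compare the results. The sketch below assumes SQLite through Python's sqlite3 module, which accepts the frame clause on these functions and ignores it; other databases may raise an error instead. The data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount INTEGER)")
# Hypothetical data with a duplicate date.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2023-10-01", 100), (2, "2023-10-01", 150), (3, "2023-10-02", 200)],
)

rows = conn.execute("""
    SELECT
        id,
        ROW_NUMBER() OVER (ORDER BY sale_date, id) AS rn_plain,
        -- same function with a narrow frame attached
        ROW_NUMBER() OVER (
            ORDER BY sale_date, id
            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
        ) AS rn_framed,
        RANK() OVER (ORDER BY sale_date) AS rank_plain,
        RANK() OVER (
            ORDER BY sale_date
            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
        ) AS rank_framed
    FROM sales
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 1, 1, 1)
# (2, 2, 2, 1, 1)
# (3, 3, 3, 3, 3)
```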

Common Pitfalls

  1. Assuming Default is ROWS Behavior: The most frequent mistake is writing SUM(...) OVER (ORDER BY date) and expecting a strict row-by-row running total. With duplicate dates, every row in a group of duplicates receives the same total, which already includes all of its peers. Correction: Always consider if you need ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for physical row progression.
  2. Using RANGE Offsets with Non-Numeric/Date Types: A RANGE frame with a numeric or interval offset (n PRECEDING or n FOLLOWING) requires a single ORDER BY column whose data type supports addition/subtraction (like numbers, dates, or timestamps). Using such an offset with ORDER BY customer_name is rejected by most databases. RANGE with only UNBOUNDED and CURRENT ROW boundaries works with any sortable type. Correction: Use ROWS or GROUPS when you need offsets over ordered categorical data.
  3. Ignoring Performance Implications: Depending on the database engine, RANGE can be slower than ROWS because frame boundaries must be located by comparing values rather than by a simple offset. GROUPS can also have overhead due to peer group identification. Correction: For high-performance queries over large datasets where physical offset is sufficient, prefer ROWS, and measure on your own data.
  4. Misunderstanding CURRENT ROW with Duplicates: As detailed, CURRENT ROW in a RANGE or GROUPS frame includes all peer rows. If you intend to include only the single current physical row in a sum, you must use ROWS. Correction: Be explicit about whether "current" means the current value (use RANGE/GROUPS) or the current row position (use ROWS).
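The meaning of CURRENT ROW under each frame type is easiest to see with a frame that runs from CURRENT ROW to CURRENT ROW. This is a minimal sketch assuming SQLite through Python's sqlite3 module and hypothetical duplicate-date data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount INTEGER)")
# Hypothetical data: the first two rows are peers under ORDER BY sale_date.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2023-10-01", 100), (2, "2023-10-01", 150), (3, "2023-10-02", 200)],
)

rows = conn.execute("""
    SELECT
        id,
        amount,
        -- ROWS: the frame is exactly one physical row
        SUM(amount) OVER (
            ORDER BY sale_date
            ROWS BETWEEN CURRENT ROW AND CURRENT ROW
        ) AS current_physical_row,
        -- RANGE: the frame is the current row's whole peer group
        SUM(amount) OVER (
            ORDER BY sale_date
            RANGE BETWEEN CURRENT ROW AND CURRENT ROW
        ) AS current_peer_group
    FROM sales
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, 100, 100, 250)
# (2, 150, 150, 250)
# (3, 200, 200, 200)
```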

Summary

  • The window frame (ROWS, RANGE, GROUPS) within an OVER() clause precisely defines which subset of rows is the input to a window function for each calculation.
  • ROWS uses physical row offsets and is predictable, ideal for N-row sliding windows and sequential operations.
  • RANGE uses logical value ranges from the ORDER BY column and includes all peer rows (duplicates) by default with CURRENT ROW; it's best for time-based intervals.
  • GROUPS treats each set of duplicate ORDER BY values as a single unit to count, providing a clean middle ground for handling duplicates.
  • The default frame when using ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which often causes unexpected results with duplicate values.
  • Always consciously select your frame type based on whether your analysis depends on physical sequence, logical value ranges, or distinct peer groups to ensure calculations are accurate and meaningful.
