← All posts
6 min read

Window Functions Without Writing SQL: A Visual Guide

Window functions are the most powerful thing in SQL that most people never learn. They let you rank rows, calculate running totals, and compare each row to the one before it - all without collapsing your data into groups. The problem is that the SQL syntax looks like it was designed to intimidate you.

You don't need to write that SQL. ExploreMyData has a Window Function operation that lets you pick a function type, choose your columns, and see the result immediately. This guide walks through every supported function with concrete before-and-after examples.

What makes window functions different

A regular aggregate like SUM(revenue) collapses all your rows into one number. A window function calculates the same thing but keeps every row intact. You get the aggregate value alongside the detail. That's the core idea.

In ExploreMyData, you configure a window function by setting four things: the function type, the column to operate on (for aggregations), an order-by column, and optionally a partition-by column. The result goes into a new output column you name.

ROW_NUMBER: a unique sequence for every row

ROW_NUMBER assigns 1, 2, 3, ... to each row based on the order you specify. Every row gets a unique number, even if there are ties.

Say you have a table of sales reps and their quarterly revenue:

reprevenuerow_num
Alice50,0001
Bob45,0002
Carol45,0003
Dan38,0004

Bob and Carol both have $45,000, but ROW_NUMBER doesn't care - it assigns 2 and 3. The exact ordering between tied rows depends on the database, but every row is guaranteed a unique number. This is useful when you need a simple sequential ID, or when you want to grab the "top 1 per group" later with a filter.

RANK vs DENSE_RANK: handling ties

RANK and DENSE_RANK both handle ties differently than ROW_NUMBER. Here's the same data with all three:

reprevenuerow_numberrankdense_rank
Alice50,000111
Bob45,000222
Carol45,000322
Dan38,000443

Notice the difference for Dan. With RANK, he's 4th because two people tied for 2nd, so 3rd place is skipped. With DENSE_RANK, he's 3rd because no ranks are skipped. Use RANK when you want Olympic-style rankings (gold, gold, skip, bronze). Use DENSE_RANK when you want contiguous numbers (1st, 2nd, 2nd, 3rd).

Window Function configuration - RANK

  • Function: rank
  • Order by: revenue - descending
  • Partition by: (none)
  • Output column name: rank

Generated SQL: RANK() OVER (ORDER BY revenue DESC) AS rank

PARTITION BY: rankings within groups

Without a partition, the window function operates on the entire table. With a partition, it restarts for each group. This is the key to "top N per category" analysis.

Imagine you have sales data across regions. You want the top rep in each region:

regionreprevenuerank_in_region
EastAlice50,0001
EastBob45,0002
WestCarol62,0001
WestDan38,0002

Set the partition-by column to "region" and order by revenue descending. The ranking restarts at 1 for each region. Then add a filter step to keep only rows where rank_in_region = 1 and you've got the top performer per region. If you've read the top-N analysis post, this is the window function version of that same idea - more flexible, more precise.

Running total: cumulative sums

A running total adds up values as you move down the rows. Order matters - you're accumulating row by row.

dateamountrunning_total
Jan 1100100
Jan 2250350
Jan 375425
Jan 4300725

In ExploreMyData, select "running_total" as the function type, pick the column to sum (e.g., "amount"), and set the order-by column (e.g., "date"). Add a partition-by column if you want running totals that reset per group - like cumulative revenue per sales rep or per product category. We covered a simpler version of this in the running totals post.

Window Function configuration - Running Total

  • Function: running_total
  • Column: amount
  • Order by: date - ascending
  • Partition by: (none)
  • Output column name: running_total

Generated SQL: SUM(amount) OVER (ORDER BY date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total

LAG and LEAD: looking at neighboring rows

LAG gives you the previous row's value. LEAD gives you the next row's value. These are essential for calculating differences between consecutive rows - like day-over-day change, growth rates, or time between events.

monthrevenueprev_revenue (LAG)next_revenue (LEAD)
Jan10,000NULL12,000
Feb12,00010,0009,500
Mar9,50012,00015,000
Apr15,0009,500NULL

The first row has NULL for LAG (no previous row exists) and the last row has NULL for LEAD (no next row exists). Once you have the previous value, you can add a calculated column to compute the difference: revenue - prev_revenue. Or compute percentage change: (revenue - prev_revenue) / prev_revenue * 100.

LAG and LEAD also respect PARTITION BY. If you partition by product, the "previous" value is the previous row within the same product, not the previous row in the whole table.

When to use which function

Here's a quick reference:

  • ROW_NUMBER - assign a unique sequence, "top 1 per group" patterns
  • RANK - rankings where ties share a position and skip the next (1, 2, 2, 4)
  • DENSE_RANK - rankings where ties share a position without skipping (1, 2, 2, 3)
  • Running total - cumulative sums over ordered data
  • LAG - compare each row to the previous one
  • LEAD - compare each row to the next one

All six are available in the Window Function operation. Pick the function, choose your columns, name the output, and apply. The generated SQL appears in the pipeline card so you can see exactly what's happening under the hood - which is also a decent way to learn window function syntax if you ever need it.

Try window functions on your data →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData