Adding Running Totals and Cumulative Sums
Your transaction log has a row for every payment, refund, and fee. Each row has an amount. Your accountant wants a balance column - the running total after each transaction. Row 1 shows $500. Row 2 adds $200, so the balance is $700. Row 3 subtracts $50 for a fee, balance drops to $650. And so on for 10,000 rows.
In Excel, you'd put =B1 in C1 and
=C1+B2 in C2, then drag it down.
It works until someone inserts a row, sorts the data, or the file gets big enough that recalculation
takes forever. The formula approach is brittle.
In ExploreMyData, running totals use a window function. No formulas to drag. No cell references to break.
Basic running total
Open the toolbar and select Window from the Transform group.
Choose amount as the column.
Select SUM as the function. Set the
order by field to
transaction_date.
Name the result running_balance.
ExploreMyData generates:
SUM(amount) OVER(ORDER BY transaction_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_balance
That ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
is the key part. It tells DuckDB: for each row, sum every amount from the first row up to and including
this one. That's your running total.
| transaction_date | description | amount | running_balance |
|---|---|---|---|
| 2024-01-03 | Payment received | 500 | 500 |
| 2024-01-07 | Payment received | 200 | 700 |
| 2024-01-12 | Processing fee | -50 | 650 |
| 2024-01-18 | Payment received | 800 | 1450 |
| 2024-01-25 | Refund issued | -120 | 1330 |
SQL: SUM(amount) OVER(ORDER BY transaction_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_balance
Why order matters
A running total without an order is meaningless. "The cumulative sum up to this row" only makes sense if there's a defined sequence. That's why the ORDER BY is required.
Usually you order by a date or timestamp. But it could be an ID column, a sequence number, or anything
that defines the logical order of your data. If your transaction log has a
transaction_id column that
increments, that works too.
What if two transactions have the same date? DuckDB processes them in whatever order they appear.
If the exact order within a single day matters, add a secondary sort column - like
transaction_id - to break ties.
Running total per group
Sometimes you want separate running totals. Revenue accumulating per sales rep. Inventory moving per warehouse. Payments accumulating per customer.
In the Window operation, set the partition by field.
If you partition by customer_id,
each customer gets their own running total that starts from zero.
SUM(amount) OVER(PARTITION BY customer_id ORDER BY transaction_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS customer_running_total
Customer A's running total accumulates independently from Customer B's. The partition creates a separate window for each group.
| customer_id | transaction_date | amount | customer_running_total |
|---|---|---|---|
| C101 | 2024-01-05 | 300 | 300 |
| C101 | 2024-01-14 | 150 | 450 |
| C102 | 2024-01-08 | 500 | 500 |
| C102 | 2024-01-20 | 200 | 700 |
| C102 | 2024-01-28 | 350 | 1050 |
Customer C101 and C102 each accumulate independently. Partitioning resets the running total to zero for each new group.
Cumulative count instead of sum
Running totals don't have to sum a monetary value. You might want a cumulative count - how many orders has this customer placed up to this date?
Use the Window operation with COUNT instead of SUM. The principle is identical. You get a column that starts at 1 and increments with each row (within the partition, if you set one).
COUNT(*) OVER(PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS order_number
Dealing with NULL amounts
If your amount column has NULL values, SUM skips them. This is usually what you want - a refund that hasn't been processed yet (NULL) shouldn't affect the running balance. But if NULLs represent zero-value transactions, fill them first using Fill Missing with a literal value of 0.
Practical uses
Running totals show up everywhere:
- Revenue tracking: cumulative revenue by month for the fiscal year
- Inventory: stock level after each shipment in and out
- Bank reconciliation: running balance across deposits and withdrawals
- Support tickets: cumulative open count over time
The pattern is the same. Pick the amount column, order by date, optionally partition by group. One pipeline step, and you've got a running total that works on any number of rows without dragging a formula.