← All posts
4 min read

Adding Running Totals and Cumulative Sums

Your transaction log has a row for every payment, refund, and fee. Each row has an amount. Your accountant wants a balance column - the running total after each transaction. Row 1 shows $500. Row 2 adds $200, so the balance is $700. Row 3 subtracts $50 for a fee, balance drops to $650. And so on for 10,000 rows.

In Excel, you'd put =B1 in C1 and =C1+B2 in C2, then drag it down. It works until someone inserts a row, sorts the data, or the file gets big enough that recalculation takes forever. The formula approach is brittle.

In ExploreMyData, running totals use a window function. No formulas to drag. No cell references to break.

Basic running total

Open the toolbar and select Window from the Transform group. Choose amount as the column. Select SUM as the function. Set the order by field to transaction_date. Name the result running_balance.

ExploreMyData generates:

SUM(amount) OVER(ORDER BY transaction_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_balance

That ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is the key part. It tells DuckDB: for each row, sum every amount from the first row up to and including this one. That's your running total.

transaction_date description amount running_balance
2024-01-03Payment received500500
2024-01-07Payment received200700
2024-01-12Processing fee-50650
2024-01-18Payment received8001450
2024-01-25Refund issued-1201330

SQL: SUM(amount) OVER(ORDER BY transaction_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_balance

Why order matters

A running total without an order is meaningless. "The cumulative sum up to this row" only makes sense if there's a defined sequence. That's why the ORDER BY is required.

Usually you order by a date or timestamp. But it could be an ID column, a sequence number, or anything that defines the logical order of your data. If your transaction log has a transaction_id column that increments, that works too.

What if two transactions have the same date? DuckDB processes them in whatever order they appear. If the exact order within a single day matters, add a secondary sort column - like transaction_id - to break ties.

Running total per group

Sometimes you want separate running totals. Revenue accumulating per sales rep. Inventory moving per warehouse. Payments accumulating per customer.

In the Window operation, set the partition by field. If you partition by customer_id, each customer gets their own running total that starts from zero.

SUM(amount) OVER(PARTITION BY customer_id ORDER BY transaction_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS customer_running_total

Customer A's running total accumulates independently from Customer B's. The partition creates a separate window for each group.

customer_id transaction_date amount customer_running_total
C1012024-01-05300300
C1012024-01-14150450
C1022024-01-08500500
C1022024-01-20200700
C1022024-01-283501050

Customer C101 and C102 each accumulate independently. Partitioning resets the running total to zero for each new group.

Cumulative count instead of sum

Running totals don't have to sum a monetary value. You might want a cumulative count - how many orders has this customer placed up to this date?

Use the Window operation with COUNT instead of SUM. The principle is identical. You get a column that starts at 1 and increments with each row (within the partition, if you set one).

COUNT(*) OVER(PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS order_number

Dealing with NULL amounts

If your amount column has NULL values, SUM skips them. This is usually what you want - a refund that hasn't been processed yet (NULL) shouldn't affect the running balance. But if NULLs represent zero-value transactions, fill them first using Fill Missing with a literal value of 0.

Practical uses

Running totals show up everywhere:

  • Revenue tracking: cumulative revenue by month for the fiscal year
  • Inventory: stock level after each shipment in and out
  • Bank reconciliation: running balance across deposits and withdrawals
  • Support tickets: cumulative open count over time

The pattern is the same. Pick the amount column, order by date, optionally partition by group. One pipeline step, and you've got a running total that works on any number of rows without dragging a formula.

Start building running totals →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData