February 27, 2026 by Arif Aslam 5 min read

Segmenting Customers by Purchase Behavior

Marketing sends you a message: "Can you tell us who our VIP customers are? And who's about to churn?" You have an orders table. Each row is a transaction - customer ID, order date, amount. What they actually need is one row per customer with a tier label: VIP, Regular, At Risk, Churned.

The gap between transactional data and customer segments is surprisingly common. You need to aggregate per customer (total spend, order count, last order date), compute recency, then apply business rules to assign tiers. Here's the full pipeline in ExploreMyData.

What you're starting with

A CSV with columns like customer_id, customer_name, email, order_date, order_total, and maybe product_category. Some customers appear once. Some appear 200 times. The data spans two years.

customer_id	customer_name	email	order_date	order_total
C-1042	Maria Chen	m.chen@acme.com	2024-02-11	320.00
C-1042	Maria Chen	m.chen@acme.com	2024-05-30	1,450.50
C-1042	Maria Chen	m.chen@acme.com	2025-01-08	890.00
C-2078	James Okafor	j.okafor@vertex.io	2023-11-15	75.00
C-3301	Sandra Reyes	sreyes@globex.com	2026-01-20	5,800.00

Same customer (C-1042) appears across multiple rows - one row per order, not per customer.

Step 1: Calculate total spend per customer

Open the toolbar and select Window from the Transform group. Configure it:

Function: SUM
Column: order_total
Partition by: customer_id

Click Apply. A new column appears - order_total_sum - with each customer's lifetime spend repeated on every row for that customer. The first card shows up in your pipeline sidebar.

Window functions are powerful here because they don't collapse rows. You still have every transaction, but now each row carries the customer's total context. That matters for the next steps.

Step 2: Count orders per customer

Add another Window step. This time:

Function: COUNT
Column: order_total (or any non-null column)
Partition by: customer_id

Now you have order_total_count - the purchase frequency for each customer. A customer who ordered once has 1; your power buyer might have 150. Second pipeline card added.

Step 3: Calculate days since last purchase

Recency is the key churn indicator. Select Date Difference from the Transform group. You need the difference between the customer's most recent order and today.

There's a subtlety here: order_date has every order, not just the last one. You could first use a Window MAX to get the latest date per customer, then compute the diff. Set it up:

First, add a Window Function step: MAX on order_date, partition by customer_id. This creates order_date_max.
Then use Date Difference between order_date_max and today's date, in days.

The result is a days_since_last_order column. A customer who ordered yesterday has 1. Someone who hasn't ordered in six months has 180+.

customer_id	order_date	order_total	order_total_sum	order_total_count	days_since_last_order
C-1042	2024-02-11	320.00	2,660.50	3	75
C-1042	2024-05-30	1,450.50	2,660.50	3	75
C-2078	2023-11-15	75.00	75.00	1	495
C-3301	2026-01-20	5,800.00	5,800.00	1	63

Window functions add lifetime stats to every row without collapsing the data - deduplication comes later.

Step 4: Assign customer tiers

Now the fun part. Select Add Column and name it customer_tier. Use a CASE WHEN expression that encodes your business rules:

VIP: total spend > $5,000 AND ordered in the last 90 days
Regular: ordered in the last 90 days but spend under $5,000
At Risk: last order was 90-180 days ago
Churned: no order in 180+ days

The expression looks something like:

CASE WHEN order_total_sum > 5000 AND days_since_last_order <= 90 THEN 'VIP' WHEN days_since_last_order <= 90 THEN 'Regular' WHEN days_since_last_order <= 180 THEN 'At Risk' ELSE 'Churned' END

Every row now has a tier. But you still have multiple rows per customer. That's what the next step fixes.

Step 5: Deduplicate to one row per customer

Select Remove Duplicates and choose customer_id as the deduplication column. This keeps only the first row per customer, which carries all the window-computed values you need.

Under the hood, this generates SELECT DISTINCT ON (customer_id) *. Your 40,000 transaction rows collapse to maybe 3,000 unique customers, each with their tier, total spend, order count, and recency.

The complete pipeline

Five cards in the pipeline sidebar:

Window SUM (total spend per customer)
Window COUNT (order frequency)
Window MAX + Date Diff (days since last order)
Add Column (CASE WHEN tier assignment)
Remove Duplicates (one row per customer)

You can now sort by customer_tier or filter to just the "At Risk" segment to hand off to the retention team. Or use Select Columns to trim down to just customer name, email, tier, and total spend before exporting.

The thresholds ($5,000, 90 days, 180 days) are just starting points. You can go back to the Add Column card, tweak the numbers, and the whole pipeline recalculates. No need to rebuild anything.

customer_name	email	customer_tier	order_total_sum	order_total_count	days_since_last_order
Sandra Reyes	sreyes@globex.com	VIP	5,800.00	1	63
Maria Chen	m.chen@acme.com	Regular	2,660.50	3	75
Priya Nair	p.nair@starco.net	At Risk	1,120.00	2	142
James Okafor	j.okafor@vertex.io	Churned	75.00	1	495

40,000 transactions collapsed to one row per customer - each with their tier, lifetime spend, order count, and recency.

Segment your customers now →

Segmenting Customers by Purchase Behavior

What you're starting with

Step 1: Calculate total spend per customer

Step 2: Count orders per customer

Step 3: Calculate days since last purchase

Step 4: Assign customer tiers

Step 5: Deduplicate to one row per customer

The complete pipeline

Try it yourself

Related tools

More from the blog

Building a Monthly Sales Report from Raw Transactions

Creating Conditional Values with CASE WHEN Logic

Computing Running Totals and Cumulative Sums