Segmenting Customers by Purchase Behavior
Marketing sends you a message: "Can you tell us who our VIP customers are? And who's about to churn?" You have an orders table. Each row is a transaction - customer ID, order date, amount. What they actually need is one row per customer with a tier label: VIP, Regular, At Risk, Churned.
The gap between transactional data and customer segments is surprisingly common. You need to aggregate per customer (total spend, order count, last order date), compute recency, then apply business rules to assign tiers. Here's the full pipeline in ExploreMyData.
What you're starting with
A CSV with columns like customer_id, customer_name, email, order_date, order_total, and maybe product_category. Some customers appear once. Some appear 200 times. The data spans two years.
| customer_id | customer_name | order_date | order_total | |
|---|---|---|---|---|
| C-1042 | Maria Chen | m.chen@acme.com | 2024-02-11 | 320.00 |
| C-1042 | Maria Chen | m.chen@acme.com | 2024-05-30 | 1,450.50 |
| C-1042 | Maria Chen | m.chen@acme.com | 2025-01-08 | 890.00 |
| C-2078 | James Okafor | j.okafor@vertex.io | 2023-11-15 | 75.00 |
| C-3301 | Sandra Reyes | sreyes@globex.com | 2026-01-20 | 5,800.00 |
Same customer (C-1042) appears across multiple rows - one row per order, not per customer.
Step 1: Calculate total spend per customer
Open the toolbar and select Window from the Transform group. Configure it:
- Function: SUM
- Column:
order_total - Partition by:
customer_id
Click Apply. A new column appears - order_total_sum - with each customer's lifetime spend repeated on every row for that customer. The first card shows up in your pipeline sidebar.
Window functions are powerful here because they don't collapse rows. You still have every transaction, but now each row carries the customer's total context. That matters for the next steps.
Step 2: Count orders per customer
Add another Window step. This time:
- Function: COUNT
- Column:
order_total(or any non-null column) - Partition by:
customer_id
Now you have order_total_count - the purchase frequency for each customer. A customer who ordered once has 1; your power buyer might have 150. Second pipeline card added.
Step 3: Calculate days since last purchase
Recency is the key churn indicator. Select Date Diff from the Transform group. You need the difference between the customer's most recent order and today.
There's a subtlety here: order_date has every order, not just the last one. You could first use a Window MAX to get the latest date per customer, then compute the diff. Set it up:
- First, add a Window step: MAX on
order_date, partition bycustomer_id. This createsorder_date_max. - Then use Date Diff between
order_date_maxand today's date, in days.
The result is a days_since_last_order column. A customer who ordered yesterday has 1. Someone who hasn't ordered in six months has 180+.
| customer_id | order_date | order_total | order_total_sum | order_total_count | days_since_last_order |
|---|---|---|---|---|---|
| C-1042 | 2024-02-11 | 320.00 | 2,660.50 | 3 | 75 |
| C-1042 | 2024-05-30 | 1,450.50 | 2,660.50 | 3 | 75 |
| C-2078 | 2023-11-15 | 75.00 | 75.00 | 1 | 495 |
| C-3301 | 2026-01-20 | 5,800.00 | 5,800.00 | 1 | 63 |
Window functions add lifetime stats to every row without collapsing the data - deduplication comes later.
Step 4: Assign customer tiers
Now the fun part. Select Add Column and name it customer_tier. Use a CASE WHEN expression that encodes your business rules:
- VIP: total spend > $5,000 AND ordered in the last 90 days
- Regular: ordered in the last 90 days but spend under $5,000
- At Risk: last order was 90-180 days ago
- Churned: no order in 180+ days
The expression looks something like:
CASE WHEN order_total_sum > 5000 AND days_since_last_order <= 90 THEN 'VIP' WHEN days_since_last_order <= 90 THEN 'Regular' WHEN days_since_last_order <= 180 THEN 'At Risk' ELSE 'Churned' END
Every row now has a tier. But you still have multiple rows per customer. That's what the next step fixes.
Step 5: Deduplicate to one row per customer
Select Remove Duplicates and choose customer_id as the deduplication column. This keeps only the first row per customer, which carries all the window-computed values you need.
Under the hood, this generates SELECT DISTINCT ON (customer_id) *. Your 40,000 transaction rows collapse to maybe 3,000 unique customers, each with their tier, total spend, order count, and recency.
The complete pipeline
Five cards in the pipeline sidebar:
- Window SUM (total spend per customer)
- Window COUNT (order frequency)
- Window MAX + Date Diff (days since last order)
- Add Column (CASE WHEN tier assignment)
- Remove Duplicates (one row per customer)
You can now sort by customer_tier or filter to just the "At Risk" segment to hand off to the retention team. Or use Select Columns to trim down to just customer name, email, tier, and total spend before exporting.
The thresholds ($5,000, 90 days, 180 days) are just starting points. You can go back to the Add Column card, tweak the numbers, and the whole pipeline recalculates. No need to rebuild anything.
| customer_name | customer_tier | order_total_sum | order_total_count | days_since_last_order | |
|---|---|---|---|---|---|
| Sandra Reyes | sreyes@globex.com | VIP | 5,800.00 | 1 | 63 |
| Maria Chen | m.chen@acme.com | Regular | 2,660.50 | 3 | 75 |
| Priya Nair | p.nair@starco.net | At Risk | 1,120.00 | 2 | 142 |
| James Okafor | j.okafor@vertex.io | Churned | 75.00 | 1 | 495 |
40,000 transactions collapsed to one row per customer - each with their tier, lifetime spend, order count, and recency.