← All posts
5 min read

Segmenting Customers by Purchase Behavior

Marketing sends you a message: "Can you tell us who our VIP customers are? And who's about to churn?" You have an orders table. Each row is a transaction - customer ID, order date, amount. What they actually need is one row per customer with a tier label: VIP, Regular, At Risk, Churned.

The gap between transactional data and customer segments is surprisingly common. You need to aggregate per customer (total spend, order count, last order date), compute recency, then apply business rules to assign tiers. Here's the full pipeline in ExploreMyData.

What you're starting with

A CSV with columns like customer_id, customer_name, email, order_date, order_total, and maybe product_category. Some customers appear once. Some appear 200 times. The data spans two years.

customer_idcustomer_nameemailorder_dateorder_total
C-1042Maria Chenm.chen@acme.com2024-02-11320.00
C-1042Maria Chenm.chen@acme.com2024-05-301,450.50
C-1042Maria Chenm.chen@acme.com2025-01-08890.00
C-2078James Okaforj.okafor@vertex.io2023-11-1575.00
C-3301Sandra Reyessreyes@globex.com2026-01-205,800.00

Same customer (C-1042) appears across multiple rows - one row per order, not per customer.

Step 1: Calculate total spend per customer

Open the toolbar and select Window from the Transform group. Configure it:

  • Function: SUM
  • Column: order_total
  • Partition by: customer_id

Click Apply. A new column appears - order_total_sum - with each customer's lifetime spend repeated on every row for that customer. The first card shows up in your pipeline sidebar.

Window functions are powerful here because they don't collapse rows. You still have every transaction, but now each row carries the customer's total context. That matters for the next steps.

Step 2: Count orders per customer

Add another Window step. This time:

  • Function: COUNT
  • Column: order_total (or any non-null column)
  • Partition by: customer_id

Now you have order_total_count - the purchase frequency for each customer. A customer who ordered once has 1; your power buyer might have 150. Second pipeline card added.

Step 3: Calculate days since last purchase

Recency is the key churn indicator. Select Date Diff from the Transform group. You need the difference between the customer's most recent order and today.

There's a subtlety here: order_date has every order, not just the last one. You could first use a Window MAX to get the latest date per customer, then compute the diff. Set it up:

  • First, add a Window step: MAX on order_date, partition by customer_id. This creates order_date_max.
  • Then use Date Diff between order_date_max and today's date, in days.

The result is a days_since_last_order column. A customer who ordered yesterday has 1. Someone who hasn't ordered in six months has 180+.

customer_idorder_dateorder_totalorder_total_sumorder_total_countdays_since_last_order
C-10422024-02-11320.002,660.50375
C-10422024-05-301,450.502,660.50375
C-20782023-11-1575.0075.001495
C-33012026-01-205,800.005,800.00163

Window functions add lifetime stats to every row without collapsing the data - deduplication comes later.

Step 4: Assign customer tiers

Now the fun part. Select Add Column and name it customer_tier. Use a CASE WHEN expression that encodes your business rules:

  • VIP: total spend > $5,000 AND ordered in the last 90 days
  • Regular: ordered in the last 90 days but spend under $5,000
  • At Risk: last order was 90-180 days ago
  • Churned: no order in 180+ days

The expression looks something like:

CASE WHEN order_total_sum > 5000 AND days_since_last_order <= 90 THEN 'VIP' WHEN days_since_last_order <= 90 THEN 'Regular' WHEN days_since_last_order <= 180 THEN 'At Risk' ELSE 'Churned' END

Every row now has a tier. But you still have multiple rows per customer. That's what the next step fixes.

Step 5: Deduplicate to one row per customer

Select Remove Duplicates and choose customer_id as the deduplication column. This keeps only the first row per customer, which carries all the window-computed values you need.

Under the hood, this generates SELECT DISTINCT ON (customer_id) *. Your 40,000 transaction rows collapse to maybe 3,000 unique customers, each with their tier, total spend, order count, and recency.

The complete pipeline

Five cards in the pipeline sidebar:

  1. Window SUM (total spend per customer)
  2. Window COUNT (order frequency)
  3. Window MAX + Date Diff (days since last order)
  4. Add Column (CASE WHEN tier assignment)
  5. Remove Duplicates (one row per customer)

You can now sort by customer_tier or filter to just the "At Risk" segment to hand off to the retention team. Or use Select Columns to trim down to just customer name, email, tier, and total spend before exporting.

The thresholds ($5,000, 90 days, 180 days) are just starting points. You can go back to the Add Column card, tweak the numbers, and the whole pipeline recalculates. No need to rebuild anything.

customer_nameemailcustomer_tierorder_total_sumorder_total_countdays_since_last_order
Sandra Reyessreyes@globex.comVIP5,800.00163
Maria Chenm.chen@acme.comRegular2,660.50375
Priya Nairp.nair@starco.netAt Risk1,120.002142
James Okaforj.okafor@vertex.ioChurned75.001495

40,000 transactions collapsed to one row per customer - each with their tier, lifetime spend, order count, and recency.

Segment your customers now →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData