Tracking Customer Cohorts by First Purchase Month
Of the customers who first bought from you in January, how many came back in February? How about March? This is cohort retention analysis, and it answers the question every subscription or repeat-purchase business eventually asks: are we keeping the customers we acquire?
This kind of analysis usually lives inside a BI tool or a gnarly SQL query with multiple CTEs. But you can build it visually in ExploreMyData using a five-step pipeline. Here is how.
What you need
An orders table (CSV, Parquet, or Excel) with at least three columns: customer_id, order_date, and order_id. Each row is one order. A customer can appear many times if they placed multiple orders.
If you want to follow along, a sample orders file is available in the blog data folder. Drop it into ExploreMyData and you are ready to go.
Step 1: Find each customer's first purchase date
Open the toolbar and select Window from the Aggregate group. Configure it like this:
- Function:
MIN - Column:
order_date - Partition by:
customer_id - New column name:
first_purchase_date
This adds a new column to every row with the earliest order date for that customer. The SQL behind it is a window function: MIN(order_date) OVER (PARTITION BY customer_id). If a customer's first order was January 12, every one of their rows now shows January 12 in the first_purchase_date column.
| customer_id | order_id | order_date | first_purchase_date |
|---|---|---|---|
| C-1001 | ORD-5501 | 2025-01-12 | 2025-01-12 |
| C-1001 | ORD-6840 | 2025-03-04 | 2025-01-12 |
| C-1001 | ORD-7210 | 2025-04-29 | 2025-01-12 |
| C-2045 | ORD-5890 | 2025-02-08 | 2025-02-08 |
MIN(order_date) OVER (PARTITION BY customer_id) stamps every row with the customer's first-ever purchase date - without collapsing the rows.
Step 2: Extract the cohort month
Now select Extract Date from the Transform group.
- Source column:
first_purchase_date - Part to extract:
month - New column name:
cohort_month
This gives you a number (1 through 12) representing which month each customer first showed up. January buyers are cohort 1, February buyers are cohort 2, and so on.
Step 3: Extract the order month
Apply Extract Date again, this time on the original order_date column. Name the result order_month.
Now every row has two month numbers: the month the customer first purchased (their cohort) and the month of this particular order.
Step 4: Calculate months since first purchase
Select Add Column from the Columns group. Use a math expression:
order_month - cohort_month
Name this column months_since_first. A value of 0 means the order happened in the same month as their first purchase. A value of 1 means one month later, 2 means two months later, and so on.
If your data spans multiple years, you will want to extract the year too and factor it into the calculation. For data within a single calendar year, the simple subtraction works fine.
Step 5: Pivot into a cohort table
This is where it comes together. Select Pivot from the Aggregate group:
- Rows:
cohort_month - Columns:
months_since_first - Values:
customer_id - Aggregation:
COUNT DISTINCT
The result is your cohort retention table. Each row is a cohort (customers who first bought in a given month). Each column is a period offset (0 months later, 1 month later, etc.). The cells contain the count of distinct customers from that cohort who placed an order in that period.
| cohort_month | Month 0 | Month 1 | Month 2 | Month 3 | Month 4 |
|---|---|---|---|---|---|
| 1 (Jan) | 200 | 48 | 31 | 24 | 19 |
| 2 (Feb) | 185 | 52 | 38 | 27 | 21 |
| 3 (Mar) | 241 | 61 | 44 | 30 | - |
| 4 (Apr) | 218 | 57 | 40 | - | - |
Each row is a cohort. Column 0 is its initial size. The February cohort has 28% one-month retention (52 / 185), slightly better than January's 24% (48 / 200).
Reading the results
The column labeled 0 is each cohort's initial size -- how many unique customers first purchased that month. Column 1 is how many of those same customers came back one month later.
If cohort 1 (January) had 200 customers at month 0 and 48 at month 1, your one-month retention rate for January is 24%. You can scan down the columns to compare retention across cohorts, or scan across a row to see how a single cohort decays over time.
Want percentages instead of raw counts? Add another pipeline step after the pivot: use Math to divide each month column by the month-0 column and multiply by 100.
The full pipeline
Five steps, no code. Your pipeline panel should show:
- Window - MIN order_date partitioned by customer_id
- Extract Date - month from first_purchase_date
- Extract Date - month from order_date
- Add Column - order_month - cohort_month
- Pivot - count distinct customer_id by cohort_month and months_since_first
Every step is visible, editable, and generates the SQL you would have had to write by hand. If your data updates next month, just reload the file and the pipeline runs again.