← All posts
5 min read

Tracking Customer Cohorts by First Purchase Month

Of the customers who first bought from you in January, how many came back in February? How about March? This is cohort retention analysis, and it answers the question every subscription or repeat-purchase business eventually asks: are we keeping the customers we acquire?

This kind of analysis usually lives inside a BI tool or a gnarly SQL query with multiple CTEs. But you can build it visually in ExploreMyData using a five-step pipeline. Here is how.

What you need

An orders table (CSV, Parquet, or Excel) with at least three columns: customer_id, order_date, and order_id. Each row is one order. A customer can appear many times if they placed multiple orders.

If you want to follow along, a sample orders file is available in the blog data folder. Drop it into ExploreMyData and you are ready to go.

Step 1: Find each customer's first purchase date

Open the toolbar and select Window from the Aggregate group. Configure it like this:

  1. Function: MIN
  2. Column: order_date
  3. Partition by: customer_id
  4. New column name: first_purchase_date

This adds a new column to every row with the earliest order date for that customer. The SQL behind it is a window function: MIN(order_date) OVER (PARTITION BY customer_id). If a customer's first order was January 12, every one of their rows now shows January 12 in the first_purchase_date column.

customer_idorder_idorder_datefirst_purchase_date
C-1001ORD-55012025-01-122025-01-12
C-1001ORD-68402025-03-042025-01-12
C-1001ORD-72102025-04-292025-01-12
C-2045ORD-58902025-02-082025-02-08

MIN(order_date) OVER (PARTITION BY customer_id) stamps every row with the customer's first-ever purchase date - without collapsing the rows.

Step 2: Extract the cohort month

Now select Extract Date from the Transform group.

  1. Source column: first_purchase_date
  2. Part to extract: month
  3. New column name: cohort_month

This gives you a number (1 through 12) representing which month each customer first showed up. January buyers are cohort 1, February buyers are cohort 2, and so on.

Step 3: Extract the order month

Apply Extract Date again, this time on the original order_date column. Name the result order_month.

Now every row has two month numbers: the month the customer first purchased (their cohort) and the month of this particular order.

Step 4: Calculate months since first purchase

Select Add Column from the Columns group. Use a math expression:

order_month - cohort_month

Name this column months_since_first. A value of 0 means the order happened in the same month as their first purchase. A value of 1 means one month later, 2 means two months later, and so on.

If your data spans multiple years, you will want to extract the year too and factor it into the calculation. For data within a single calendar year, the simple subtraction works fine.

Step 5: Pivot into a cohort table

This is where it comes together. Select Pivot from the Aggregate group:

  1. Rows: cohort_month
  2. Columns: months_since_first
  3. Values: customer_id
  4. Aggregation: COUNT DISTINCT

The result is your cohort retention table. Each row is a cohort (customers who first bought in a given month). Each column is a period offset (0 months later, 1 month later, etc.). The cells contain the count of distinct customers from that cohort who placed an order in that period.

cohort_monthMonth 0Month 1Month 2Month 3Month 4
1 (Jan)20048312419
2 (Feb)18552382721
3 (Mar)241614430 -
4 (Apr)2185740 - -

Each row is a cohort. Column 0 is its initial size. The February cohort has 28% one-month retention (52 / 185), slightly better than January's 24% (48 / 200).

Reading the results

The column labeled 0 is each cohort's initial size -- how many unique customers first purchased that month. Column 1 is how many of those same customers came back one month later.

If cohort 1 (January) had 200 customers at month 0 and 48 at month 1, your one-month retention rate for January is 24%. You can scan down the columns to compare retention across cohorts, or scan across a row to see how a single cohort decays over time.

Want percentages instead of raw counts? Add another pipeline step after the pivot: use Math to divide each month column by the month-0 column and multiply by 100.

The full pipeline

Five steps, no code. Your pipeline panel should show:

  1. Window - MIN order_date partitioned by customer_id
  2. Extract Date - month from first_purchase_date
  3. Extract Date - month from order_date
  4. Add Column - order_month - cohort_month
  5. Pivot - count distinct customer_id by cohort_month and months_since_first

Every step is visible, editable, and generates the SQL you would have had to write by hand. If your data updates next month, just reload the file and the pipeline runs again.

Build your cohort analysis →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData