← All posts
5 min read

Grouping and Aggregating Data: A Practical Guide

"What's the total revenue by region?" "How many orders per product category?" "What's the average ticket price by venue?" These questions share the same structure: take a column, group rows by its values, and compute something for each group.

The Group & Aggregate operation handles this directly. Select your grouping columns, pick your aggregate functions, and apply. No SQL needed, though the generated SQL is visible in the pipeline card if you want to verify it.

Basic example: total revenue by region

Suppose you have an orders table with columns: order_id, region, category, revenue.

  1. Open the Group & Aggregate operation from the toolbar.
  2. Set Group by columns to "region".
  3. Add an aggregate: column "revenue", function SUM.
  4. Apply.

The result collapses your detail rows into one row per region:

regionrevenue_sum
North142,500
South87,300
West63,200
East51,800

Group & Aggregate configuration

  • Group by: region
  • Aggregate 1: revenue, SUM

Generated SQL: SELECT region, SUM(revenue) AS revenue_sum FROM data GROUP BY region

Multiple grouping columns

You can group by more than one column. Setting group columns to "region" and "category" produces one row for each unique region-category pair.

regioncategoryrevenue_sumorder_count
NorthElectronics82,10034
NorthClothing60,40028
SouthElectronics45,20019

Multiple aggregates in one step

Each Group & Aggregate step can include several aggregations. For example, you might want SUM of revenue, COUNT of orders, and AVG order value all at once:

  • Aggregate 1: revenue, SUM (alias: total_revenue)
  • Aggregate 2: order_id, COUNT (alias: order_count)
  • Aggregate 3: revenue, AVG (alias: avg_order_value)

All three appear as columns in the output. This avoids chaining multiple pipeline steps for what is logically a single summary.

Useful aggregate functions

Beyond SUM and COUNT, there are several functions worth knowing:

  • COUNT_DISTINCT counts unique values. Use it for questions like "how many distinct customers per region?"
  • MEDIAN returns the middle value, which is more robust than AVG when your data has outliers.
  • STRING_AGG concatenates text values with a comma separator. Handy for collapsing a list of product names or tags into a single cell.
  • MIN / MAX return the smallest and largest values. Useful for date ranges: MIN(order_date) and MAX(order_date) give you the first and last order per group.

Whole-table aggregates

If you leave the group-by columns empty, the aggregation runs across the entire table and returns a single row. This is useful for computing totals or overall statistics:

total_revenuetotal_ordersavg_revenue
344,8001,247276.42

Group & Aggregate vs. Pivot vs. Window

ExploreMyData has three operations that do aggregation. Here is when to use each:

  • Group & Aggregate: You want a summary table with one row per group and one or more aggregate columns. This is the most direct tool for "total X by Y" questions.
  • Pivot: You want the grouped values spread across columns instead of rows. Pivot is better when the grouping column has a small number of distinct values and you want them as column headers.
  • Window Function: You want the aggregate value attached to every detail row instead of collapsing the data. Use this for percentage-of-total calculations or comparing each row to its group average.

For a deeper comparison of all three approaches, see the Group By Without SQL post. For full documentation, see the Group & Aggregate reference.

Start aggregating your data →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData