Grouping and Aggregating Data: A Practical Guide
"What's the total revenue by region?" "How many orders per product category?" "What's the average ticket price by venue?" These questions share the same structure: take a column, group rows by its values, and compute something for each group.
The Group & Aggregate operation handles this directly. Select your grouping columns, pick your aggregate functions, and apply. No SQL needed, though the generated SQL is visible in the pipeline card if you want to verify it.
Basic example: total revenue by region
Suppose you have an orders table with columns: order_id,
region,
category,
revenue.
- Open the Group & Aggregate operation from the toolbar.
- Set Group by columns to "region".
- Add an aggregate: column "revenue", function SUM.
- Apply.
The result collapses your detail rows into one row per region:
| region | revenue_sum |
|---|---|
| North | 142,500 |
| South | 87,300 |
| West | 63,200 |
| East | 51,800 |
Group & Aggregate configuration
- Group by: region
- Aggregate 1: revenue, SUM
Generated SQL: SELECT region, SUM(revenue) AS revenue_sum FROM data GROUP BY region
Multiple grouping columns
You can group by more than one column. Setting group columns to "region" and "category" produces one row for each unique region-category pair.
| region | category | revenue_sum | order_count |
|---|---|---|---|
| North | Electronics | 82,100 | 34 |
| North | Clothing | 60,400 | 28 |
| South | Electronics | 45,200 | 19 |
Multiple aggregates in one step
Each Group & Aggregate step can include several aggregations. For example, you might want SUM of revenue, COUNT of orders, and AVG order value all at once:
- Aggregate 1: revenue, SUM (alias: total_revenue)
- Aggregate 2: order_id, COUNT (alias: order_count)
- Aggregate 3: revenue, AVG (alias: avg_order_value)
All three appear as columns in the output. This avoids chaining multiple pipeline steps for what is logically a single summary.
Useful aggregate functions
Beyond SUM and COUNT, there are several functions worth knowing:
- COUNT_DISTINCT counts unique values. Use it for questions like "how many distinct customers per region?"
- MEDIAN returns the middle value, which is more robust than AVG when your data has outliers.
- STRING_AGG concatenates text values with a comma separator. Handy for collapsing a list of product names or tags into a single cell.
- MIN / MAX return the smallest and largest values. Useful for date ranges: MIN(order_date) and MAX(order_date) give you the first and last order per group.
Whole-table aggregates
If you leave the group-by columns empty, the aggregation runs across the entire table and returns a single row. This is useful for computing totals or overall statistics:
| total_revenue | total_orders | avg_revenue |
|---|---|---|
| 344,800 | 1,247 | 276.42 |
Group & Aggregate vs. Pivot vs. Window
ExploreMyData has three operations that do aggregation. Here is when to use each:
- Group & Aggregate: You want a summary table with one row per group and one or more aggregate columns. This is the most direct tool for "total X by Y" questions.
- Pivot: You want the grouped values spread across columns instead of rows. Pivot is better when the grouping column has a small number of distinct values and you want them as column headers.
- Window Function: You want the aggregate value attached to every detail row instead of collapsing the data. Use this for percentage-of-total calculations or comparing each row to its group average.
For a deeper comparison of all three approaches, see the Group By Without SQL post. For full documentation, see the Group & Aggregate reference.