Group By and Aggregate Without Writing SQL
Total revenue by category. Average order value by region. Count of support tickets by priority level.
These are the questions you ask every day, and they all have the same shape: take a column, group
the rows by its values, and calculate something for each group. In SQL, that's
GROUP BY. In a spreadsheet,
it's a pivot table.
In ExploreMyData, you have three ways to do this - each suited to different situations. This post walks through all three so you can pick the right one.
Approach 1: Pivot operation (the visual GROUP BY)
The Pivot operation is the most direct way to aggregate. It works like a spreadsheet pivot table, but it's generating SQL under the hood.
Example: total revenue by product category
You have a table with columns: order_id, category, revenue, region. You want total revenue per category.
- Open the Pivot operation from the toolbar.
- Set value column to "revenue".
- Set pivot column (the grouping column) to "category".
- Set aggregate function to SUM.
- Apply.
The result collapses your detail rows into one row per category with the summed revenue:
| category | sum_revenue |
|---|---|
| Electronics | 142,500 |
| Clothing | 87,300 |
| Home & Garden | 63,200 |
Under the hood, this is SELECT category, SUM(revenue) FROM ... GROUP BY category.
You can see the exact SQL in the pipeline card. For more on Pivot specifically, see the
pivot tables post.
Pivot operation configuration
- Value column: revenue
- Pivot column (group by): category
- Aggregate function: SUM
Generated SQL: SELECT category, SUM(revenue) AS sum_revenue FROM data GROUP BY category ORDER BY sum_revenue DESC
The tradeoff: Pivot collapses your rows. You go from 10,000 order rows to 3 category rows. That's often what you want for a summary. But if you need the aggregate value alongside each detail row - say, each order's revenue plus the category total - you need a different approach.
Approach 2: Window functions (aggregate without collapsing)
Window functions compute an aggregate but attach it to every row instead of collapsing groups.
This is SUM(revenue) OVER (PARTITION BY category)
in SQL - and it's available as a visual operation.
Example: each order with its category total
- Open the Window Function operation.
- Set function type to "running_total" (which uses SUM).
- Set the column to "revenue".
- Set partition by to "category".
- Name the output column "category_total".
Now every row has its original data plus a new column showing the category total:
| order_id | category | revenue | category_total |
|---|---|---|---|
| 1001 | Electronics | 250 | 142,500 |
| 1002 | Clothing | 75 | 87,300 |
| 1003 | Electronics | 430 | 142,500 |
This is powerful because you can then add a
calculated column
for percentage of category: revenue / category_total * 100.
You keep the detail rows and add context. The
window functions guide
covers all six function types in depth.
Approach 3: SQL operation (for complex cases)
Sometimes you need GROUP BY with conditions - like "total revenue by category, but only for categories with more than 10 orders." That's a HAVING clause, and the visual operations don't directly support it. This is where the SQL operation comes in.
- Open the SQL operation from the toolbar.
- Write your query against the current table (referenced as the table name shown in the pipeline).
- Apply.
For the HAVING example:
SELECT category, SUM(revenue) as total, COUNT(*) as orders FROM {table} GROUP BY category HAVING COUNT(*) > 10
Other cases where SQL is the better choice:
- Multiple aggregates in one query (SUM, AVG, COUNT, MIN, MAX together)
- GROUP BY on an expression (like grouping by month extracted from a date)
- Combining GROUP BY with JOINs across tables
You can also use the AI SQL feature - type something like "total revenue and average order value by category, only categories with more than 10 orders" and the app generates the SQL for you. See the SQL guide for more on writing queries directly.
| category | total_revenue | orders |
|---|---|---|
| Electronics | 142,500 | 47 |
| Clothing | 87,300 | 31 |
| Home & Garden | 63,200 | 22 |
| Sports | 41,800 | 18 |
Result of GROUP BY category HAVING COUNT(*) > 10 - only categories with more than 10 orders appear.
When to use which approach
Here's the decision framework:
- Pivot - you want a summary table. One row per group, one aggregate value. This is your default choice for "total X by Y" questions. Quick, visual, no SQL needed.
- Window function - you want the aggregate alongside every detail row. Use this for percentage-of-total calculations, comparisons to group averages, or any time you need the original rows preserved.
- SQL operation - you need HAVING clauses, multiple aggregates, grouped expressions, or anything else the visual operations don't cover. It's the escape hatch, and it's always there.
These three approaches aren't mutually exclusive. You can pivot first to get a summary, then use the result as input to another pipeline step. Or add a window function for context, then filter on it. They're all just pipeline steps that chain together.
A practical example: sales report
Say you're building a report from raw order data. You might combine all three approaches:
- Filter - remove cancelled orders
- Window function - add category_total to each row
- Calculated column - compute pct_of_category
- Pivot - summarize by region for the final output
Each step builds on the last. The pipeline sidebar shows every step, the SQL for each one, and you can edit or reorder any of them. That's the power of composing simple operations into complex analysis. The multi-step pipeline post walks through a full example.