Building a Monthly Sales Report from Raw Transactions
It's the first Monday of the month. Your manager wants a revenue breakdown by product category and region. You have an orders CSV with 40,000 rows of individual transactions. Each row is a single line item: order ID, date, product, category, region, quantity, unit price.
What your manager actually wants is a table with maybe 30 rows. Month down the side, categories across the top, revenue numbers in the cells. Getting from 40,000 rows to 30 requires filtering bad data, extracting dates, aggregating, and reshaping. Here's how to build that pipeline in ExploreMyData.
The raw data
Imagine a typical orders export. Columns include order_id, order_date, status, customer_id, product_name, category, region, quantity, and unit_price. Mixed in with the real orders are test orders (status = "test") and cancelled ones (status = "cancelled"). Those need to go before you do anything else.
| order_id | order_date | status | category | region | quantity | unit_price |
|---|---|---|---|---|---|---|
| ORD-10041 | 2026-01-03 08:14:22 | completed | Electronics | West | 2 | 149.99 |
| ORD-10042 | 2026-01-03 09:02:05 | test | Apparel | East | 1 | 99999.00 |
| ORD-10043 | 2026-01-04 11:37:48 | completed | Home & Garden | South | 3 | 34.50 |
| ORD-10044 | 2026-01-04 14:19:33 | cancelled | Electronics | West | 1 | 299.00 |
| ORD-10045 | 2026-01-05 10:55:12 | completed | Apparel | Midwest | 4 | 49.99 |
40,000 rows total - test and cancelled orders must be filtered out before aggregating.
Step 1: Filter out test and cancelled orders
Open the operations toolbar and select Filter. Set the column to status, the operator to "not in", and the values to "test" and "cancelled". Click Apply.
This adds the first card to your pipeline sidebar. You'll see it generates something like WHERE status NOT IN ('test', 'cancelled'). The row count drops immediately - in a typical dataset, you might lose 5-10% of rows here.
This step matters more than people think. One "test" order with a $99,999 amount will wreck your monthly averages if you leave it in.
Step 2: Extract month and year from the order date
Your report needs data grouped by month. The raw order_date column has full timestamps like "2026-02-14 09:23:17". You need just the month.
Select Extract Date from the Transform group. Choose order_date as the source column, and extract "month" and "year" as separate columns. This creates two new columns: order_date_month and order_date_year.
A second pipeline card appears in the sidebar. Under the hood, ExploreMyData uses DuckDB's EXTRACT(MONTH FROM order_date) and EXTRACT(YEAR FROM order_date) functions.
Step 3: Pivot revenue by category and month
This is where the transformation happens. Select Pivot from the Transform group. Configure it like this:
- Row grouping:
order_date_yearandorder_date_month - Column to pivot:
category - Value column:
unit_price(or a calculated revenue column if you have one) - Aggregation: SUM
Click Apply and your 40,000 rows collapse into a summary table. Each row is a year-month combination. Each category becomes its own column with the total revenue for that month. The pipeline sidebar now shows three cards stacking up.
| order_year | order_month | Electronics | Apparel | Home & Garden | Sports |
|---|---|---|---|---|---|
| 2026 | 1 | 48,240.50 | 21,890.00 | 9,340.75 | 14,120.00 |
| 2026 | 2 | 52,810.00 | 19,450.50 | 11,205.25 | 12,880.00 |
| 2026 | 3 | 61,490.75 | 24,310.00 | 13,670.50 | 18,200.25 |
40,000 raw rows collapsed to 36 summary rows - one per year-month combination, one column per category.
Step 4: Add a percentage-of-total column
Your manager will inevitably ask "what percentage is Electronics?" So let's add that now. Select Add Column from the Columns group.
First, you might want to add a total_revenue column that sums across all category columns for each row. Then add a percentage column for any specific category. For instance, name it electronics_pct with the expression ROUND(Electronics / total_revenue * 100, 1).
Each Add Column operation becomes its own pipeline card. You can see the exact SQL expression in the card, and you can edit or delete any of them without affecting the others.
Step 5: Select and reorder columns for presentation
The final table probably has some columns you don't need in the report - the raw month and year numbers, maybe intermediate calculation columns. Select Select Columns from the Columns group. Check only the columns you want in the final output and drag them into the order your manager expects.
This is also a good moment to use Sort to make sure the months are in chronological order. Sort by order_date_year ascending, then order_date_month ascending.
The finished pipeline
Look at the pipeline sidebar. You have five cards, each representing a step:
- Filter out bad orders
- Extract date parts
- Pivot to summary
- Add calculated columns
- Select and reorder
The whole thing runs in your browser. No data was uploaded anywhere. And next month, when you get a fresh export, you can load the new file and the pipeline rebuilds the same report instantly.
If the categories change - maybe "Electronics" gets split into "Consumer Electronics" and "Enterprise Hardware" - you just update the pivot step. The rest of the pipeline stays the same.
| Month | Electronics | Apparel | Home & Garden | Sports | Total | electronics_pct |
|---|---|---|---|---|---|---|
| Jan 2026 | 48,240.50 | 21,890.00 | 9,340.75 | 14,120.00 | 93,591.25 | 51.5% |
| Feb 2026 | 52,810.00 | 19,450.50 | 11,205.25 | 12,880.00 | 96,345.75 | 54.8% |
| Mar 2026 | 61,490.75 | 24,310.00 | 13,670.50 | 18,200.25 | 117,671.50 | 52.3% |
Final output: 5 pipeline steps, report-ready table with totals and category share percentages.