Filling Missing Category Labels Using Forward Fill
Someone sends you an Excel export of a sales report. You open it as a CSV and the category column looks like this: "Electronics" on the first row, then fifteen blank rows, then "Clothing" on row seventeen, then twelve more blanks. The original report looked fine in Excel because the merged cells made it obvious which items belonged to which category. In a flat CSV, those merged cells become empty strings.
This pattern shows up constantly in data exported from Excel, PDF-to-CSV conversions, and reporting tools that display group headers only once. The category information is technically there - it's just missing from most rows. You need every row to carry its own category label.
What forward fill does
Forward fill takes the last non-empty value and copies it down into the empty rows below. When it hits another non-empty value, it starts using that one instead. It walks through the data top to bottom, carrying each value forward until it finds a replacement.
Before:
ElectronicsNULLNULLClothingNULLNULLNULL
After:
ElectronicsElectronicsElectronicsClothingClothingClothingClothing
Raw export from Excel - category label only on the first row of each group, rest are NULL:
| category | product_name | price | units_sold |
|---|---|---|---|
| Electronics | Wireless Headphones | $79.99 | 142 |
| NULL | USB-C Hub | $34.99 | 89 |
| NULL | Laptop Stand | $45.00 | 204 |
| Clothing | Merino Wool Tee | $55.00 | 317 |
| NULL | Canvas Tote Bag | $22.00 | 510 |
| NULL | Fleece Jacket | $89.00 | 98 |
Grouping by category is impossible - most rows have NULL, so they'd all fall into one "unknown" bucket.
Applying forward fill in ExploreMyData
Open Fill Missing from the Data group. Select the category column and choose "forward fill" as the method.
One click. That's the whole operation.
Behind the scenes, ExploreMyData generates a window function:
COALESCE(category, LAG(category) IGNORE NULLS OVER(ORDER BY rowid))
Here's what this does: for each row, if the category is not NULL, use it. If it is NULL,
look backward through previous rows (using LAG with
IGNORE NULLS) and grab the most recent
non-NULL value. The ORDER BY rowid preserves
the original row order, which is critical - forward fill only makes sense when rows are
in the right sequence.
After applying forward fill - every row now carries its category label:
| category (filled) | product_name | price | units_sold |
|---|---|---|---|
| Electronics | Wireless Headphones | $79.99 | 142 |
| Electronics | USB-C Hub | $34.99 | 89 |
| Electronics | Laptop Stand | $45.00 | 204 |
| Clothing | Merino Wool Tee | $55.00 | 317 |
| Clothing | Canvas Tote Bag | $22.00 | 510 |
| Clothing | Fleece Jacket | $89.00 | 98 |
Highlighted rows had NULL - forward fill propagated the previous category value down into each gap.
Handling NULLs at the top
Forward fill has one blind spot: rows before the first non-empty value. If the very first row of the category column is NULL (maybe the report had a header section), forward fill can't look backward because there's nothing there yet. Those rows stay NULL.
Fix this with Update Values. Set it to conditional mode: where the category column is NULL, set it to "Uncategorized" (or whatever default makes sense for your data). This catches any stragglers that forward fill couldn't reach.
You could also do this first, before the forward fill, if you know the top rows are header junk. Or skip it entirely if your data always starts with a real category.
When row order matters
Forward fill depends entirely on the order of rows. If you sort the data before applying forward fill, you'll get wrong results. The fill is based on the physical position of rows, not any logical grouping.
If your data has already been sorted or shuffled, forward fill won't work correctly. You need the rows in their original order - the order they had in the source report where the category labels were placed at group headers.
In ExploreMyData, pipeline steps run in sequence. If you've added a Sort step before Fill Missing, consider whether that sort is disrupting the natural grouping. You may need to fill first, then sort.
Multiple columns with the same problem
Some reports have this pattern on more than one column. Maybe both "category" and "department" are sparse. Apply forward fill to each column separately. Each one is its own pipeline step. The order doesn't matter here - they're independent fills on independent columns.
After filling: group and analyze
Once every row carries its category label, your data is ready for real analysis. Group by category to get totals. Pivot to compare categories side by side. Filter to a single category. All of this was impossible when 90% of the category column was blank.
The pipeline is two steps: forward fill the category, then do whatever analysis you need. Load new data from the same source next month and it runs the same fill automatically.
Group By on the filled category column - analysis that was impossible before is now straightforward:
| category | row_count | total_units_sold | avg_price |
|---|---|---|---|
| Clothing | 3 | 925 | $55.33 |
| Electronics | 3 | 435 | $53.33 |
| Home & Garden | 4 | 612 | $38.74 |
Before forward fill, this query would have returned one row: category = NULL with all products lumped together.