← All posts
4 min read

Filling Missing Category Labels Using Forward Fill

Someone sends you an Excel export of a sales report. You open it as a CSV and the category column looks like this: "Electronics" on the first row, then fifteen blank rows, then "Clothing" on row seventeen, then twelve more blanks. The original report looked fine in Excel because the merged cells made it obvious which items belonged to which category. In a flat CSV, those merged cells become empty strings.

This pattern shows up constantly in data exported from Excel, PDF-to-CSV conversions, and reporting tools that display group headers only once. The category information is technically there - it's just missing from most rows. You need every row to carry its own category label.

What forward fill does

Forward fill takes the last non-empty value and copies it down into the empty rows below. When it hits another non-empty value, it starts using that one instead. It walks through the data top to bottom, carrying each value forward until it finds a replacement.

Before:

  • Electronics
  • NULL
  • NULL
  • Clothing
  • NULL
  • NULL
  • NULL

After:

  • Electronics
  • Electronics
  • Electronics
  • Clothing
  • Clothing
  • Clothing
  • Clothing

Raw export from Excel - category label only on the first row of each group, rest are NULL:

category product_name price units_sold
ElectronicsWireless Headphones$79.99142
NULLUSB-C Hub$34.9989
NULLLaptop Stand$45.00204
ClothingMerino Wool Tee$55.00317
NULLCanvas Tote Bag$22.00510
NULLFleece Jacket$89.0098

Grouping by category is impossible - most rows have NULL, so they'd all fall into one "unknown" bucket.

Applying forward fill in ExploreMyData

Open Fill Missing from the Data group. Select the category column and choose "forward fill" as the method.

One click. That's the whole operation.

Behind the scenes, ExploreMyData generates a window function:

COALESCE(category, LAG(category) IGNORE NULLS OVER(ORDER BY rowid))

Here's what this does: for each row, if the category is not NULL, use it. If it is NULL, look backward through previous rows (using LAG with IGNORE NULLS) and grab the most recent non-NULL value. The ORDER BY rowid preserves the original row order, which is critical - forward fill only makes sense when rows are in the right sequence.

After applying forward fill - every row now carries its category label:

category (filled) product_name price units_sold
ElectronicsWireless Headphones$79.99142
ElectronicsUSB-C Hub$34.9989
ElectronicsLaptop Stand$45.00204
ClothingMerino Wool Tee$55.00317
ClothingCanvas Tote Bag$22.00510
ClothingFleece Jacket$89.0098

Highlighted rows had NULL - forward fill propagated the previous category value down into each gap.

Handling NULLs at the top

Forward fill has one blind spot: rows before the first non-empty value. If the very first row of the category column is NULL (maybe the report had a header section), forward fill can't look backward because there's nothing there yet. Those rows stay NULL.

Fix this with Update Values. Set it to conditional mode: where the category column is NULL, set it to "Uncategorized" (or whatever default makes sense for your data). This catches any stragglers that forward fill couldn't reach.

You could also do this first, before the forward fill, if you know the top rows are header junk. Or skip it entirely if your data always starts with a real category.

When row order matters

Forward fill depends entirely on the order of rows. If you sort the data before applying forward fill, you'll get wrong results. The fill is based on the physical position of rows, not any logical grouping.

If your data has already been sorted or shuffled, forward fill won't work correctly. You need the rows in their original order - the order they had in the source report where the category labels were placed at group headers.

In ExploreMyData, pipeline steps run in sequence. If you've added a Sort step before Fill Missing, consider whether that sort is disrupting the natural grouping. You may need to fill first, then sort.

Multiple columns with the same problem

Some reports have this pattern on more than one column. Maybe both "category" and "department" are sparse. Apply forward fill to each column separately. Each one is its own pipeline step. The order doesn't matter here - they're independent fills on independent columns.

After filling: group and analyze

Once every row carries its category label, your data is ready for real analysis. Group by category to get totals. Pivot to compare categories side by side. Filter to a single category. All of this was impossible when 90% of the category column was blank.

The pipeline is two steps: forward fill the category, then do whatever analysis you need. Load new data from the same source next month and it runs the same fill automatically.

Group By on the filled category column - analysis that was impossible before is now straightforward:

category row_count total_units_sold avg_price
Clothing3925$55.33
Electronics3435$53.33
Home & Garden4612$38.74

Before forward fill, this query would have returned one row: category = NULL with all products lumped together.

Fill in your missing categories →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData