December 13, 2025 by Arif Aslam 4 min read

Filling Missing Category Labels Using Forward Fill

Someone sends you an Excel export of a sales report. You open it as a CSV and the category column looks like this: "Electronics" on the first row, then fifteen blank rows, then "Clothing" on row seventeen, then twelve more blanks. The original report looked fine in Excel because the merged cells made it obvious which items belonged to which category. In a flat CSV, those merged cells become empty strings.

This pattern shows up constantly in data exported from Excel, PDF-to-CSV conversions, and reporting tools that display group headers only once. The category information is technically there - it's just missing from most rows. You need every row to carry its own category label.

What forward fill does

Forward fill takes the last non-empty value and copies it down into the empty rows below. When it hits another non-empty value, it starts using that one instead. It walks through the data top to bottom, carrying each value forward until it finds a replacement.

Before:

Electronics
NULL
NULL
Clothing
NULL
NULL
NULL

After:

Electronics
Electronics
Electronics
Clothing
Clothing
Clothing
Clothing

Raw export from Excel - category label only on the first row of each group, rest are NULL:

category	product_name	price	units_sold
Electronics	Wireless Headphones	$79.99	142
NULL	USB-C Hub	$34.99	89
NULL	Laptop Stand	$45.00	204
Clothing	Merino Wool Tee	$55.00	317
NULL	Canvas Tote Bag	$22.00	510
NULL	Fleece Jacket	$89.00	98

Grouping by category is impossible - most rows have NULL, so they'd all fall into one "unknown" bucket.

Applying forward fill in ExploreMyData

Open Fill Missing from the Data group. Select the category column and choose "forward fill" as the method.

One click. That's the whole operation.

Behind the scenes, ExploreMyData generates a window function:

COALESCE(category, LAG(category) IGNORE NULLS OVER(ORDER BY rowid))

Here's what this does: for each row, if the category is not NULL, use it. If it is NULL, look backward through previous rows (using LAG with IGNORE NULLS) and grab the most recent non-NULL value. The ORDER BY rowid preserves the original row order, which is critical - forward fill only makes sense when rows are in the right sequence.

After applying forward fill - every row now carries its category label:

category (filled)	product_name	price	units_sold
Electronics	Wireless Headphones	$79.99	142
Electronics	USB-C Hub	$34.99	89
Electronics	Laptop Stand	$45.00	204
Clothing	Merino Wool Tee	$55.00	317
Clothing	Canvas Tote Bag	$22.00	510
Clothing	Fleece Jacket	$89.00	98

Highlighted rows had NULL - forward fill propagated the previous category value down into each gap.

Handling NULLs at the top

Forward fill has one blind spot: rows before the first non-empty value. If the very first row of the category column is NULL (maybe the report had a header section), forward fill can't look backward because there's nothing there yet. Those rows stay NULL.

Fix this with Update Values. Set it to conditional mode: where the category column is NULL, set it to "Uncategorized" (or whatever default makes sense for your data). This catches any stragglers that forward fill couldn't reach.

You could also do this first, before the forward fill, if you know the top rows are header junk. Or skip it entirely if your data always starts with a real category.

When row order matters

Forward fill depends entirely on the order of rows. If you sort the data before applying forward fill, you'll get wrong results. The fill is based on the physical position of rows, not any logical grouping.

If your data has already been sorted or shuffled, forward fill won't work correctly. You need the rows in their original order - the order they had in the source report where the category labels were placed at group headers.

In ExploreMyData, pipeline steps run in sequence. If you've added a Sort step before Fill Missing, consider whether that sort is disrupting the natural grouping. You may need to fill first, then sort.

Multiple columns with the same problem

Some reports have this pattern on more than one column. Maybe both "category" and "department" are sparse. Apply forward fill to each column separately. Each one is its own pipeline step. The order doesn't matter here - they're independent fills on independent columns.

After filling: group and analyze

Once every row carries its category label, your data is ready for real analysis. Group by category to get totals. Pivot to compare categories side by side. Filter to a single category. All of this was impossible when 90% of the category column was blank.

The pipeline is two steps: forward fill the category, then do whatever analysis you need. Load new data from the same source next month and it runs the same fill automatically.

Group By on the filled category column - analysis that was impossible before is now straightforward:

category	row_count	total_units_sold	avg_price
Clothing	3	925	$55.33
Electronics	3	435	$53.33
Home & Garden	4	612	$38.74

Before forward fill, this query would have returned one row: category = NULL with all products lumped together.

Fill in your missing categories →

Filling Missing Category Labels Using Forward Fill

What forward fill does

Applying forward fill in ExploreMyData

Handling NULLs at the top

When row order matters

Multiple columns with the same problem

After filling: group and analyze

Try it yourself

Related tools

More from the blog

5 Ways to Clean Messy Data Without Writing Code

Converting "$1,234.56" Strings to Actual Numbers

Splitting a Full Address into Street, City, State, Zip