Finding Patterns in Missing Data
You open a dataset and notice gaps. Some cells are blank, some columns are mostly empty, and the row count doesn't match what you expected. The question isn't just "is data missing?" - it's "why is it missing, and does the pattern tell me something?"
Random NULLs scattered across a dataset are annoying but manageable. Systematic gaps - where every row from a particular source or time period is missing the same fields - are a signal. Maybe a form field was added in March, so everything before that is blank. Maybe one regional office never fills in the "phone" column. Maybe an API started returning a new field halfway through the year.
The difference matters. Random gaps just need filling or filtering. Systematic gaps mean your data has structural problems that affect any analysis you do with it. Here's how to figure out which you're dealing with.
Step 1: See which columns have the most NULLs
Open your file in ExploreMyData and expand the Column Explorer panel. Each column shows a quick summary including the count of NULL values and the percentage of the column that's empty.
Scan through the columns. A dataset with 10,000 rows where "email" has 47 NULLs is probably fine - some people didn't provide an email. But if "phone" has 3,200 NULLs, that's a third of your data. Worth investigating.
| Column | Type | Nulls | % Null |
|---|---|---|---|
| customer_id | BIGINT | 0 | 0% |
| name | VARCHAR | 0 | 0% |
| email | VARCHAR | 47 | 0.5% |
| phone | VARCHAR | 3,200 | 32% |
| signup_date | DATE | 0 | 0% |
| source | VARCHAR | 12 | 0.1% |
At a glance, phone stands out with 3,200 nulls (32%). All other columns are nearly complete.
Make a mental list of the columns with the highest NULL rates. Those are the ones you'll investigate next.
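If you want the same per-column NULL tally outside the UI, it's a few lines of Python. This is a minimal sketch with made-up sample rows standing in for the customer dataset above; `None` plays the role of NULL:

```python
# Hypothetical sample rows standing in for the customer dataset above.
rows = [
    {"customer_id": 1001, "name": "Alice Chen", "email": "alice@example.com", "phone": "555-0142"},
    {"customer_id": 1002, "name": "Bob Lund",   "email": "bob@example.com",   "phone": None},
    {"customer_id": 1003, "name": "Priya Shah", "email": None,                "phone": None},
]

def null_summary(rows):
    """Count Nones per column, like the Column Explorer summary."""
    total = len(rows)
    summary = {}
    for col in rows[0]:
        nulls = sum(1 for r in rows if r[col] is None)
        summary[col] = (nulls, 100 * nulls / total)
    return summary

# Print columns with the most NULLs first.
for col, (nulls, pct) in sorted(null_summary(rows).items(), key=lambda kv: -kv[1][0]):
    print(f"{col}: {nulls} nulls ({pct:.1f}%)")
```

The sort puts the worst columns at the top, which is exactly the scan you do by eye in the panel.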
Step 2: Filter to just the rows with missing data
Pick the column with the most NULLs. Open the Filter operation and set the condition to phone IS NULL. Now you're looking at only the rows where that field is empty.
Scroll through the results. Do you notice anything? Are the dates clustered in a particular range? Do the rows share a common value in another column - the same region, source system, or account type?
This is the eyeball test. It's not rigorous, but it's often enough to spot obvious patterns.
If 3,200 rows with missing phone numbers all have source = "web_import", you've found your answer - the web import doesn't collect phone numbers.
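The eyeball test translates directly to code: filter to the rows where the field is `None`, then tally another column over just that subset. A small sketch with illustrative rows and column names:

```python
from collections import Counter

# Illustrative rows; "source" is the column we suspect explains the gaps.
rows = [
    {"customer_id": 1, "phone": None,       "source": "web_import"},
    {"customer_id": 2, "phone": None,       "source": "web_import"},
    {"customer_id": 3, "phone": "555-0107", "source": "crm_sync"},
    {"customer_id": 4, "phone": None,       "source": "sales_team"},
]

# Equivalent of filtering with phone IS NULL.
missing_phone = [r for r in rows if r["phone"] is None]

# Tally what the filtered rows have in common.
by_source = Counter(r["source"] for r in missing_phone)
print(by_source.most_common())  # [('web_import', 2), ('sales_team', 1)]
```

If one value dominates the tally, you've found the pattern without scrolling.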
Step 3: Create a "has missing data" flag
For a more systematic view, use Add Column to create a flag that marks whether each row has missing data in the columns you care about.
Set the expression to something like:
CASE WHEN phone IS NULL OR email IS NULL THEN 'missing' ELSE 'complete' END
Name the new column data_completeness.
Now every row is tagged, and you can use this flag in further analysis.
| customer_id | name | email | phone | data_completeness |
|---|---|---|---|---|
| 1001 | Alice Chen | alice@example.com | 555-0142 | complete |
| 1002 | Bob Lund | bob@example.com | NULL | missing |
| 1003 | Priya Shah | NULL | NULL | missing |
| 1004 | Tom Reed | tom@example.com | 555-0299 | complete |
New data_completeness column added with the CASE WHEN expression. Each row is now tagged, ready for grouping.
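The same CASE WHEN expression works anywhere SQL runs, so you can reproduce the flag outside the tool. Here's a sketch using Python's built-in sqlite3 module and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id, name, email, phone)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1001, "Alice Chen", "alice@example.com", "555-0142"),
        (1002, "Bob Lund",   "bob@example.com",   None),
        (1003, "Priya Shah", None,                None),
    ],
)

# Tag every row with the same expression used in Add Column.
tagged = conn.execute("""
    SELECT customer_id,
           CASE WHEN phone IS NULL OR email IS NULL
                THEN 'missing' ELSE 'complete'
           END AS data_completeness
    FROM customers
    ORDER BY customer_id
""").fetchall()
print(tagged)  # [(1001, 'complete'), (1002, 'missing'), (1003, 'missing')]
```

Note the OR: a row counts as missing if any of the columns you care about is NULL.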
Step 4: Pivot to see the pattern
This is where it gets interesting. Use the Pivot operation to count missing vs. complete rows broken down by another column.
For example, pivot with:
- Rows: source (or region, or signup_month)
- Columns: data_completeness
- Values: COUNT
The result is a table showing how many complete vs. missing rows exist for each source. If one source has 3,100 of your 3,200 missing rows, that's your pattern.
| source | complete | missing |
|---|---|---|
| crm_sync | 4,821 | 63 |
| sales_team | 1,903 | 37 |
| web_import | 100 | 3,100 |
The pattern is clear: 3,100 of the 3,200 missing phone rows come from web_import. The web import flow simply doesn't collect phone numbers.
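Under the hood, this pivot is just a conditional count grouped by the breakdown column. In SQL terms, sketched with sqlite3 and toy data (column names match the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (phone, source)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("555-0142", "crm_sync"),
        (None,       "web_import"),
        (None,       "web_import"),
        ("555-0299", "sales_team"),
        (None,       "sales_team"),
    ],
)

# Rows: source; Columns: complete/missing; Values: COUNT.
pivot = conn.execute("""
    SELECT source,
           SUM(CASE WHEN phone IS NOT NULL THEN 1 ELSE 0 END) AS complete,
           SUM(CASE WHEN phone IS NULL     THEN 1 ELSE 0 END) AS missing
    FROM customers
    GROUP BY source
    ORDER BY missing DESC
""").fetchall()
print(pivot)  # [('web_import', 0, 2), ('sales_team', 1, 1), ('crm_sync', 1, 0)]
```

Swapping `source` for a month or category column in the GROUP BY gives you the other breakdowns mentioned below.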
Try different breakdowns. Pivot by month to see if the gaps are time-based. Pivot by category to see if certain product lines have worse data quality. Each pivot takes a few seconds and gives you a different angle.
What to do once you find the pattern
Once you know why data is missing, you can decide what to do about it:
- Source-based gaps: The data was never collected. You can filter out those rows, fill with defaults, or fix the upstream system.
- Time-based gaps: A field was added later. Everything before the cutoff date is legitimately empty. Acknowledge this in your analysis.
- Category-based gaps: Certain categories don't have certain attributes. "Shipping weight" being NULL for digital products isn't a data quality issue - it's correct.
- Truly random: No pattern. Some values are just missing. Use Fill Missing with a sensible default, or exclude those rows from calculations.
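For the truly random case, the fill itself is a one-liner once you've chosen a default: COALESCE in SQL, or a dict rewrite in Python. The placeholder value here is just an example:

```python
rows = [
    {"customer_id": 1, "phone": "555-0142"},
    {"customer_id": 2, "phone": None},
]

# Replace NULL phones with an explicit placeholder; keep real values as-is.
# Equivalent to COALESCE(phone, 'not provided') in SQL.
filled = [
    {**r, "phone": r["phone"] if r["phone"] is not None else "not provided"}
    for r in rows
]
print(filled[1]["phone"])  # not provided
```

An explicit placeholder like "not provided" is often better than an empty string, because it keeps "we know there's no value" distinguishable from "the value is blank".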
A real example
Say you have a customer support dataset with columns for ticket_id, category, priority, assigned_to, resolution_time, and satisfaction_score. The satisfaction_score column is 40% NULL.
You filter to satisfaction_score IS NULL and notice most of these tickets have category = "billing". You pivot by category and confirm: billing tickets have a 72% NULL rate for satisfaction, while technical support tickets are only 15% NULL.
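The per-category NULL rates in that comparison are just conditional counts divided by group sizes. A minimal sketch with made-up ticket rows:

```python
from collections import defaultdict

# Hypothetical tickets; None stands for a missing satisfaction_score.
tickets = [
    {"category": "billing",   "satisfaction_score": None},
    {"category": "billing",   "satisfaction_score": None},
    {"category": "billing",   "satisfaction_score": 4},
    {"category": "technical", "satisfaction_score": 5},
    {"category": "technical", "satisfaction_score": None},
]

counts = defaultdict(lambda: [0, 0])  # category -> [nulls, total]
for t in tickets:
    counts[t["category"]][0] += t["satisfaction_score"] is None
    counts[t["category"]][1] += 1

for cat, (nulls, total) in counts.items():
    print(f"{cat}: {100 * nulls / total:.0f}% NULL")
```

When the rates differ this sharply between groups, averaging the column across all rows quietly weights the result toward whichever group answers more often - exactly the bias described above.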
The reason? Billing tickets get resolved through an automated system that doesn't send satisfaction surveys. This isn't a data quality problem - it's a process gap. And now you know that any "average satisfaction score" metric you calculate is biased toward non-billing tickets.
That's the kind of insight you only get when you look at the shape of what's missing, not just what's present.