March 8, 2026 by Arif Aslam 5 min read

Cleaning an Email Marketing List Before a Campaign

Marketing just merged three different email lists into one spreadsheet. There are 5,000 rows. Some came from a webinar sign-up form, some from a trade show scanner, and some from a purchased list that looked fine in the preview. Before anyone hits "send," you need to clean it. Duplicates will inflate your costs and annoy recipients. Bad formatting means bounces. Test addresses and no-reply aliases will tank your deliverability score. And ideally, you want to segment by company versus personal email so the sales team can prioritize.

Here's a six-step pipeline in ExploreMyData that takes a messy email list and outputs a clean, segmented file ready for the campaign.

Step 1: Normalize email formatting

Select Text Transform from the Transform group. Choose the email column and apply "trim" to strip leading and trailing whitespace. Then apply the operation again with "lowercase."

This matters more than you'd think. The trade show scanner captured " John.Smith@Acme.com " (with spaces and mixed case). The webinar form captured "john.smith@acme.com". Without normalization, the deduplication step won't catch these as the same address. Email addresses are case-insensitive by spec, but your CSV doesn't know that.

Two pipeline steps for one column, but they're cheap and they prevent false duplicates downstream. Under the hood, it runs LOWER(TRIM(email)).

first_name	email (raw)	email (normalized)	source
John	John.Smith@Acme.com	john.smith@acme.com	trade show
John	john.smith@acme.com	john.smith@acme.com	webinar
Fatima	FATIMA.ALI@SYNCO.COM	fatima.ali@synco.com	purchased list
	noreply@test.com	noreply@test.com	purchased list

After trim and lowercase, rows 1 and 2 are now identical - deduplication in step 4 will remove one. The noreply address will be caught by the filter in step 3.

Step 2: Remove empty and NULL emails

Select Filter and set the condition to email IS NOT NULL AND email != ''.

Merged lists almost always have blank rows. Maybe someone filled in a name at the trade show but the scanner didn't capture the email. Maybe the webinar form allowed empty submissions. These rows are useless for an email campaign. Get rid of them early so they don't skew your count.

Step 3: Filter out test and system addresses

Add another Filter step. This time, exclude the addresses that are obviously not real recipients:

email NOT LIKE 'test@%' AND email NOT LIKE 'demo@%' AND email NOT LIKE 'noreply@%' AND email NOT LIKE 'no-reply@%' AND email NOT LIKE '%@example.com' AND email NOT LIKE '%@test.com'

This catches the common patterns. Test accounts that someone used during form setup. System addresses that got pulled into the list by mistake. Example.com addresses from the RFC 2606 reserved domain. You'd be surprised how many of these end up in a merged list.

If your company uses specific test patterns (like internal QA aliases), add those to the filter too. Better to be aggressive here. Sending to a noreply@ address doesn't just waste money - it's a hard bounce that hurts your sender reputation.

email	matched pattern	action
noreply@acme.com	noreply@%	removed
test@example.com	%@example.com	removed
demo@formtool.io	demo@%	removed
jane.liu@parsec.com	(none)	kept
mark.v@gmail.com	(none)	kept

System and test addresses filtered out before deduplication. Hard bounces from these addresses would hurt sender reputation.

Step 4: Deduplicate

Select Remove Duplicates from the Filter & Sort group. Choose email as the column to check.

This runs SELECT DISTINCT ON (email) *, keeping the first occurrence and dropping the rest. Because you already normalized to lowercase and trimmed whitespace in step 1, "John.Smith@Acme.com" and "john.smith@acme.com " are now the same value and get properly deduplicated.

Check the row count in the pipeline sidebar. If you started with 5,000 and you're down to 3,800, that means 1,200 were empty, test addresses, or duplicates. That's 24% of the list that would have been wasted sends.

Step 5: Extract the domain

Select Extract Text from the Transform group. Choose the email column and extract the text after the "@" symbol. Name the new column domain.

If Extract Text doesn't have a built-in "after delimiter" option, use Add Column instead with the expression:

SPLIT_PART(email, '@', 2)

Now you have a domain column. This is useful on its own - you can sort or group by domain to see how many contacts you have at each company. But the real payoff is the next step.

Step 6: Segment by company vs. personal

Select Add Column. Name it email_type and enter:

CASE WHEN domain IN ('gmail.com', 'yahoo.com', 'hotmail.com', 'outlook.com', 'aol.com', 'icloud.com', 'protonmail.com', 'mail.com', 'zoho.com', 'yandex.com') THEN 'personal' ELSE 'company' END

This is a simple heuristic. If the domain is a major free email provider, it's a personal address. Everything else gets tagged as "company." It's not perfect - someone might use a custom domain for personal email - but it's good enough for segmentation. The sales team can filter to "company" emails and prioritize those for outreach. The marketing team can use "personal" emails for a different messaging track.

You can extend the list with regional providers (gmx.de, mail.ru, qq.com) if your list is international.

first_name	email	domain	email_type
John	john.smith@acme.com	acme.com	company
Fatima	fatima.ali@synco.com	synco.com	company
Mark	mark.v@gmail.com	gmail.com	personal
Sandra	sandrab@hotmail.com	hotmail.com	personal
Jane	jane.liu@parsec.com	parsec.com	company

From 5,000 rows to 3,800 clean, deduplicated, segmented contacts. Sales team filters to "company" for outreach; marketing uses "personal" for a separate track.

The full pipeline

Text Transform (trim + lowercase the email column)
Filter (remove NULL and empty emails)
Filter (remove test@, demo@, noreply@, and @example.com patterns)
Remove Duplicates (DISTINCT ON email)
Extract Text / Add Column (domain from email)
Add Column (email_type: company vs. personal)

From 5,000 messy rows to a clean, deduplicated, segmented list. Export the result as CSV and load it into your email platform. You've reduced bounce risk, avoided duplicate sends, and given the team a useful segmentation they didn't have before.

The pipeline is reusable. Next time someone hands you a new batch of leads from a conference or a form export, load the file and run the same steps. Every operation is visible in the sidebar, so you can adjust the test-address patterns or add new personal email domains as needed.

Clean your email list now →

Cleaning an Email Marketing List Before a Campaign

Step 1: Normalize email formatting

Step 2: Remove empty and NULL emails

Step 3: Filter out test and system addresses

Step 4: Deduplicate

Step 5: Extract the domain

Step 6: Segment by company vs. personal

The full pipeline

Try it yourself

Related tools

More from the blog

Extracting Email Domains

5 Ways to Clean Messy Data Without Writing Code

Removing Test Rows from Production Data