← All posts
5 min read

Cleaning an Email Marketing List Before a Campaign

Marketing just merged three different email lists into one spreadsheet. There are 5,000 rows. Some came from a webinar sign-up form, some from a trade show scanner, and some from a purchased list that looked fine in the preview. Before anyone hits "send," you need to clean it. Duplicates will inflate your costs and annoy recipients. Bad formatting means bounces. Test addresses and no-reply aliases will tank your deliverability score. And ideally, you want to segment by company versus personal email so the sales team can prioritize.

Here's a six-step pipeline in ExploreMyData that takes a messy email list and outputs a clean, segmented file ready for the campaign.

Step 1: Normalize email formatting

Select Text Transform from the Transform group. Choose the email column and apply "trim" to strip leading and trailing whitespace. Then apply the operation again with "lowercase."

This matters more than you'd think. The trade show scanner captured " John.Smith@Acme.com " (with spaces and mixed case). The webinar form captured "john.smith@acme.com". Without normalization, the deduplication step won't catch these as the same address. Email addresses are case-insensitive by spec, but your CSV doesn't know that.

Two pipeline steps for one column, but they're cheap and they prevent false duplicates downstream. Under the hood, it runs LOWER(TRIM(email)).

first_nameemail (raw)email (normalized)source
John John.Smith@Acme.com john.smith@acme.comtrade show
Johnjohn.smith@acme.comjohn.smith@acme.comwebinar
FatimaFATIMA.ALI@SYNCO.COMfatima.ali@synco.compurchased list
noreply@test.comnoreply@test.compurchased list

After trim and lowercase, rows 1 and 2 are now identical - deduplication in step 4 will remove one. The noreply address will be caught by the filter in step 3.

Step 2: Remove empty and NULL emails

Select Filter and set the condition to email IS NOT NULL AND email != ''.

Merged lists almost always have blank rows. Maybe someone filled in a name at the trade show but the scanner didn't capture the email. Maybe the webinar form allowed empty submissions. These rows are useless for an email campaign. Get rid of them early so they don't skew your count.

Step 3: Filter out test and system addresses

Add another Filter step. This time, exclude the addresses that are obviously not real recipients:

email NOT LIKE 'test@%' AND email NOT LIKE 'demo@%' AND email NOT LIKE 'noreply@%' AND email NOT LIKE 'no-reply@%' AND email NOT LIKE '%@example.com' AND email NOT LIKE '%@test.com'

This catches the common patterns. Test accounts that someone used during form setup. System addresses that got pulled into the list by mistake. Example.com addresses from the RFC 2606 reserved domain. You'd be surprised how many of these end up in a merged list.

If your company uses specific test patterns (like internal QA aliases), add those to the filter too. Better to be aggressive here. Sending to a noreply@ address doesn't just waste money - it's a hard bounce that hurts your sender reputation.

emailmatched patternaction
noreply@acme.comnoreply@%removed
test@example.com%@example.comremoved
demo@formtool.iodemo@%removed
jane.liu@parsec.com(none)kept
mark.v@gmail.com(none)kept

System and test addresses filtered out before deduplication. Hard bounces from these addresses would hurt sender reputation.

Step 4: Deduplicate

Select Remove Duplicates from the Filter & Sort group. Choose email as the column to check.

This runs SELECT DISTINCT ON (email) *, keeping the first occurrence and dropping the rest. Because you already normalized to lowercase and trimmed whitespace in step 1, "John.Smith@Acme.com" and "john.smith@acme.com " are now the same value and get properly deduplicated.

Check the row count in the pipeline sidebar. If you started with 5,000 and you're down to 3,800, that means 1,200 were empty, test addresses, or duplicates. That's 24% of the list that would have been wasted sends.

Step 5: Extract the domain

Select Extract Text from the Transform group. Choose the email column and extract the text after the "@" symbol. Name the new column domain.

If Extract Text doesn't have a built-in "after delimiter" option, use Add Column instead with the expression:

SPLIT_PART(email, '@', 2)

Now you have a domain column. This is useful on its own - you can sort or group by domain to see how many contacts you have at each company. But the real payoff is the next step.

Step 6: Segment by company vs. personal

Select Add Column. Name it email_type and enter:

CASE WHEN domain IN ('gmail.com', 'yahoo.com', 'hotmail.com', 'outlook.com', 'aol.com', 'icloud.com', 'protonmail.com', 'mail.com', 'zoho.com', 'yandex.com') THEN 'personal' ELSE 'company' END

This is a simple heuristic. If the domain is a major free email provider, it's a personal address. Everything else gets tagged as "company." It's not perfect - someone might use a custom domain for personal email - but it's good enough for segmentation. The sales team can filter to "company" emails and prioritize those for outreach. The marketing team can use "personal" emails for a different messaging track.

You can extend the list with regional providers (gmx.de, mail.ru, qq.com) if your list is international.

first_nameemaildomainemail_type
Johnjohn.smith@acme.comacme.comcompany
Fatimafatima.ali@synco.comsynco.comcompany
Markmark.v@gmail.comgmail.compersonal
Sandrasandrab@hotmail.comhotmail.compersonal
Janejane.liu@parsec.comparsec.comcompany

From 5,000 rows to 3,800 clean, deduplicated, segmented contacts. Sales team filters to "company" for outreach; marketing uses "personal" for a separate track.

The full pipeline

  1. Text Transform (trim + lowercase the email column)
  2. Filter (remove NULL and empty emails)
  3. Filter (remove test@, demo@, noreply@, and @example.com patterns)
  4. Remove Duplicates (DISTINCT ON email)
  5. Extract Text / Add Column (domain from email)
  6. Add Column (email_type: company vs. personal)

From 5,000 messy rows to a clean, deduplicated, segmented list. Export the result as CSV and load it into your email platform. You've reduced bounce risk, avoided duplicate sends, and given the team a useful segmentation they didn't have before.

The pipeline is reusable. Next time someone hands you a new batch of leads from a conference or a form export, load the file and run the same steps. Every operation is visible in the sidebar, so you can adjust the test-address patterns or add new personal email domains as needed.

Clean your email list now →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData