← All posts
5 min read

Cleaning and Normalizing a CRM Export

You export your contacts from Salesforce, HubSpot, or whatever CRM your company uses. The CSV lands in your Downloads folder. You open it and immediately see the problems.

Sixty columns, most of which you don't need. "TechWave Solutions" appears as "TechWave Solutions", "TECHWAVE SOLUTIONS", "techwave solutions", and "TechWave Solutions " (with a trailing space). Contact names are in mixed case. Half the phone number fields are blank. And there are duplicate contacts - the same person entered twice because someone created a new record instead of updating the existing one.

This is the kind of mess that makes people open a spreadsheet and start manually fixing cells. Don't do that. Build a pipeline in ExploreMyData that fixes everything in the right order and can be reapplied next time you export.

first_namelast_nameemailcompany_namephone
johnsmithJohn.Smith@TechWave.comTECHWAVE SOLUTIONS415-555-0182
LISANGUYENLNGUYEN@TECHWAVE.COM TechWave Solutions
MarcusWebbmarcus.webb@acmecorp.comAcme Corp Inc.312-555-0094
johnsmithjohn.smith@techwave.comtechwave solutions415-555-0182

Inconsistent casing, trailing spaces, duplicate contact (rows 1 and 4), and a missing phone on row 2 - typical CRM export problems.

Step 1: Strip down to the columns you need

Start with the biggest win: getting rid of noise. Select Select Columns from the Columns group. In a typical CRM export, you might keep 10-15 columns out of 60:

  • contact_id
  • first_name, last_name
  • email
  • phone
  • company_name
  • job_title
  • city, state
  • created_date, last_activity_date

Uncheck everything else. The first pipeline card appears, and your table goes from impossibly wide to something you can actually read.

Do this first. Every subsequent operation runs faster on 10 columns than on 60, and you won't accidentally clean data in a column you're about to throw away.

Step 2: Trim whitespace from all text columns

Trailing spaces are invisible enemies. "TechWave Solutions" and "TechWave Solutions " look identical on screen but are treated as different values by every grouping and matching operation.

Select Text Transform from the Transform group. Choose each text column and apply "trim". This strips leading and trailing whitespace. You'll want to do this for first_name, last_name, email, company_name, city, and job_title.

Each column gets its own pipeline card. Under the hood, ExploreMyData uses TRIM(column_name). Six columns, six cards - but each one runs instantly.

Step 3: Capitalize names consistently

Your contacts have "john smith", "JOHN SMITH", and "John Smith". They're all the same person, but your CRM doesn't enforce formatting.

Use Text Transform again. Select first_name and apply "capitalize" (title case). Do the same for last_name. This converts everything to "John Smith" format.

For email, apply "lowercase" instead - emails should always be lowercase for matching purposes.

Pipeline sidebar - steps so far

  1. Select Columns - kept 10 of 62 columns
  2. Text Transform - trim first_name
  3. Text Transform - trim last_name
  4. Text Transform - trim email
  5. Text Transform - trim company_name
  6. Text Transform - capitalize first_name
  7. Text Transform - capitalize last_name
  8. Text Transform - lowercase email

Each operation is a separate card - visible, editable, and applied in order.

Step 4: Normalize company name variations

This is the hardest problem in CRM data. "TechWave Solutions", "TECHWAVE SOLUTIONS", "Techwave", "TechWave Solutions Inc.", "TechWave Solutions, Inc" - all the same company.

Start by applying Text Transform with "capitalize" on company_name. That collapses the case variations into one form.

For the remaining variants (with and without "Inc.", abbreviations, etc.), use Find & Replace. Select company_name, find "Techwave Solutions Inc." and replace with "Techwave Solutions". Add another Find & Replace for "Techwave Solutions, Inc" to "Techwave Solutions".

Yes, this is manual. But it's documented. Each replacement is a pipeline card with the exact find/replace pair visible. When you export again in three months and there's a new variation, you add one more card. The old replacements still work.

For the most common issue - trailing suffixes like "Inc.", "LLC", "Ltd" - a single Find & Replace per suffix handles hundreds of companies at once.

Step 5: Fill missing phone numbers

Some contacts have phone numbers; some don't. If multiple contacts exist at the same company, there might be a shared office number on some records but not others.

Select Fill Missing from the Data group. Choose phone as the column. For many cases, "forward fill" works well - it uses the previous row's value when the current row is NULL.

For this to work well, sort by company_name first so contacts at the same company are grouped together. Then forward fill propagates a known phone number to the blank rows below it within the same company block.

If you'd rather mark unknowns explicitly, use "literal" mode and fill with "No phone on file". Either way, you've eliminated the NULLs.

Step 6: Remove duplicate contacts

Finally, deduplicate. Select Remove Duplicates and choose email as the column to check. Since email addresses are unique per person (and you already lowercased them), this catches duplicates even when names are slightly different.

This generates SELECT DISTINCT ON (email) *. The first row for each email address is kept; the rest are dropped. If you had 8,000 contacts and 500 were duplicates, you're down to 7,500 clean rows.

The pipeline you built

Scroll through the pipeline sidebar. You'll see something like a dozen cards:

  1. Select Columns (trim to essentials)
  2. Trim whitespace (multiple columns)
  3. Capitalize names
  4. Lowercase emails
  5. Capitalize company names
  6. Find & Replace company variations
  7. Sort by company
  8. Forward fill phone numbers
  9. Remove duplicates by email

Every step is visible, editable, and deletable. If you realize you need to also normalize job titles ("VP of Sales" vs "Vice President of Sales"), you insert a new card wherever it makes sense. The rest of the pipeline adjusts.

The real value is next quarter. Export from the CRM again, load the file, and the same pipeline cleans it the same way. Five minutes instead of an afternoon.

first_namelast_nameemailcompany_namephone
JohnSmithjohn.smith@techwave.comTechwave Solutions415-555-0182
LisaNguyenlnguyen@techwave.comTechwave Solutions415-555-0182
MarcusWebbmarcus.webb@acmecorp.comAcme Corp312-555-0094

Consistent casing, trimmed whitespace, normalized company names, phone filled via forward fill, and duplicate removed - 7,500 clean rows from 8,000.

Clean your CRM export →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData