Merging Survey Responses with Respondent Data
You ran a customer satisfaction survey through Typeform, SurveyMonkey, or Google Forms. The export gives you responses - ratings, comments, timestamps - but the only identifying field is an email address. Your boss wants to know: how does satisfaction differ by department? By customer tier? By region?
The survey tool doesn't know any of that. Your customer database does. To answer those questions, you need to join the two files together. Here's how to do it in ExploreMyData, step by step.
The two files
File 1: Survey responses. Columns include response_id, email, submitted_at, overall_rating (1-5), nps_score (0-10), and comments. Maybe 500 rows - not everyone responds to surveys.
File 2: Customer database. Columns include customer_id, email, full_name, company, department, city, state, customer_tier, and account_manager. Several thousand rows.
The common column is email. But here's the catch: the survey tool might store emails as "John.Smith@Acme.com" while your database has "john.smith@acme.com". If the case doesn't match, the join fails silently - you get NULL values where you expected customer data.
File 1: survey_responses.csv (500 rows)
| response_id | overall_rating | nps_score | |
|---|---|---|---|
| R-0201 | Sarah.Kim@Meridian.com | 4 | 8 |
| R-0202 | tom.harris@gmail.com | 2 | 3 |
File 2: customers.csv (4,200 rows)
| customer_id | department | customer_tier | |
|---|---|---|---|
| C-8801 | sarah.kim@meridian.com | Engineering | Enterprise |
| C-8802 | m.patel@sunridge.co | Finance | Mid-Market |
Note the case mismatch: "Sarah.Kim@Meridian.com" vs "sarah.kim@meridian.com" - lowercasing both before joining is essential.
Step 1: Load both files
Drag both CSV files into ExploreMyData or use the file picker. Each file appears as a separate table. You can switch between them using the table tabs at the top. Both are loaded into DuckDB in your browser - nothing leaves your machine.
Take a quick look at each file. Check the email column in both: do they look consistent? Are there obvious formatting differences? This two-minute check saves debugging later.
Step 2: Lowercase emails in both files
Before joining, normalize the join key. Select the survey responses table. Open Text Transform, choose email, and apply "lowercase". First pipeline card appears.
Switch to the customer database table and do the same thing: Text Transform, email, lowercase. Now "John.Smith@Acme.com" becomes "john.smith@acme.com" in both files.
This step is easy to skip and painful to debug. If your join returns mostly NULLs and you're sure the emails exist in both files, case mismatch is almost always the reason.
Step 3: Join the files on email
Go back to the survey responses table - this is the file you want to enrich. Select Join from the Data group. Configure it:
- Join type: LEFT JOIN
- Right table: your customer database file
- Left key:
email - Right key:
email
Why LEFT JOIN? Because you want to keep every survey response, even if the respondent doesn't exist in your customer database. Maybe they used a personal email address, or maybe they're a new contact who hasn't been added to the CRM yet. A LEFT JOIN preserves those rows with NULL values for the customer fields rather than dropping them.
Click Apply. The table now has both the survey columns and the customer columns. Where an email matched, you'll see the company, department, tier, and everything else filled in. Where it didn't match, those columns show NULL.
| overall_rating | nps_score | department | customer_tier | |
|---|---|---|---|---|
| sarah.kim@meridian.com | 4 | 8 | Engineering | Enterprise |
| tom.harris@gmail.com | 2 | 3 | NULL | NULL |
| priya.mehta@clearpath.io | 5 | 9 | Sales | Mid-Market |
| unknown42@hotmail.com | 3 | 5 | NULL | NULL |
LEFT JOIN keeps all 500 survey responses. Rows with NULL customer columns are respondents not found in the database.
Step 4: Handle the non-matching respondents
Those NULL rows deserve attention. They're respondents whose emails didn't match anything in your customer database. There could be a few reasons:
- They used a personal email (gmail, yahoo) instead of their work email
- They're a new customer not yet in the database
- The email has a typo in one of the files
Select Fill Missing. For company, department, and customer_tier, use "literal" mode and fill with "Unknown". This makes the data clean for grouping - you can filter these out later, or analyze them as their own segment.
Another approach: add a calculated column called matched using Add Column with a CASE WHEN expression: if customer_id is NULL, "No", else "Yes". This gives you a quick filter to see your match rate. If only 60% of responses matched, you might want to investigate the unmatched emails before drawing conclusions.
Step 5: Trim to useful columns
After the join, you have columns from both files - including duplicates like two email columns and fields from the customer database you don't need for survey analysis.
Use Select Columns to keep only what matters for the analysis. A good final set might be:
email,full_namecompany,department,customer_tieroverall_rating,nps_score,commentssubmitted_at
Now you have a single clean table: every survey response enriched with the respondent's company context.
What you can do next
With the merged data, the analysis opens up. A few things you can now build by adding more pipeline steps:
- Satisfaction by department: use Pivot with department as rows and AVG of overall_rating as the value.
- NPS by customer tier: filter to each tier and compute the NPS (% Promoters minus % Detractors).
- Response rate by company: count responses per company, then compare against total customers per company from the database.
Each of these is just a few more pipeline cards. The join step did the hard work - everything after it is standard aggregation and filtering.
The finished pipeline
Five cards in the pipeline sidebar:
- Text Transform (lowercase emails in survey file)
- Text Transform (lowercase emails in customer file)
- LEFT JOIN on email
- Fill Missing (label unmatched respondents)
- Select Columns (trim to essentials)
Next time you run a survey, export the new responses, load them alongside the same customer database, and the pipeline merges and cleans everything again. The join key normalization, the NULL handling, the column selection - all saved.
| full_name | company | department | customer_tier | overall_rating | nps_score |
|---|---|---|---|---|---|
| Sarah Kim | Meridian | Engineering | Enterprise | 4 | 8 |
| Priya Mehta | Clearpath | Sales | Mid-Market | 5 | 9 |
| Tom Harris | Unknown | Unknown | Unknown | 2 | 3 |
| David Park | Oakridge Systems | Support | SMB | 3 | 6 |
Ready for analysis: pivot by department to compare satisfaction scores, or filter by customer_tier to find NPS gaps between tiers.