Extracting Email Domains to Identify Company Accounts
You have a list of 10,000 users. Each row has a name and an email address. Your boss asks: "Which companies have the most accounts?" There's no company column. There never is.
But the information is right there, hiding after the @ symbol.
sarah@techwave.io works at TechWave.
mike@dataflow.com works at Dataflow.
jenny@gmail.com is using a personal account.
You just need to extract it. Here's how to do that in ExploreMyData without writing any code, and what's happening under the hood.
The starting data
Imagine a CSV exported from your signup database. It looks something like this:
| name | email | signup_date |
|---|---|---|
| Sarah Chen | sarah.chen@techwave.io | 2026-01-15 |
| Mike Rivera | mrivera@Dataflow.com | 2026-01-18 |
| Jenny Park | jenny.park@gmail.com | 2026-02-03 |
| Alex Novak | ANovak@TechWave.IO | 2026-02-11 |
| Priya Sharma | priya@dataflow.com | 2026-02-14 |
Notice the problem right away: Alex's email is ANovak@TechWave.IO while
Sarah's is sarah.chen@techwave.io. Same company, different casing. We'll
need to handle that.
Step 1: Extract the domain
Open the toolbar and select Extract Text from the Transform group. Choose the
email column, set the method to "after delimiter", and type
@ as the delimiter. Name the new column
domain.
Under the hood, ExploreMyData generates:
SPLIT_PART(email, '@', 2) AS domain
This grabs everything after the @. Your table now has a new domain column with values like techwave.io, Dataflow.com, and TechWave.IO.
After Extract Text (after delimiter "@") - new domain column added with everything after the @ symbol:
| name | email | domain (new) |
|---|---|---|
| Sarah Chen | sarah.chen@techwave.io | techwave.io |
| Mike Rivera | mrivera@Dataflow.com | Dataflow.com |
| Jenny Park | jenny.park@gmail.com | gmail.com |
| Alex Novak | ANovak@TechWave.IO | TechWave.IO |
| Priya Sharma | priya@dataflow.com | dataflow.com |
"techwave.io" and "TechWave.IO" are still different values - lowercase transform is needed next.
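If you ever want to reproduce the extraction outside the tool, the same after-delimiter logic is a one-liner in most languages. A minimal Python sketch of what SPLIT_PART(email, '@', 2) does (the helper name is ours; sample values come from the table above):

```python
def extract_domain(email: str) -> str:
    """Everything after the '@', mirroring SPLIT_PART(email, '@', 2)
    for well-formed addresses containing a single '@'."""
    return email.split("@", 1)[1]

print(extract_domain("sarah.chen@techwave.io"))  # techwave.io
print(extract_domain("ANovak@TechWave.IO"))      # TechWave.IO
```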
Step 2: Normalize the casing
The extracted domains have inconsistent casing. TechWave.IO and
techwave.io need to be the same value, otherwise they'll
show up as separate groups.
Select Text Transform from the Transform group. Choose the
domain column and pick "lowercase".
The SQL:
LOWER(domain)
Now every domain is lowercase: techwave.io, dataflow.com, gmail.com. Clean and consistent.
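Why this step matters is easiest to see with a quick count of distinct values before and after. A Python sketch using the domains from the tables above:

```python
# Raw extracted domains, exactly as they appear before normalization.
domains = ["techwave.io", "Dataflow.com", "gmail.com",
           "TechWave.IO", "dataflow.com"]

# LOWER(domain) in SQL; str.lower() here. Without it, "TechWave.IO"
# and "techwave.io" would land in separate groups.
normalized = [d.lower() for d in domains]

print(len(set(domains)))     # 5 distinct values before lowercasing
print(len(set(normalized)))  # 3 distinct values after
```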
Step 3: See the distribution
Now for the payoff. Click on the domain column header to open
Column Explorer. Since domain is a text column, ExploreMyData shows a
categorical breakdown: the most frequent values, their counts, and their percentage of the total.
You can immediately see that techwave.io has 847 users,
dataflow.com has 312,
and gmail.com has 2,341. That last number tells you
something important: more of your users are on personal Gmail accounts than on any single company domain.
Column Explorer breakdown of the lowercased domain column - top domains by user count:
| domain | count | % of total | type |
|---|---|---|---|
| gmail.com | 2,341 | 23.4% | personal |
| outlook.com | 1,102 | 11.0% | personal |
| techwave.io | 847 | 8.5% | company |
| dataflow.com | 312 | 3.1% | company |
| yahoo.com | 289 | 2.9% | personal |
Over a third of users are on personal email domains - valuable signal for segmenting your user base.
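Under the hood, this kind of categorical breakdown is just a frequency count over the column. A sketch with Python's Counter (the short list here stands in for the full 10,000-row column; the real counts are in the table above):

```python
from collections import Counter

# A handful of lowercased domains standing in for the full column.
domains = ["gmail.com", "techwave.io", "gmail.com",
           "dataflow.com", "gmail.com"]

counts = Counter(domains)
total = len(domains)

# Most frequent values, their counts, and their share of the total,
# just like the Column Explorer breakdown.
for domain, n in counts.most_common():
    print(f"{domain}: {n} ({n / total:.1%})")
```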
What you can do with this
Once you have a clean domain column, the analysis possibilities open up:
- Filter out personal email domains (gmail.com, yahoo.com, hotmail.com, outlook.com) to focus only on business accounts.
- Group by domain to find your biggest company customers by user count.
- Combine with signup date to see when companies started adopting your product - did TechWave sign up all at once, or trickle in over months?
- Export the enriched data with the new domain column included for use in your CRM or reporting tool.
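The first of these - separating personal from company accounts - usually comes down to a small blocklist check. A sketch, using only the four personal domains named above (a real blocklist would be longer):

```python
# Known personal email providers; not an exhaustive list.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}

def is_company(domain: str) -> bool:
    """True if the (already lowercased) domain is not a personal provider."""
    return domain not in PERSONAL_DOMAINS

rows = [("Sarah Chen", "techwave.io"),
        ("Jenny Park", "gmail.com"),
        ("Priya Sharma", "dataflow.com")]

business_accounts = [r for r in rows if is_company(r[1])]
print(business_accounts)  # Jenny's gmail.com row is filtered out
```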
Two steps, no regex
The whole operation is two pipeline steps: extract and lowercase. You didn't write a regex. You didn't open a terminal. You didn't paste anything into a spreadsheet formula bar. The SQL is there in the pipeline if you want to see it, copy it, or modify it - but you didn't have to write it.
Two-step pipeline - each step produces visible SQL and a live result:
| Step | Operation | Generated SQL |
|---|---|---|
| 1 | Extract Text (after @) | SPLIT_PART(email, '@', 2) AS domain |
| 2 | Text Transform: lowercase | LOWER(domain) |
No regex written. No terminal opened. The SQL is there to inspect or copy for an ETL pipeline.
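If you do copy the SQL into your own pipeline, the two steps compose into a single expression: LOWER(SPLIT_PART(email, '@', 2)). The equivalent one-pass sketch in Python:

```python
def email_to_domain(email: str) -> str:
    # Extract after '@' and lowercase in one expression,
    # the composed form of the two pipeline steps.
    return email.split("@", 1)[1].lower()

print(email_to_domain("ANovak@TechWave.IO"))  # techwave.io
```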