Extracting Email Domains to Identify Company Accounts
You have a list of 10,000 users. Each row has a name and an email address. Your boss asks: "Which companies have the most accounts?" There's no company column. There never is.
But the information is right there, hiding after the @ symbol.
sarah@techwave.io works at TechWave.
mike@dataflow.com works at Dataflow.
jenny@gmail.com is using a personal account.
You just need to extract it. Here's how to do that in ExploreMyData without writing any code, and what's happening under the hood.
The starting data
Imagine a CSV exported from your signup database. It looks something like this:
| name | email | signup_date |
|---|---|---|
| Sarah Chen | sarah.chen@techwave.io | 2026-01-15 |
| Mike Rivera | mrivera@Dataflow.com | 2026-01-18 |
| Jenny Park | jenny.park@gmail.com | 2026-02-03 |
| Alex Novak | ANovak@TechWave.IO | 2026-02-11 |
| Priya Sharma | priya@dataflow.com | 2026-02-14 |
Notice the problem right away: Alex's email is ANovak@TechWave.IO while
Sarah's is sarah.chen@techwave.io. Same company, different casing. We'll
need to handle that.
Step 1: Extract the domain
Open the toolbar and select Extract Text from the Transform group. Choose the
email column, set the method to "after delimiter", and type
@ as the delimiter. Name the new column
domain.
Under the hood, ExploreMyData generates:
SPLIT_PART(email, '@', 2) AS domain
This grabs everything after the @. Your table now has a new domain column with values like techwave.io, Dataflow.com, and TechWave.IO.
After Extract Text (after delimiter "@") - new domain column added with everything after the @ symbol:
| name | email | domain (new) |
|---|---|---|
| Sarah Chen | sarah.chen@techwave.io | techwave.io |
| Mike Rivera | mrivera@Dataflow.com | Dataflow.com |
| Jenny Park | jenny.park@gmail.com | gmail.com |
| Alex Novak | ANovak@TechWave.IO | TechWave.IO |
| Priya Sharma | priya@dataflow.com | dataflow.com |
"techwave.io" and "TechWave.IO" are still different values - lowercase transform is needed next.
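If you ever want to reproduce the extraction outside the tool, the same after-delimiter logic is a one-liner in most languages. A minimal Python sketch of what SPLIT_PART(email, '@', 2) does (the helper name is ours; sample values come from the table above):

```python
def extract_domain(email: str) -> str:
    """Everything after the '@', mirroring SPLIT_PART(email, '@', 2)
    for well-formed addresses containing a single '@'."""
    return email.split("@", 1)[1]

print(extract_domain("sarah.chen@techwave.io"))  # techwave.io
print(extract_domain("ANovak@TechWave.IO"))      # TechWave.IO
```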
Step 2: Normalize the casing
The extracted domains have inconsistent casing. TechWave.IO and
techwave.io need to be the same value, otherwise they'll
show up as separate groups.
Select Text Transform from the Transform group. Choose the
domain column and pick "lowercase".
The SQL:
LOWER(domain)
Now every domain is lowercase: techwave.io, dataflow.com, gmail.com. Clean and consistent.
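Why this step matters is easiest to see with a quick count of distinct values before and after. A Python sketch using the domains from the tables above:

```python
# Raw extracted domains, exactly as they appear before normalization.
domains = ["techwave.io", "Dataflow.com", "gmail.com",
           "TechWave.IO", "dataflow.com"]

# LOWER(domain) in SQL; str.lower() here. Without it, "TechWave.IO"
# and "techwave.io" would land in separate groups.
normalized = [d.lower() for d in domains]

print(len(set(domains)))     # 5 distinct values before lowercasing
print(len(set(normalized)))  # 3 distinct values after
```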
Step 3: See the distribution
Now for the payoff. Click on the domain column header to open
Column Explorer. Since domain is a text column, ExploreMyData shows a
categorical breakdown: the most frequent values, their counts, and their percentage of the total.
You can immediately see that techwave.io has 847 users,
dataflow.com has 312,
and gmail.com has 2,341. That last number tells you
something important: more of your users are on personal Gmail accounts than on any single company domain.
Column Explorer breakdown of the lowercased domain column - top domains by user count:
| domain | count | % of total | type |
|---|---|---|---|
| gmail.com | 2,341 | 23.4% | personal |
| outlook.com | 1,102 | 11.0% | personal |
| techwave.io | 847 | 8.5% | company |
| dataflow.com | 312 | 3.1% | company |
| yahoo.com | 289 | 2.9% | personal |
Over a third of users are on personal email domains - valuable signal for segmenting your user base.
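Under the hood, this kind of categorical breakdown is just a frequency count over the column. A sketch with Python's Counter (the short list here stands in for the full 10,000-row column; the real counts are in the table above):

```python
from collections import Counter

# A handful of lowercased domains standing in for the full column.
domains = ["gmail.com", "techwave.io", "gmail.com",
           "dataflow.com", "gmail.com"]

counts = Counter(domains)
total = len(domains)

# Most frequent values, their counts, and their share of the total,
# just like the Column Explorer breakdown.
for domain, n in counts.most_common():
    print(f"{domain}: {n} ({n / total:.1%})")
```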
What you can do with this
Once you have a clean domain column, the analysis possibilities open up:
- Filter out personal email domains (gmail.com, yahoo.com, hotmail.com, outlook.com) to focus only on business accounts.
- Group by domain to find your biggest company customers by user count.
- Combine with signup date to see when companies started adopting your product - did TechWave sign up all at once, or trickle in over months?
- Export the enriched data with the new domain column included for use in your CRM or reporting tool.
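The first of these - separating personal from company accounts - usually comes down to a small blocklist check. A sketch, using only the four personal domains named above (a real blocklist would be longer):

```python
# Known personal email providers; not an exhaustive list.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}

def is_company(domain: str) -> bool:
    """True if the (already lowercased) domain is not a personal provider."""
    return domain not in PERSONAL_DOMAINS

rows = [("Sarah Chen", "techwave.io"),
        ("Jenny Park", "gmail.com"),
        ("Priya Sharma", "dataflow.com")]

business_accounts = [r for r in rows if is_company(r[1])]
print(business_accounts)  # Jenny's gmail.com row is filtered out
```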
Two steps, no regex
The whole operation is two pipeline steps: extract and lowercase. You didn't write a regex. You didn't open a terminal. You didn't paste anything into a spreadsheet formula bar. The SQL is there in the pipeline if you want to see it, copy it, or modify it - but you didn't have to write it.
Two-step pipeline - each step produces visible SQL and a live result:
| Step | Operation | Generated SQL |
|---|---|---|
| 1 | Extract Text (after @) | SPLIT_PART(email, '@', 2) AS domain |
| 2 | Text Transform: lowercase | LOWER(domain) |
No regex written. No terminal opened. The SQL is there to inspect or copy for an ETL pipeline.
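If you do copy the SQL into your own pipeline, the two steps compose into a single expression: LOWER(SPLIT_PART(email, '@', 2)). The equivalent one-pass sketch in Python:

```python
def email_to_domain(email: str) -> str:
    # Extract after '@' and lowercase in one expression,
    # the composed form of the two pipeline steps.
    return email.split("@", 1)[1].lower()

print(email_to_domain("ANovak@TechWave.IO"))  # techwave.io
```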