← All posts
4 min read

Extracting Email Domains to Identify Company Accounts

You have a list of 10,000 users. Each row has a name and an email address. Your boss asks: "Which companies have the most accounts?" There's no company column. There never is.

But the information is right there, hiding after the @ symbol. sarah@techwave.io works at TechWave. mike@dataflow.com works at Dataflow. jenny@gmail.com is using a personal account.

You just need to extract it. Here's how to do that in ExploreMyData without writing any code, and what's happening under the hood.

The starting data

Imagine a CSV exported from your signup database. It looks something like this:

name email signup_date
Sarah Chensarah.chen@techwave.io2026-01-15
Mike Riveramrivera@Dataflow.com2026-01-18
Jenny Parkjenny.park@gmail.com2026-02-03
Alex NovakANovak@TechWave.IO2026-02-11
Priya Sharmapriya@dataflow.com2026-02-14

Notice the problem right away: Alex's email is ANovak@TechWave.IO while Sarah's is sarah.chen@techwave.io. Same company, different casing. We'll need to handle that.

Step 1: Extract the domain

Open the toolbar and select Extract Text from the Transform group. Choose the email column, set the method to "after delimiter", and type @ as the delimiter. Name the new column domain.

Under the hood, ExploreMyData generates:

SPLIT_PART(email, '@', 2) AS domain

This grabs everything after the @. Your table now has a new domain column with values like techwave.io, Dataflow.com, and TechWave.IO.

After Extract Text (after delimiter "@") - new domain column added with everything after the @ symbol:

name email domain (new)
Sarah Chensarah.chen@techwave.iotechwave.io
Mike Riveramrivera@Dataflow.comDataflow.com
Jenny Parkjenny.park@gmail.comgmail.com
Alex NovakANovak@TechWave.IOTechWave.IO

"techwave.io" and "TechWave.IO" are still different values - lowercase transform is needed next.

Step 2: Normalize the casing

The extracted domains have inconsistent casing. TechWave.IO and techwave.io need to be the same value, otherwise they'll show up as separate groups.

Select Text Transform from the Transform group. Choose the domain column and pick "lowercase".

The SQL:

LOWER(domain)

Now every domain is lowercase. techwave.io, dataflow.com, gmail.com. Clean and consistent.

Step 3: See the distribution

Now for the payoff. Click on the domain column header to open Column Explorer. Since domain is a text column, ExploreMyData shows a categorical breakdown: the most frequent values, their counts, and their percentage of the total.

You can immediately see that techwave.io has 847 users, dataflow.com has 312, and gmail.com has 2,341. That last number tells you something important: most of your users are on personal accounts, not company ones.

Column Explorer breakdown of the lowercased domain column - top domains by user count:

domain count % of total type
gmail.com2,34123.4%personal
outlook.com1,10211.0%personal
techwave.io8478.5%company
dataflow.com3123.1%company
yahoo.com2892.9%personal

Over a third of users are on personal email domains - valuable signal for segmenting your user base.

What you can do with this

Once you have a clean domain column, the analysis possibilities open up:

  • Filter out personal email domains (gmail.com, yahoo.com, hotmail.com, outlook.com) to focus only on business accounts.
  • Group by domain to find your biggest company customers by user count.
  • Combine with signup date to see when companies started adopting your product - did TechWave sign up all at once, or trickle in over months?
  • Export the enriched data with the new domain column included for use in your CRM or reporting tool.

Two steps, no regex

The whole operation is two pipeline steps: extract and lowercase. You didn't write a regex. You didn't open a terminal. You didn't paste anything into a spreadsheet formula bar. The SQL is there in the pipeline if you want to see it, copy it, or modify it - but you didn't have to write it.

Two-step pipeline - each step produces visible SQL and a live result:

Step Operation Generated SQL
1Extract Text (after @)SPLIT_PART(email, '@', 2) AS domain
2Text Transform: lowercaseLOWER(domain)

No regex written. No terminal opened. The SQL is there to inspect or copy for an ETL pipeline.

Try it with your own data →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData