Splitting a Full Address into Street, City, State, Zip
You export a customer list and every address is crammed into a single column: "123 Main St, Springfield, IL 62701". One big string. You need to filter by state, or group by city, or send the zip codes to a mailing service. None of that works when everything is glued together.
This is one of the most common data cleaning tasks. CRM exports, e-commerce platforms, and form submissions all love stuffing addresses into one field. The fix takes about two minutes in ExploreMyData.
What we're starting with
A typical address column looks something like this:
123 Main St, Springfield, IL 62701456 Oak Ave, Chicago, IL 60601789 Pine Rd, Austin, TX 7870142 Elm St, Portland, OR 97201
The pattern is consistent: street, comma, city, comma, state and zip. That comma delimiter is your best friend here.
Starting data - full addresses in a single column, nothing is separately filterable:
| customer_id | name | address |
|---|---|---|
| C001 | Jordan Lee | 123 Main St, Springfield, IL 62701 |
| C002 | Maria Santos | 456 Oak Ave, Chicago, IL 60601 |
| C003 | David Osei | 789 Pine Rd, Austin, TX 78701 |
| C004 | Priya Sharma | 42 Elm St, Portland, OR 97201 |
Step 1: Split the address by comma
Open the Split Column operation from the Columns group. Select your
address column and set the delimiter to , (comma). Set the number
of parts to 3.
This creates three new columns:
address_1- the street ("123 Main St")address_2- the city (" Springfield")address_3- the state and zip (" IL 62701")
Under the hood, ExploreMyData generates:
SPLIT_PART(address, ',', 1) AS address_1
SPLIT_PART(address, ',', 2) AS address_2
SPLIT_PART(address, ',', 3) AS address_3
Two problems are immediately obvious. The city has a leading space. And the state and zip are still stuck together.
Step 2: Trim whitespace from city
Select Text Transform from the Transform group. Pick the
address_2 column and choose "trim".
That leading space disappears. " Springfield" becomes "Springfield".
The SQL is straightforward:
TRIM(address_2).
Now rename this column to "city" using Rename Column if you want cleaner headers.
After Split Column (by comma, 3 parts) and Text Transform (trim) on the city column:
| address_1 (street) | address_2 (city, trimmed) | address_3 (state + zip) |
|---|---|---|
| 123 Main St | Springfield | IL 62701 |
| 456 Oak Ave | Chicago | IL 60601 |
| 789 Pine Rd | Austin | TX 78701 |
| 42 Elm St | Portland | OR 97201 |
State and zip are still together in address_3 - step 3 will split those apart.
Step 3: Extract state and zip
The address_3 column now holds something like " IL 62701".
First, trim it to remove the leading space. Then use Extract Text
to pull the state and zip apart.
For the state: use Extract Text with the "before_delimiter" option and set the delimiter to a space. This grabs everything before the first space - "IL".
For the zip: use Extract Text with the "after_delimiter" option and set the delimiter to a space. This grabs everything after the first space - "62701".
The generated SQL uses:
SPLIT_PART(TRIM(address_3), ' ', 1) AS state
SPLIT_PART(TRIM(address_3), ' ', 2) AS zip
The result
After four or five pipeline steps, your single address column is now four clean columns: street, city, state, and zip. Each one is independently filterable, sortable, and groupable.
Want to see how many customers you have in each state? Group by the state column. Need to filter to just Texas zip codes starting with 787? Filter the zip column with "starts with". Need to export city and state for a geocoding API? Select just those columns.
Final result - four clean, independently usable columns from the original single address field:
| street | city | state | zip |
|---|---|---|---|
| 123 Main St | Springfield | IL | 62701 |
| 456 Oak Ave | Chicago | IL | 60601 |
| 789 Pine Rd | Austin | TX | 78701 |
| 42 Elm St | Portland | OR | 97201 |
Each column is now filterable. Group by state to count customers per state; filter zip to target a region.
Handling messy real-world addresses
Not every address follows the neat "street, city, state zip" pattern. Some things to watch for:
- Apartment numbers: "123 Main St Apt 4B, Springfield, IL 62701" - the street part includes the unit. That's usually fine since it stays in the street column.
- Missing commas: "123 Main St Springfield IL 62701" - without commas, splitting by comma won't work. You might need to use Extract Text with position-based extraction or regex.
- Extra fields: "123 Main St, Suite 200, Springfield, IL 62701" - four comma-separated parts instead of three. Set the split to 4 parts and combine the first two for the full street address.
The key is to look at your data first. Scroll through the address column and spot the pattern. Most datasets from a single source follow a consistent format, even if it's not the textbook one.
Why not do this in a spreadsheet?
You absolutely can. Excel has Text to Columns, Google Sheets has SPLIT().
But there are two advantages to doing it here. First, the operation is part of a pipeline -
if you load updated data next month, the same split applies automatically. Second, you can see the
SQL that runs. If you need to hand this off to someone building a proper ETL pipeline,
they have the exact DuckDB query.