Fixing Dates When Your CSV Has Three Different Formats
You need to filter orders from Q4 2024. You set a date range filter and get 200 rows. But you know there should be closer to 800. You look at the date column and the problem is obvious:
2024-10-15- ISO format10/15/2024- US formatOct 15 2024- written format15-Oct-2024- yet another variation
The column is stored as text, not as a proper date type. The date filter only matched the rows that happened to be in ISO format. The rest were treated as strings and ignored.
This happens when data comes from multiple sources. One system exports ISO dates, another exports US
dates, and someone manually entered dates in a written format. When they're all dumped into one CSV,
you get a column where sorting goes 01/02/2024,
10/15/2024,
2024-01-15,
Apr 3 2024 - alphabetical order, not chronological.
Fixing this requires two things: parse every format into a real date value, then convert the column type. Here's how to do it in ExploreMyData.
Figure out which formats you're dealing with
Before writing any rules, you need to know what you've got. Sort the date column and scroll through it. Look for patterns. In most cases, you'll find 2-3 distinct formats, not 20.
Here are the most common formats in the wild:
2024-01-15- ISO 8601 (YYYY-MM-DD). The good one.01/15/2024- US format (MM/DD/YYYY). The ambiguous one.15/01/2024- European format (DD/MM/YYYY). Looks identical to US for days 1-12.Jan 15, 2024orJan 15 2024- Written format. At least it's unambiguous.15-Jan-2024- Another written variant.
The tricky part is telling US and European formats apart. Is 03/04/2024
March 4th or April 3rd? You'll need to know your data source to decide. If you see a value like
13/04/2024, that's definitely European (there's no 13th month).
Use clues like that.
A single order_date column with four different formats - sorting is alphabetical, not chronological:
| order_id | order_date (raw text) | format detected |
|---|---|---|
| ORD-101 | 2024-01-15 | ISO (YYYY-MM-DD) |
| ORD-102 | 10/15/2024 | US (MM/DD/YYYY) |
| ORD-103 | Oct 15 2024 | Written (%b %d %Y) |
| ORD-104 | 15-Oct-2024 | DD-Mon-YYYY |
| ORD-105 | 01/02/2024 | US (ambiguous - Jan 2 or Feb 1?) |
Date range filters only matched ISO rows. The other 600 rows were silently ignored.
Step 1: Normalize with a CASE WHEN expression
The strategy is to detect each format and parse it with the right pattern. Open Update Values from the Transform group. Select your date column.
You'll write a CASE WHEN expression that checks the structure of each value and applies the correct
strptime pattern. Here's what ExploreMyData generates:
CASE WHEN order_date LIKE '____-__-__' THEN strptime(order_date, '%Y-%m-%d') WHEN order_date LIKE '__/__/____' THEN strptime(order_date, '%m/%d/%Y') WHEN order_date SIMILAR TO '[A-Z][a-z]{2} [0-9]+ [0-9]{4}' THEN strptime(order_date, '%b %d %Y') ELSE TRY_CAST(order_date AS DATE) END
Let's break that down:
- ISO format (
____-__-__): four characters, dash, two, dash, two. Parsed with%Y-%m-%d. - US format (
__/__/____): two digits, slash, two digits, slash, four digits. Parsed with%m/%d/%Y. - Written format (starts with letters): matched with a pattern and parsed with
%b %d %Y, where%bmatches abbreviated month names like "Jan", "Feb". - Fallback: anything that doesn't match gets a
TRY_CAST, which returns NULL instead of throwing an error.
strptime is DuckDB's function for parsing a string into a date
using a format pattern. The %Y means four-digit year,
%m means two-digit month,
%d means two-digit day, and
%b means abbreviated month name.
CASE WHEN expression applied - each format branch produces a parsed date value:
| order_date (input) | LIKE pattern matched | strptime format | result |
|---|---|---|---|
| 2024-01-15 | ____-__-__ | %Y-%m-%d | 2024-01-15 |
| 10/15/2024 | __/__/____ | %m/%d/%Y | 2024-10-15 |
| Oct 15 2024 | letter pattern | %b %d %Y | 2024-10-15 |
| 15-Oct-2024 | digit-letter pattern | %d-%b-%Y | 2024-10-15 |
| TBD | no match | TRY_CAST fallback | NULL |
Step 2: Convert the column type to DATE
After the Update Values step, every row has a parsed date value. But the column itself might still be typed as VARCHAR. To make it a proper DATE column that supports sorting and filtering, use Convert Type from the Transform group.
Select the date column and choose DATE as the target type. ExploreMyData generates:
TRY_CAST(order_date AS DATE)
Now the column is a real date type. Sorting works chronologically. Date range filters work correctly. You can extract year, month, or quarter. And if any values couldn't be parsed, they show up as NULL instead of crashing the query.
Handling the DD-Mon-YYYY variant
If you also have dates like 15-Jan-2024, add another WHEN clause
to the CASE expression. The pattern for this format is %d-%b-%Y.
You can detect it by checking for a dash after the first two characters and a letter in position 4:
WHEN order_date SIMILAR TO '[0-9]{2}-[A-Z][a-z]{2}-[0-9]{4}' THEN strptime(order_date, '%d-%b-%Y')
Add as many format branches as your data needs. Most files have 2-3 formats. More than 4 is rare.
Check for NULLs after conversion
After converting, sort the date column to push NULLs to the top or bottom. If you see NULLs where you expected dates, those are values that didn't match any of your patterns. Go back to the original data, find those rows, and figure out what format they're in.
Common culprits: dates with commas like Jan 15, 2024 (note the comma),
dates with extra spaces, or completely invalid values like N/A or
TBD that someone typed into the date field. For the comma variant,
add a %b %d, %Y pattern. For the invalid values, those NULLs are
actually the correct result - they were never real dates.
After Update Values + Convert Type, the column sorts chronologically and date range filters work correctly:
| order_id | order_date (before) | order_date (after, sorted) |
|---|---|---|
| ORD-101 | 2024-01-15 | 2024-01-15 |
| ORD-102 | Apr 3 2024 | 2024-04-03 |
| ORD-103 | 10/15/2024 | 2024-10-15 |
| ORD-104 | 15-Oct-2024 | 2024-10-15 |
| ORD-105 | 01/02/2024 | 2024-12-05 |
Column type changed from VARCHAR to DATE. Filtering Q4 2024 now returns all 800 expected rows.
Why this matters beyond sorting
Once dates are proper DATE values, everything else unlocks. You can group by month or quarter to spot trends. You can calculate the difference between two dates. You can filter to "last 90 days" with a range filter instead of manually typing date strings.
The root cause is usually that someone exported data from Excel, where dates look fine on screen but get mangled in CSV format. Or two teams use different date conventions and nobody noticed until the data was merged. Either way, the fix is the same: detect the formats, parse each one, convert the type. Three steps, and your dates actually work.