Calculating Age from a Date of Birth Column
Your HR export has a date_of_birth column.
Your customer database has birthdate.
Neither of them has an age field. You need ages for demographic analysis, eligibility checks,
or to bucket people into age groups for a report. Every time you open the file, the ages should
reflect today's date, not some hardcoded snapshot from when the data was last updated.
Here's how to calculate a live age column from a birthdate in ExploreMyData, then segment it into age groups.
Make sure the column is a date type
Before calculating anything, the birthdate column needs to actually be a DATE type, not a string. If your CSV imported it as VARCHAR (common with formats like "03/15/1990" or "1990-03-15"), add a Convert Type step first. Select the column, choose DATE as the target type. DuckDB is good at parsing common date formats automatically. If your dates use an unusual format, you might need to clean them with Find & Replace first.
You can check the current type in the column header. If it shows VARCHAR,
convert it. If it already shows DATE, you're good.
Calculating age with Date Diff
Open the toolbar and select Date Diff from the Transform group. Configure it:
- Set the start date column to
date_of_birth. - Set the end date to
CURRENT_DATE(the built-in value for today's date). - Set the unit to year.
- Name the result column
age.
The generated SQL:
DATEDIFF('year', date_of_birth, CURRENT_DATE) AS age
Date Diff - configuration:
Generated SQL: DATEDIFF('year', date_of_birth, CURRENT_DATE) AS age
For a table with employee data, the result looks like:
| name | date_of_birth | age |
|---|---|---|
| Sarah Chen | 1990-03-15 | 36 |
| James Wright | 1985-11-22 | 40 |
| Maria Garcia | 1998-07-04 | 27 |
| David Kim | 2001-01-30 | 25 |
A note on DATEDIFF precision
DuckDB's DATEDIFF('year', ...) counts the
number of year boundaries crossed between two dates. Someone born on December 31, 1990 would
show age 36 on January 1, 2026, even though they're really 35 years and one day old. For most
reporting purposes, this is close enough. If you need exact age accounting (down to the day), you'd
use a custom SQL expression, but for demographic buckets and general analysis, the year-boundary
approach works.
Adding age group segments
Raw ages are useful, but reports usually want age groups: "18-25", "26-35", "36-45", and so on. Use the Add Column operation with the Condition Builder to create these buckets.
- Select Add Column from the Columns group.
- Name the new column
age_group. - Switch to the Condition Builder mode.
- Add conditions:
- If
age< 18, set value to "Under 18" - If
agebetween 18 and 25, set value to "18-25" - If
agebetween 26 and 35, set value to "26-35" - If
agebetween 36 and 45, set value to "36-45" - If
agebetween 46 and 55, set value to "46-55" - If
agebetween 56 and 65, set value to "56-65" - Otherwise, set value to "65+"
- If
Under the hood, this generates a CASE WHEN expression:
CASE WHEN age < 18 THEN 'Under 18' WHEN age BETWEEN 18 AND 25 THEN '18-25' ... ELSE '65+' END AS age_group
| Condition | age_group value |
|---|---|
| age IS NULL | Unknown |
| age < 18 | Under 18 |
| age BETWEEN 18 AND 25 | 18-25 |
| age BETWEEN 26 AND 35 | 26-35 |
| age BETWEEN 36 AND 45 | 36-45 |
| age BETWEEN 46 AND 65 | 46-65 |
| Default (all other rows) | 65+ |
Condition Builder generates a CASE WHEN expression. NULL ages are handled first so they don't fall through to the "65+" bucket.
The result:
| name | age | age_group |
|---|---|---|
| Sarah Chen | 36 | 36-45 |
| James Wright | 40 | 36-45 |
| Maria Garcia | 27 | 26-35 |
| David Kim | 25 | 18-25 |
Handling NULL birthdates
If some rows have a NULL date_of_birth,
the DATEDIFF will return NULL for
the age, and the CASE WHEN will fall through to the ELSE clause (or return NULL if there's no ELSE).
You might want to add an explicit first condition: if
age IS NULL, set the age group
to "Unknown". That way NULLs don't silently end up in your "65+" bucket.
The age updates every time you open the file
Because the calculation uses CURRENT_DATE,
the age column recalculates every time you load the data. Open the same file in January and again in
June, and people who had birthdays in between will show an incremented age. No manual updates needed.
This is one of the advantages of computing derived columns through a pipeline rather than hardcoding
values into the source file.