← All posts
4 min read

Calculating Age from a Date of Birth Column

Your HR export has a date_of_birth column. Your customer database has birthdate. Neither of them has an age field. You need ages for demographic analysis, eligibility checks, or to bucket people into age groups for a report. Every time you open the file, the ages should reflect today's date, not some hardcoded snapshot from when the data was last updated.

Here's how to calculate a live age column from a birthdate in ExploreMyData, then segment it into age groups.

Make sure the column is a date type

Before calculating anything, the birthdate column needs to actually be a DATE type, not a string. If your CSV imported it as VARCHAR (common with formats like "03/15/1990" or "1990-03-15"), add a Convert Type step first. Select the column, choose DATE as the target type. DuckDB is good at parsing common date formats automatically. If your dates use an unusual format, you might need to clean them with Find & Replace first.

You can check the current type in the column header. If it shows VARCHAR, convert it. If it already shows DATE, you're good.

Calculating age with Date Diff

Open the toolbar and select Date Diff from the Transform group. Configure it:

  1. Set the start date column to date_of_birth.
  2. Set the end date to CURRENT_DATE (the built-in value for today's date).
  3. Set the unit to year.
  4. Name the result column age.

The generated SQL:

DATEDIFF('year', date_of_birth, CURRENT_DATE) AS age

Date Diff - configuration:

Start date column date_of_birth
End date CURRENT_DATE
Unit year

Generated SQL: DATEDIFF('year', date_of_birth, CURRENT_DATE) AS age

For a table with employee data, the result looks like:

name date_of_birth age
Sarah Chen1990-03-1536
James Wright1985-11-2240
Maria Garcia1998-07-0427
David Kim2001-01-3025

A note on DATEDIFF precision

DuckDB's DATEDIFF('year', ...) counts the number of year boundaries crossed between two dates. Someone born on December 31, 1990 would show age 36 on January 1, 2026, even though they're really 35 years and one day old. For most reporting purposes, this is close enough. If you need exact age accounting (down to the day), you'd use a custom SQL expression, but for demographic buckets and general analysis, the year-boundary approach works.

Adding age group segments

Raw ages are useful, but reports usually want age groups: "18-25", "26-35", "36-45", and so on. Use the Add Column operation with the Condition Builder to create these buckets.

  1. Select Add Column from the Columns group.
  2. Name the new column age_group.
  3. Switch to the Condition Builder mode.
  4. Add conditions:
    • If age < 18, set value to "Under 18"
    • If age between 18 and 25, set value to "18-25"
    • If age between 26 and 35, set value to "26-35"
    • If age between 36 and 45, set value to "36-45"
    • If age between 46 and 55, set value to "46-55"
    • If age between 56 and 65, set value to "56-65"
    • Otherwise, set value to "65+"

Under the hood, this generates a CASE WHEN expression:

CASE WHEN age < 18 THEN 'Under 18' WHEN age BETWEEN 18 AND 25 THEN '18-25' ... ELSE '65+' END AS age_group

Condition age_group value
age IS NULLUnknown
age < 18Under 18
age BETWEEN 18 AND 2518-25
age BETWEEN 26 AND 3526-35
age BETWEEN 36 AND 4536-45
age BETWEEN 46 AND 6546-65
Default (all other rows)65+

Condition Builder generates a CASE WHEN expression. NULL ages are handled first so they don't fall through to the "65+" bucket.

The result:

name age age_group
Sarah Chen3636-45
James Wright4036-45
Maria Garcia2726-35
David Kim2518-25

Handling NULL birthdates

If some rows have a NULL date_of_birth, the DATEDIFF will return NULL for the age, and the CASE WHEN will fall through to the ELSE clause (or return NULL if there's no ELSE). You might want to add an explicit first condition: if age IS NULL, set the age group to "Unknown". That way NULLs don't silently end up in your "65+" bucket.

The age updates every time you open the file

Because the calculation uses CURRENT_DATE, the age column recalculates every time you load the data. Open the same file in January and again in June, and people who had birthdays in between will show an incremented age. No manual updates needed. This is one of the advantages of computing derived columns through a pipeline rather than hardcoding values into the source file.

Calculate ages in ExploreMyData →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData