Spotting Columns Where the Data Type is Wrong
You sort a "price" column and get 100, 20, 3, 5, 99 instead of 3, 5, 20, 99, 100. You try to calculate an average and get an error. You filter for dates greater than January 1st and get nonsensical results. The problem is the same every time: the column's data type doesn't match its actual content.
Numbers stored as text sort alphabetically, not numerically. Dates stored as strings can't be compared chronologically. Booleans stored as integers (0 and 1) won't behave like true/false in filters. These type mismatches are silent - they don't throw errors, they just produce wrong results. Here's how to find and fix them in ExploreMyData.
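The sorting symptom is easy to reproduce. Here is a minimal Python sketch (illustrative only, not ExploreMyData's code) of what happens when numeric-looking strings are sorted as text:

```python
# Prices stored as text sort character by character: "100" < "20" because "1" < "2".
prices_as_text = ["100", "20", "3", "5", "99"]
print(sorted(prices_as_text))      # ['100', '20', '3', '5', '99'] - alphabetical

# The same values as actual numbers sort the way you expect.
prices_as_numbers = [float(p) for p in prices_as_text]
print(sorted(prices_as_numbers))   # [3.0, 5.0, 20.0, 99.0, 100.0]
```

Note that the "sorted" text is exactly the broken order from the opening example: the values look fine, only the comparison rule is wrong.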
How type issues happen
CSV files don't have types. Every value is text, and the tool reading the file has to guess what each column should be. Most of the time the guessing works. But it fails when:
- A numeric column has a few non-numeric values (like "N/A" or "TBD"), so the whole column stays as text
- A date column uses an unusual format the parser doesn't recognize
- Zip codes lost their leading zeros in an Excel export because they were treated as integers instead of text, then got re-imported that way
- A boolean column uses "Yes"/"No" or "1"/"0" instead of true/false
- Currency values include dollar signs or commas ("$1,234.56"), preventing numeric detection
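The first failure mode explains most of the others. A simplified sketch of the idea (a toy sniffer, not DuckDB's actual CSV type detection) shows why a single stray value keeps a whole column as text:

```python
def infer_column_type(values):
    """Toy type sniffer: a column is numeric only if EVERY value parses as a number."""
    for v in values:
        try:
            float(v)
        except ValueError:
            return "VARCHAR"   # one stray "N/A" or "$1,234.56" forces the whole column to text
    return "DOUBLE"

print(infer_column_type(["19.99", "49.99", "99.99"]))   # DOUBLE
print(infer_column_type(["19.99", "N/A", "$29.99"]))    # VARCHAR
```

Real sniffers sample rows and consider several candidate types, but the all-or-nothing principle is the same: the column's type is only as clean as its dirtiest value.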
The frustrating part is that the data looks right when you glance at it. The price column shows numbers. The date column shows dates. It's only when you sort, filter, or calculate that the type mismatch surfaces.
Check 1: Column type badges
The fastest way to spot type issues is to look at the type badges in the column headers. Every column shows its DuckDB type: VARCHAR, BIGINT, DOUBLE, DATE, TIMESTAMP, or BOOLEAN.
Scan across your columns and ask: does the type match what I expect? A column called "price" should be DOUBLE or BIGINT, not VARCHAR. A column called "order_date" should be DATE or TIMESTAMP, not VARCHAR. A column called "is_active" should be BOOLEAN, not BIGINT.
| Column | Detected type | Expected type | Issue |
|---|---|---|---|
| order_id | BIGINT | BIGINT | None |
| price | VARCHAR | DOUBLE | Dollar signs in some rows |
| order_date | VARCHAR | DATE | Mixed formats (MM/DD/YYYY & ISO) |
| status | VARCHAR | VARCHAR | None |
| is_refund | BIGINT | BOOLEAN | Stored as 0 / 1 |
Scanning the type badges takes about 30 seconds, and three mismatched columns are visible immediately: price (VARCHAR), order_date (VARCHAR), and is_refund (BIGINT).
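If you keep a mental (or written) list of what each column should be, the scan is a simple comparison. A sketch using hypothetical column names mirroring the table above:

```python
# Detected types (from the badges) vs. what the column names imply.
detected = {"order_id": "BIGINT", "price": "VARCHAR",
            "order_date": "VARCHAR", "status": "VARCHAR", "is_refund": "BIGINT"}
expected = {"order_id": "BIGINT", "price": "DOUBLE",
            "order_date": "DATE", "status": "VARCHAR", "is_refund": "BOOLEAN"}

mismatches = [col for col in detected if detected[col] != expected[col]]
print(mismatches)   # ['price', 'order_date', 'is_refund']
```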
Check 2: Column Explorer behavior
Click a column header to open the Column Explorer. The type of explorer that appears tells you how DuckDB is treating the column:
- Numeric explorer (histogram, min/max/mean/median) appears for numeric types. If you expected numbers but see a frequency table instead, the column is text.
- Categorical explorer (frequency table, distinct count) appears for text columns. If you open a "price" column and see a frequency table listing "10.99" as a text value appearing 47 times, it's VARCHAR.
- Date explorer (date range) appears for date/timestamp types. If your date column shows a categorical frequency table, it's stored as text.
This is a reliable signal. You don't need to check the type badge - just open the Column Explorer and see which view you get.
Column Explorer opened on price (VARCHAR) - shows a frequency table instead of a histogram:
| price (as text) | Count |
|---|---|
| 19.99 | 3,842 |
| 49.99 | 2,711 |
| 99.99 | 1,940 |
| $29.99 | 247 |
| N/A | 83 |
| $19.99 | 62 |
A frequency table instead of a histogram confirms this is text, not numeric. The culprits are visible: "$29.99", "$19.99" (dollar signs), and "N/A" prevented automatic numeric detection.
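The frequency table behaves this way because every value, including the numeric-looking ones, is a distinct string. A quick sketch with `collections.Counter` (sample values taken from the table above):

```python
from collections import Counter

# A sample of the price column as DuckDB sees it: every value is a string.
price_text = ["19.99"] * 3 + ["49.99"] * 2 + ["$29.99", "N/A", "$19.99"]

# "19.99" and "$19.99" count as two unrelated text values - exactly
# what a categorical frequency table shows.
print(Counter(price_text).most_common())
```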
Fix: Convert the type
Once you've identified a mistyped column, fix it with Convert Type.
- Open the toolbar and select Convert Type from the Transform group.
- Choose the column (e.g., "price").
- Select the target type (e.g., DOUBLE for prices, DATE for dates, BOOLEAN for flags).
- Click Apply.
ExploreMyData uses TRY_CAST() rather than CAST(). The difference matters: CAST() throws an error if any value can't convert, while TRY_CAST() converts what it can and sets the rest to NULL. This is almost always what you want, because it lets you find the problem values instead of being blocked by them.
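The TRY_CAST() semantics are easy to emulate, which helps if you want to reproduce the behavior outside the tool. A Python sketch of the idea (DuckDB's actual implementation differs):

```python
def try_cast_double(value):
    """Mimic DuckDB's TRY_CAST(value AS DOUBLE): return None instead of raising."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

prices = ["19.99", "$29.99", "N/A", "49.99"]
print([try_cast_double(p) for p in prices])   # [19.99, None, None, 49.99]
```

The Nones are not lost data, they are markers: each one points at a row you need to look at.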
Find the values that won't convert
After converting, some values may have become NULL - these are the values that couldn't be parsed as the target type. To find them:
- Add a Filter on the converted column.
- Set the operator to is null.
- Apply. The grid shows only rows where the conversion failed.
These are the rows that were preventing the type from being detected correctly in the first place. Common culprits:
- "N/A", "TBD", "-", "null" (text placeholders in numeric columns)
- "$1,234.56" (currency formatting that needs to be stripped first)
- "03/15/2025" when the parser expected "2025-03-15" (date format mismatch)
- "TRUE" in a column where most values are "1" and "0"
After converting price to DOUBLE, filter price IS NULL reveals the unconvertible rows:
| order_id | original_price_text | price (DOUBLE) | Reason |
|---|---|---|---|
| 10087 | $29.99 | NULL | Dollar sign |
| 10241 | N/A | NULL | Text placeholder |
| 10388 | $19.99 | NULL | Dollar sign |
| 10504 | TBD | NULL | Text placeholder |
392 rows total. The fix: use Find & Replace to remove "$" and commas, then filter out "N/A" and "TBD" rows before converting.
The cleanup workflow
For columns with a handful of bad values, the workflow is straightforward:
- Convert Type to the target type (bad values become NULL).
- Filter for NULLs to see what didn't convert.
- Decide: remove those rows, fill the NULLs with a default value, or go back and fix the original values with Find & Replace before converting.
For columns with currency formatting or other systematic issues, fix the formatting first:
- Use Find & Replace to remove "$" and "," characters.
- Then Convert Type to DOUBLE.
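The two-step fix, clean the formatting and then convert, can be sketched like this (a Python emulation of the Find & Replace plus Convert Type steps, not the tool's internals):

```python
def clean_and_convert(value):
    """Find & Replace ('$' and ',' removed), then convert to DOUBLE."""
    cleaned = value.replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None   # genuine placeholders like "N/A" or "TBD" still become NULL

for raw in ["$1,234.56", "$29.99", "19.99", "N/A"]:
    print(raw, "->", clean_and_convert(raw))
```

Order matters: converting first would turn every "$"-prefixed value into NULL, leaving nothing for Find & Replace to fix. Cleaning first rescues the recoverable values and leaves NULLs only for true placeholders.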
Each step is visible in the pipeline. You can see exactly what happened, reorder steps, or remove one if it didn't work. The original data is never modified.
Why this matters more than you think
Type issues are the most common source of subtle data bugs. Your code runs, your report generates, your chart renders. Everything looks fine - except the numbers are wrong because a sort was alphabetical instead of numerical, or an average silently excluded text values. These bugs don't crash; they just give you wrong answers that look plausible.
Checking types should be the first thing you do when you open any dataset. It takes 30 seconds to scan the column badges, and it saves you from hours of debugging later.