Spotting Columns Where the Data Type is Wrong
You sort a "price" column and get 100, 20, 3, 5, 99 instead of 3, 5, 20, 99, 100. You try to calculate an average and get an error. You filter for dates greater than January 1st and get nonsensical results. The problem is the same every time: the column's data type doesn't match its actual content.
Numbers stored as text sort alphabetically, not numerically. Dates stored as strings can't be compared chronologically. Booleans stored as integers (0 and 1) won't behave like true/false in filters. These type mismatches are silent - they don't throw errors, they just produce wrong results. Here's how to find and fix them in ExploreMyData.
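The sorting symptom is easy to reproduce. Here is a minimal Python sketch (illustrative only, not ExploreMyData's code) of what happens when numeric-looking strings are sorted as text:

```python
# Prices stored as text sort character by character: "100" < "20" because "1" < "2".
prices_as_text = ["100", "20", "3", "5", "99"]
print(sorted(prices_as_text))      # ['100', '20', '3', '5', '99'] - alphabetical

# The same values as actual numbers sort the way you expect.
prices_as_numbers = [float(p) for p in prices_as_text]
print(sorted(prices_as_numbers))   # [3.0, 5.0, 20.0, 99.0, 100.0]
```

Note that the "sorted" text is exactly the broken order from the opening example: the values look fine, only the comparison rule is wrong.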
How type issues happen
CSV files don't have types. Every value is text, and the tool reading the file has to guess what each column should be. Most of the time the guessing works. But it fails when:
- A numeric column has a few non-numeric values (like "N/A" or "TBD"), so the whole column stays as text
- A date column uses an unusual format the parser doesn't recognize
- Zip codes lost their leading zeros in an Excel export because they were treated as integers instead of text, then got re-imported that way
- A boolean column uses "Yes"/"No" or "1"/"0" instead of true/false
- Currency values include dollar signs or commas ("$1,234.56"), preventing numeric detection
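The first failure mode explains most of the others. A simplified sketch of the idea (a toy sniffer, not DuckDB's actual CSV type detection) shows why a single stray value keeps a whole column as text:

```python
def infer_column_type(values):
    """Toy type sniffer: a column is numeric only if EVERY value parses as a number."""
    for v in values:
        try:
            float(v)
        except ValueError:
            return "VARCHAR"   # one stray "N/A" or "$1,234.56" forces the whole column to text
    return "DOUBLE"

print(infer_column_type(["19.99", "49.99", "99.99"]))   # DOUBLE
print(infer_column_type(["19.99", "N/A", "$29.99"]))    # VARCHAR
```

Real sniffers sample rows and consider several candidate types, but the all-or-nothing principle is the same: the column's type is only as clean as its dirtiest value.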
The frustrating part is that the data looks right when you glance at it. The price column shows numbers. The date column shows dates. It's only when you sort, filter, or calculate that the type mismatch surfaces.
Check 1: Column type badges
The fastest way to spot type issues is to look at the type badges in the column headers. Every column shows its DuckDB type: VARCHAR, BIGINT, DOUBLE, DATE, TIMESTAMP, or BOOLEAN.
Scan across your columns and ask: does the type match what I expect? A column called "price" should be DOUBLE or BIGINT, not VARCHAR. A column called "order_date" should be DATE or TIMESTAMP, not VARCHAR. A column called "is_active" should be BOOLEAN, not BIGINT.
| Column | Detected type | Expected type | Issue |
|---|---|---|---|
| order_id | BIGINT | BIGINT | None |
| price | VARCHAR | DOUBLE | Dollar signs in some rows |
| order_date | VARCHAR | DATE | Mixed formats (MM/DD/YYYY & ISO) |
| status | VARCHAR | VARCHAR | None |
| is_refund | BIGINT | BOOLEAN | Stored as 0 / 1 |
Scanning the type badges takes about 30 seconds, and three mismatched columns are visible immediately: price (VARCHAR), order_date (VARCHAR), and is_refund (BIGINT).
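If you keep a mental (or written) list of what each column should be, the scan is a simple comparison. A sketch using hypothetical column names mirroring the table above:

```python
# Detected types (from the badges) vs. what the column names imply.
detected = {"order_id": "BIGINT", "price": "VARCHAR",
            "order_date": "VARCHAR", "status": "VARCHAR", "is_refund": "BIGINT"}
expected = {"order_id": "BIGINT", "price": "DOUBLE",
            "order_date": "DATE", "status": "VARCHAR", "is_refund": "BOOLEAN"}

mismatches = [col for col in detected if detected[col] != expected[col]]
print(mismatches)   # ['price', 'order_date', 'is_refund']
```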
Check 2: Column Explorer behavior
Click a column header to open the Column Explorer. The type of explorer that appears tells you how DuckDB is treating the column:
- Numeric explorer (histogram, min/max/mean/median) appears for numeric types. If you expected numbers but see a frequency table instead, the column is text.
- Categorical explorer (frequency table, distinct count) appears for text columns. If you open a "price" column and see a frequency table listing "10.99" as a text value appearing 47 times, it's VARCHAR.
- Date explorer (date range) appears for date/timestamp types. If your date column shows a categorical frequency table, it's stored as text.
This is a reliable signal. You don't need to check the type badge - just open the Column Explorer and see which view you get.
Column Explorer opened on price (VARCHAR) - shows a frequency table instead of a histogram:
| price (as text) | Count |
|---|---|
| 19.99 | 3,842 |
| 49.99 | 2,711 |
| 99.99 | 1,940 |
| $29.99 | 247 |
| N/A | 83 |
| $19.99 | 62 |
A frequency table instead of a histogram confirms this is text, not numeric. The culprits are visible: "$29.99", "$19.99" (dollar signs), and "N/A" prevented automatic numeric detection.
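The frequency table behaves this way because every value, including the numeric-looking ones, is a distinct string. A quick sketch with `collections.Counter` (sample values taken from the table above):

```python
from collections import Counter

# A sample of the price column as DuckDB sees it: every value is a string.
price_text = ["19.99"] * 3 + ["49.99"] * 2 + ["$29.99", "N/A", "$19.99"]

# "19.99" and "$19.99" count as two unrelated text values - exactly
# what a categorical frequency table shows.
print(Counter(price_text).most_common())
```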
Fix: Convert the type
Once you've identified a mistyped column, fix it with Convert Type.
- Open the toolbar and select Convert Type from the Transform group.
- Choose the column (e.g., "price").
- Select the target type (e.g., DOUBLE for prices, DATE for dates, BOOLEAN for flags).
- Click Apply.
ExploreMyData uses TRY_CAST() rather than CAST(). The difference matters: CAST() throws an error if any value can't convert, while TRY_CAST() converts what it can and sets the rest to NULL. This is almost always what you want, because it lets you find the problem values instead of being blocked by them.
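The TRY_CAST() semantics are easy to emulate, which helps if you want to reproduce the behavior outside the tool. A Python sketch of the idea (DuckDB's actual implementation differs):

```python
def try_cast_double(value):
    """Mimic DuckDB's TRY_CAST(value AS DOUBLE): return None instead of raising."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

prices = ["19.99", "$29.99", "N/A", "49.99"]
print([try_cast_double(p) for p in prices])   # [19.99, None, None, 49.99]
```

The Nones are not lost data, they are markers: each one points at a row you need to look at.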
Find the values that won't convert
After converting, some values may have become NULL - these are the values that couldn't be parsed as the target type. To find them:
- Add a Filter on the converted column.
- Set the operator to is null.
- Apply. The grid shows only rows where the conversion failed.
These are the rows that were preventing the type from being detected correctly in the first place. Common culprits:
- "N/A", "TBD", "-", "null" (text placeholders in numeric columns)
- "$1,234.56" (currency formatting that needs to be stripped first)
- "03/15/2025" when the parser expected "2025-03-15" (date format mismatch)
- "TRUE" in a column where most values are "1" and "0"
After converting price to DOUBLE, filter price IS NULL reveals the unconvertible rows:
| order_id | original_price_text | price (DOUBLE) | Reason |
|---|---|---|---|
| 10087 | $29.99 | NULL | Dollar sign |
| 10241 | N/A | NULL | Text placeholder |
| 10388 | $19.99 | NULL | Dollar sign |
| 10504 | TBD | NULL | Text placeholder |
392 rows total. The fix: use Find & Replace to remove "$" and commas, then filter out "N/A" and "TBD" rows before converting.
The cleanup workflow
For columns with a handful of bad values, the workflow is straightforward:
- Convert Type to the target type (bad values become NULL).
- Filter for NULLs to see what didn't convert.
- Decide: remove those rows, fill the NULLs with a default value, or go back and fix the original values with Find & Replace before converting.
For columns with currency formatting or other systematic issues, fix the formatting first:
- Use Find & Replace to remove "$" and "," characters.
- Then Convert Type to DOUBLE.
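The two-step fix, clean the formatting and then convert, can be sketched like this (a Python emulation of the Find & Replace plus Convert Type steps, not the tool's internals):

```python
def clean_and_convert(value):
    """Find & Replace ('$' and ',' removed), then convert to DOUBLE."""
    cleaned = value.replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None   # genuine placeholders like "N/A" or "TBD" still become NULL

for raw in ["$1,234.56", "$29.99", "19.99", "N/A"]:
    print(raw, "->", clean_and_convert(raw))
```

Order matters: converting first would turn every "$"-prefixed value into NULL, leaving nothing for Find & Replace to fix. Cleaning first rescues the recoverable values and leaves NULLs only for true placeholders.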
Each step is visible in the pipeline. You can see exactly what happened, reorder steps, or remove one if it didn't work. The original data is never modified.
Why this matters more than you think
Type issues are the most common source of subtle data bugs. Your code runs, your report generates, your chart renders. Everything looks fine - except the numbers are wrong because a sort was alphabetical instead of numerical, or an average silently excluded text values. These bugs don't crash; they just give you wrong answers that look plausible.
Checking types should be the first thing you do when you open any dataset. It takes 30 seconds to scan the column badges, and it saves you from hours of debugging later.