Profiling an Excel Export to Understand Data Quality
A colleague emails you an .xlsx file with the subject line "here's the Q4 data." No documentation, no schema, no explanation of what the columns mean. Before you drop this into a dashboard, a report, or a model, you need to answer some basic questions. How many rows? Are there blanks? Did Excel silently mangle any dates or numbers?
This is data profiling: understanding the shape and quality of a dataset before you do anything with it. Here's how to profile an Excel export in ExploreMyData without installing anything or writing code.
Step 1: Load the Excel file
Drag the .xlsx file into ExploreMyData or use the file picker. DuckDB WASM reads Excel files natively, so you don't need to convert to CSV first. The file stays in your browser - nothing gets uploaded.
Once loaded, you'll see the row count in the status bar and all columns in the data grid. This already tells you something: if you expected 10,000 rows and you see 8,200, there's a gap somewhere.
| # | deal_id | account | revenue | close_date | stage | owner |
|---|---|---|---|---|---|---|
| BIGINT | VARCHAR | VARCHAR | VARCHAR | DATE | VARCHAR | VARCHAR |
| 1 | D-10041 | Acme Corp | 24,500 | 2025-10-14 | Closed Won | Sarah K. |
| 2 | D-10042 | GlobalTech | 8,900 | 2025-11-02 | Closed Won | Mike R. |
| 3 | D-10043 | Harbor Finance | $4,200.00 | 2025-11-19 | Closed Lost | Sarah K. |
8,203 rows loaded. Note row 3: revenue shows "$4,200.00" (with dollar sign) while other rows are plain numbers - a type issue to address.
Step 2: Check column types
Excel is notorious for storing numbers as text, dates as serial numbers, and zip codes without leading zeros. The first thing to check is whether each column was imported with the right type.
Look at the type badge next to each column name in the header. You'll see labels like VARCHAR, DOUBLE, BIGINT, or TIMESTAMP. If "revenue" shows up as VARCHAR, you've found your first issue.
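The check the tool does for you can be sketched in plain Python. This is a hypothetical illustration, not ExploreMyData's actual code: `numeric_fraction` is an invented helper that measures how many of a text column's values would parse as numbers, which is the signal that a VARCHAR column was probably meant to be numeric.

```python
# Hypothetical sketch: given a column's raw values (as a VARCHAR column
# would come back), count how many parse cleanly as numbers.
# If most do, the column was probably meant to be numeric.

def numeric_fraction(values):
    """Return the share of non-empty values that parse as a float."""
    parsed = 0
    non_empty = 0
    for v in values:
        if v is None or v == "":
            continue  # blanks don't count either way
        non_empty += 1
        try:
            float(v.replace(",", ""))  # tolerate thousands separators
            parsed += 1
        except ValueError:
            pass  # "$4,200.00" or "TBD" land here
    return parsed / non_empty if non_empty else 0.0

revenue = ["24500", "8900", "$4,200.00", "TBD", ""]
print(numeric_fraction(revenue))  # 0.5
```

A high fraction (say, above 0.9) suggests a numeric column polluted by a few badly formatted cells; a low one suggests the column really is text.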
Step 3: Profile each column with Column Explorer
Click any column header to open the Column Explorer. This is where profiling gets useful. The explorer shows different information depending on the column type:
- Numeric columns show min, max, mean, median, standard deviation, and a histogram. Outliers jump out immediately. If the max revenue value is 999,999,999, something is wrong.
- Text columns show a frequency table of distinct values. You'll see how many unique values exist, which ones appear most often, and whether there are near-duplicates like "Active" and "active".
- Date/time columns show the range (earliest to latest), and you can spot gaps or values that fall outside the expected period. Q4 data with July dates? Flag that.
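The near-duplicate problem in text columns is easy to reproduce. Here is a minimal Python sketch (sample values invented for illustration) of what a frequency table surfaces: exact counts, plus a case-folded pass that reveals when two spellings are really one value.

```python
from collections import Counter

# Hypothetical sample of a "stage" column with a casing inconsistency.
stage = ["Closed Won", "Closed Won", "closed won", "Closed Lost", "Open"]

exact = Counter(stage)                          # spellings counted separately
folded = Counter(s.casefold() for s in stage)   # merged by case-insensitive value

print(exact)   # "Closed Won" appears 2x, "closed won" 1x
print(folded)  # "closed won" appears 3x: a hint the values need normalizing
```

When the folded count for a value exceeds its exact count, you've found spellings that should almost certainly be collapsed before grouping or charting.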
| Stat | Value | What it tells you |
|---|---|---|
| Min | 500 | Smallest deal - plausible |
| Max | 999,999,999 | Placeholder value - data entry issue |
| Mean | 18,402 | Inflated by the outlier |
| Std Dev | 142,883 | Extremely high relative to mean |
| Nulls | 317 | 3.9% missing - which deals? |
Column: revenue. The 999,999,999 max is almost certainly a placeholder meaning "TBD." The histogram would show nearly all values in the first bin, with one bar far to the right.
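The stats in the table above can be reproduced in a few lines of standard-library Python. This is a sketch on a tiny invented sample, not the tool's implementation, but it shows how a single placeholder value distorts the mean and standard deviation while leaving the median sane.

```python
import statistics

# Hypothetical sample: five real-looking deals plus one placeholder and one null.
revenue = [500, 24500, 8900, 999_999_999, None, 4200]
present = [v for v in revenue if v is not None]

profile = {
    "nulls": revenue.count(None),
    "min": min(present),
    "max": max(present),                    # the 999,999,999 placeholder
    "mean": statistics.mean(present),       # inflated by the outlier
    "median": statistics.median(present),   # robust: still looks like a deal
    "stdev": statistics.stdev(present),     # huge relative to the median
}
print(profile)
```

The mean here lands around 200 million while the median stays at 8,900; that gap is the single clearest numeric signature of a placeholder outlier.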
Step 4: Find missing values
The Column Explorer shows a null count at the top. But if you want to see the actual rows with missing data, use Filter.
- Open the toolbar and select Filter.
- Choose the column you want to check.
- Set the operator to is null.
- Apply. The grid now shows only rows where that column is blank.
This is useful for understanding the pattern of missing data. Are the blanks random, or are they concentrated in specific regions, dates, or categories? That distinction matters for how you handle them.
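The "is it random or concentrated?" question amounts to counting nulls per category. A minimal Python sketch, with invented rows and an invented `missing_by_owner` name, shows the idea:

```python
from collections import Counter

# Hypothetical rows: two of Mike's deals are missing revenue, none of Sarah's.
rows = [
    {"owner": "Sarah K.", "revenue": 24500},
    {"owner": "Mike R.",  "revenue": None},
    {"owner": "Mike R.",  "revenue": None},
    {"owner": "Sarah K.", "revenue": 4200},
]

# Count missing revenue per owner to see where the blanks cluster.
missing_by_owner = Counter(r["owner"] for r in rows if r["revenue"] is None)
print(missing_by_owner)  # Counter({'Mike R.': 2}) - blanks cluster, not random
```

If the blanks all trace back to one owner, one region, or one date range, that points to a process problem (one source system or person not recording the field) rather than noise, and dropping or imputing those rows would bias the analysis.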
Step 5: Fix what you find
Profiling often reveals type issues that need fixing before analysis. If a numeric column was imported as text (common with Excel exports that have mixed formatting), use Convert Type to cast it.
- Select Convert Type from the Transform group.
- Choose the column.
- Select the target type (e.g., DOUBLE for revenue, DATE for a date string).
ExploreMyData uses TRY_CAST() instead of CAST(), so values that can't convert become NULL rather than throwing an error. After converting, you can filter for those new NULLs to find the specific rows that had bad data.
Before and after converting revenue from VARCHAR to DOUBLE:
| deal_id | revenue (VARCHAR - before) | revenue (DOUBLE - after) |
|---|---|---|
| D-10041 | "24500" | 24500.0 |
| D-10042 | "8900" | 8900.0 |
| D-10043 | "$4,200.00" | NULL |
| D-10044 | "TBD" | NULL |
TRY_CAST converts cleanly-formatted values and turns bad values (dollar signs, "TBD") into NULL. Now filter for NULL to find every row that needs cleanup.
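The TRY_CAST semantics are easy to mirror in Python if you want to reason about them outside the tool. This sketch (with a hypothetical `try_cast_double` helper) shows the same convert-or-NULL behavior on the values from the table above:

```python
# Hypothetical sketch of TRY_CAST(... AS DOUBLE) semantics: values that
# fail to convert become None instead of raising an error.

def try_cast_double(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None  # "$4,200.00" and "TBD" end up here

raw = ["24500", "8900", "$4,200.00", "TBD"]
print([try_cast_double(v) for v in raw])
# [24500.0, 8900.0, None, None]
```

Note that "$4,200.00" becomes NULL rather than 4200.0: TRY_CAST does not strip currency formatting, it only converts values that are already clean. Rows like that need a cleanup step (removing the "$" and commas) before the cast will succeed.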
What a profile tells you
After five minutes of clicking through columns, you know things that would take much longer to discover in a spreadsheet or a script:
- Which columns have missing data, and how much
- Whether types match what you expect
- The distribution of numeric values (including outliers)
- How many distinct values exist in categorical columns
- The date range of time-based data
This isn't glamorous work, but it's the work that prevents you from building a report on bad data and having to redo everything later. Profile first, analyze second.