Profiling an Excel Export to Understand Data Quality
A colleague emails you an .xlsx file with the subject line "here's the Q4 data." No documentation, no schema, no explanation of what the columns mean. Before you drop this into a dashboard, a report, or a model, you need to answer some basic questions. How many rows? Are there blanks? Did Excel silently mangle any dates or numbers?
This is data profiling: understanding the shape and quality of a dataset before you do anything with it. Here's how to profile an Excel export in ExploreMyData without installing anything or writing code.
Step 1: Load the Excel file
Drag the .xlsx file into ExploreMyData or use the file picker. DuckDB WASM reads Excel files natively, so you don't need to convert to CSV first. The file stays in your browser - nothing gets uploaded.
Once loaded, you'll see the row count in the status bar and all columns in the data grid. This already tells you something: if you expected 10,000 rows and you see 8,200, there's a gap somewhere.
| # | deal_id | account | revenue | close_date | stage | owner |
|---|---|---|---|---|---|---|
| BIGINT | VARCHAR | VARCHAR | VARCHAR | DATE | VARCHAR | VARCHAR |
| 1 | D-10041 | Acme Corp | 24,500 | 2025-10-14 | Closed Won | Sarah K. |
| 2 | D-10042 | GlobalTech | 8,900 | 2025-11-02 | Closed Won | Mike R. |
| 3 | D-10043 | Harbor Finance | $4,200.00 | 2025-11-19 | Closed Lost | Sarah K. |
8,203 rows loaded. Note row 3: revenue shows "$4,200.00" (with dollar sign) while other rows are plain numbers - a type issue to address.
Step 2: Check column types
Excel is notorious for storing numbers as text, dates as serial numbers, and zip codes without leading zeros. The first thing to check is whether each column was imported with the right type.
Look at the type badge next to each column name in the header. You'll see labels like VARCHAR, DOUBLE, BIGINT, or TIMESTAMP. If "revenue" shows up as VARCHAR, you've found your first issue.
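The check the tool does for you can be sketched in plain Python. This is a hypothetical illustration, not ExploreMyData's actual code: `numeric_fraction` is an invented helper that measures how many of a text column's values would parse as numbers, which is the signal that a VARCHAR column was probably meant to be numeric.

```python
# Hypothetical sketch: given a column's raw values (as a VARCHAR column
# would come back), count how many parse cleanly as numbers.
# If most do, the column was probably meant to be numeric.

def numeric_fraction(values):
    """Return the share of non-empty values that parse as a float."""
    parsed = 0
    non_empty = 0
    for v in values:
        if v is None or v == "":
            continue  # blanks don't count either way
        non_empty += 1
        try:
            float(v.replace(",", ""))  # tolerate thousands separators
            parsed += 1
        except ValueError:
            pass  # "$4,200.00" or "TBD" land here
    return parsed / non_empty if non_empty else 0.0

revenue = ["24500", "8900", "$4,200.00", "TBD", ""]
print(numeric_fraction(revenue))  # 0.5
```

A high fraction (say, above 0.9) suggests a numeric column polluted by a few badly formatted cells; a low one suggests the column really is text.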
Step 3: Profile each column with Column Explorer
Click any column header to open the Column Explorer. This is where profiling gets useful. The explorer shows different information depending on the column type:
- Numeric columns show min, max, mean, median, standard deviation, and a histogram. Outliers jump out immediately. If the max revenue value is 999,999,999, something is wrong.
- Text columns show a frequency table of distinct values. You'll see how many unique values exist, which ones appear most often, and whether there are near-duplicates like "Active" and "active".
- Date/time columns show the range (earliest to latest), and you can spot gaps or values that fall outside the expected period. Q4 data with July dates? Flag that.
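The near-duplicate problem in text columns is easy to reproduce. Here is a minimal Python sketch (sample values invented for illustration) of what a frequency table surfaces: exact counts, plus a case-folded pass that reveals when two spellings are really one value.

```python
from collections import Counter

# Hypothetical sample of a "stage" column with a casing inconsistency.
stage = ["Closed Won", "Closed Won", "closed won", "Closed Lost", "Open"]

exact = Counter(stage)                          # spellings counted separately
folded = Counter(s.casefold() for s in stage)   # merged by case-insensitive value

print(exact)   # "Closed Won" appears 2x, "closed won" 1x
print(folded)  # "closed won" appears 3x: a hint the values need normalizing
```

When the folded count for a value exceeds its exact count, you've found spellings that should almost certainly be collapsed before grouping or charting.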
| Stat | Value | What it tells you |
|---|---|---|
| Min | 500 | Smallest deal - plausible |
| Max | 999,999,999 | Placeholder value - data entry issue |
| Mean | 18,402 | Inflated by the outlier |
| Std Dev | 142,883 | Extremely high relative to mean |
| Nulls | 317 | 3.9% missing - which deals? |
Column: revenue. The 999,999,999 max is almost certainly a placeholder meaning "TBD." The histogram would show nearly all values in the first bin, with one bar far to the right.
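The stats in the table above can be reproduced in a few lines of standard-library Python. This is a sketch on a tiny invented sample, not the tool's implementation, but it shows how a single placeholder value distorts the mean and standard deviation while leaving the median sane.

```python
import statistics

# Hypothetical sample: five real-looking deals plus one placeholder and one null.
revenue = [500, 24500, 8900, 999_999_999, None, 4200]
present = [v for v in revenue if v is not None]

profile = {
    "nulls": revenue.count(None),
    "min": min(present),
    "max": max(present),                    # the 999,999,999 placeholder
    "mean": statistics.mean(present),       # inflated by the outlier
    "median": statistics.median(present),   # robust: still looks like a deal
    "stdev": statistics.stdev(present),     # huge relative to the median
}
print(profile)
```

The mean here lands around 200 million while the median stays at 8,900; that gap is the single clearest numeric signature of a placeholder outlier.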
Step 4: Find missing values
The Column Explorer shows a null count at the top. But if you want to see the actual rows with missing data, use Filter.
- Open the toolbar and select Filter.
- Choose the column you want to check.
- Set the operator to is null.
- Apply. The grid now shows only rows where that column is blank.
This is useful for understanding the pattern of missing data. Are the blanks random, or are they concentrated in specific regions, dates, or categories? That distinction matters for how you handle them.
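The "is it random or concentrated?" question amounts to counting nulls per category. A minimal Python sketch, with invented rows and an invented `missing_by_owner` name, shows the idea:

```python
from collections import Counter

# Hypothetical rows: two of Mike's deals are missing revenue, none of Sarah's.
rows = [
    {"owner": "Sarah K.", "revenue": 24500},
    {"owner": "Mike R.",  "revenue": None},
    {"owner": "Mike R.",  "revenue": None},
    {"owner": "Sarah K.", "revenue": 4200},
]

# Count missing revenue per owner to see where the blanks cluster.
missing_by_owner = Counter(r["owner"] for r in rows if r["revenue"] is None)
print(missing_by_owner)  # Counter({'Mike R.': 2}) - blanks cluster, not random
```

If the blanks all trace back to one owner, one region, or one date range, that points to a process problem (one source system or person not recording the field) rather than noise, and dropping or imputing those rows would bias the analysis.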
Step 5: Fix what you find
Profiling often reveals type issues that need fixing before analysis. If a numeric column was imported as text (common with Excel exports that have mixed formatting), use Convert Type to cast it.
- Select Convert Type from the Transform group.
- Choose the column.
- Select the target type (e.g., DOUBLE for revenue, DATE for a date string).
ExploreMyData uses TRY_CAST() instead of CAST(), so values that can't convert become NULL rather than throwing an error. After converting, you can filter for those new NULLs to find the specific rows that had bad data.
Before and after converting revenue from VARCHAR to DOUBLE:
| deal_id | revenue (VARCHAR - before) | revenue (DOUBLE - after) |
|---|---|---|
| D-10041 | "24500" | 24500.0 |
| D-10042 | "8900" | 8900.0 |
| D-10043 | "$4,200.00" | NULL |
| D-10044 | "TBD" | NULL |
TRY_CAST converts cleanly-formatted values and turns bad values (dollar signs, "TBD") into NULL. Now filter for NULL to find every row that needs cleanup.
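The TRY_CAST semantics are easy to mirror in Python if you want to reason about them outside the tool. This sketch (with a hypothetical `try_cast_double` helper) shows the same convert-or-NULL behavior on the values from the table above:

```python
# Hypothetical sketch of TRY_CAST(... AS DOUBLE) semantics: values that
# fail to convert become None instead of raising an error.

def try_cast_double(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None  # "$4,200.00" and "TBD" end up here

raw = ["24500", "8900", "$4,200.00", "TBD"]
print([try_cast_double(v) for v in raw])
# [24500.0, 8900.0, None, None]
```

Note that "$4,200.00" becomes NULL rather than 4200.0: TRY_CAST does not strip currency formatting, it only converts values that are already clean. Rows like that need a cleanup step (removing the "$" and commas) before the cast will succeed.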
What a profile tells you
After five minutes of clicking through columns, you know things that would take much longer to discover in a spreadsheet or a script:
- Which columns have missing data, and how much
- Whether types match what you expect
- The distribution of numeric values (including outliers)
- How many distinct values exist in categorical columns
- The date range of time-based data
This isn't glamorous work, but it's the work that prevents you from building a report on bad data and having to redo everything later. Profile first, analyze second.