
Profiling an Excel Export to Understand Data Quality

A colleague emails you an .xlsx file with the subject line "here's the Q4 data." No documentation, no schema, no explanation of what the columns mean. Before you drop this into a dashboard, a report, or a model, you need to answer some basic questions. How many rows? Are there blanks? Did Excel silently mangle any dates or numbers?

This is data profiling: understanding the shape and quality of a dataset before you do anything with it. Here's how to profile an Excel export in ExploreMyData without installing anything or writing code.

Step 1: Load the Excel file

Drag the .xlsx file into ExploreMyData or use the file picker. DuckDB WASM reads Excel files natively, so you don't need to convert to CSV first. The file stays in your browser - nothing gets uploaded.

Once loaded, you'll see the row count in the status bar and all columns in the data grid. This already tells you something: if you expected 10,000 rows and you see 8,200, there's a gap somewhere.

| # | deal_id | account | revenue | close_date | stage | owner |
|---|---------|---------|---------|------------|-------|-------|
| BIGINT | VARCHAR | VARCHAR | VARCHAR | DATE | VARCHAR | VARCHAR |
| 1 | D-10041 | Acme Corp | 24,500 | 2025-10-14 | Closed Won | Sarah K. |
| 2 | D-10042 | GlobalTech | 8,900 | 2025-11-02 | Closed Won | Mike R. |
| 3 | D-10043 | Harbor Finance | $4,200.00 | 2025-11-19 | Closed Lost | Sarah K. |

8,203 rows loaded. Note row 3: revenue shows "$4,200.00" (with dollar sign) while other rows are plain numbers - a type issue to address.

Step 2: Check column types

Excel is notorious for storing numbers as text, representing dates as serial numbers, and stripping leading zeros from zip codes. The first thing to check is whether each column was imported with the right type.

Look at the type badge next to each column name in the header. You'll see labels like VARCHAR, DOUBLE, BIGINT, or TIMESTAMP. If "revenue" shows up as VARCHAR, you've found your first issue.
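To see why a single badly formatted cell matters, here is a minimal Python sketch of how import-time type inference typically works. This is not ExploreMyData's actual code, and `infer_type` is a hypothetical name; the point is that one value that fails every cast demotes the whole column to VARCHAR.

```python
from datetime import date

def infer_type(values):
    """Return the narrowest type that fits every non-empty value.

    Hypothetical sketch of import-time type inference -- not the
    tool's real implementation. A single value that fails a cast
    demotes the entire column to VARCHAR.
    """
    def fits(value, type_name):
        try:
            if type_name == "BIGINT":
                int(value)
            elif type_name == "DOUBLE":
                float(value)
            elif type_name == "DATE":
                date.fromisoformat(value)
            return True
        except (TypeError, ValueError):
            return False

    for type_name in ("BIGINT", "DOUBLE", "DATE"):
        if all(fits(v, type_name) for v in values if v not in ("", None)):
            return type_name
    return "VARCHAR"  # the fallback that fits anything

print(infer_type(["24500", "8900", "15000"]))      # BIGINT
print(infer_type(["24500", "8900", "$4,200.00"]))  # VARCHAR
print(infer_type(["2025-10-14", "2025-11-02"]))    # DATE
```

One "$4,200.00" in an otherwise numeric column is enough to flip the whole column to VARCHAR, which is exactly the symptom the type badge surfaces.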

Step 3: Profile each column with Column Explorer

Click any column header to open the Column Explorer. This is where profiling gets useful. The explorer shows different information depending on the column type:

  • Numeric columns show min, max, mean, median, standard deviation, and a histogram. Outliers jump out immediately. If the max revenue value is 999,999,999, something is wrong.
  • Text columns show a frequency table of distinct values. You'll see how many unique values exist, which ones appear most often, and whether there are near-duplicates like "Active" and "active".
  • Date/time columns show the range (earliest to latest), and you can spot gaps or values that fall outside the expected period. Q4 data with July dates? Flag that.
| Stat | Value | What it tells you |
|------|-------|-------------------|
| Min | 500 | Smallest deal - plausible |
| Max | 999,999,999 | Placeholder value - data entry issue |
| Mean | 18,402 | Inflated by the outlier |
| Std Dev | 142,883 | Extremely high relative to mean |
| Nulls | 317 | 3.9% missing - which deals? |

Column: revenue. The 999,999,999 max is almost certainly a placeholder meaning "TBD." The histogram would show nearly all values in the first bin, with one bar far to the right.
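The same summary stats are easy to reproduce with Python's standard `statistics` module. This sketch uses a small made-up revenue sample (not the real 8,203 rows) to show the signature of a placeholder outlier: a mean that lands far above the median.

```python
import statistics

# Illustrative revenue sample: a few typical deals, one missing
# value, and one 999,999,999 placeholder -- made-up numbers that
# mirror the pattern in the table above.
revenue = [500.0, 24500.0, 8900.0, 4200.0, 12800.0, None, 999_999_999.0]

non_null = [v for v in revenue if v is not None]
profile = {
    "min": min(non_null),
    "max": max(non_null),
    "mean": statistics.mean(non_null),
    "median": statistics.median(non_null),
    "std_dev": statistics.stdev(non_null),
    "nulls": revenue.count(None),
}

print(profile["median"])  # 10850.0
# The placeholder dominates: a mean hundreds of times the median
# is a classic outlier signal.
print(profile["mean"] > 100 * profile["median"])  # True
```

When mean and median disagree this badly, look at the histogram: nearly everything sits in the first bin, with one bar far to the right.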

Step 4: Find missing values

The Column Explorer shows a null count at the top. But if you want to see the actual rows with missing data, use Filter.

  1. Open the toolbar and select Filter.
  2. Choose the column you want to check.
  3. Set the operator to is null.
  4. Apply. The grid now shows only rows where that column is blank.

This is useful for understanding the pattern of missing data. Are the blanks random, or are they concentrated in specific regions, dates, or categories? That distinction matters for how you handle them.
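The "is null" filter plus a quick group count is all it takes to see whether blanks cluster. A Python sketch with hypothetical rows (deal_id, region, revenue):

```python
from collections import Counter

# Hypothetical rows after loading: (deal_id, region, revenue).
rows = [
    ("D-10041", "East", 24500.0),
    ("D-10042", "West", 8900.0),
    ("D-10043", "West", None),
    ("D-10044", "West", None),
    ("D-10045", "East", 15000.0),
]

# Equivalent of Filter -> revenue -> "is null": keep blank rows only.
missing = [row for row in rows if row[2] is None]
print(len(missing))  # 2

# Are the blanks random or concentrated? Count them per region.
by_region = Counter(region for _, region, rev in rows if rev is None)
print(by_region)  # Counter({'West': 2})
```

In this toy sample every blank comes from one region, which suggests a systematic cause (a broken export for that region, say) rather than random noise.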

Step 5: Fix what you find

Profiling often reveals type issues that need fixing before analysis. If a numeric column was imported as text (common with Excel exports that have mixed formatting), use Convert Type to cast it.

  1. Select Convert Type from the Transform group.
  2. Choose the column.
  3. Select the target type (e.g., DOUBLE for revenue, DATE for a date string).

ExploreMyData uses TRY_CAST() instead of CAST(), so values that can't convert become NULL rather than throwing an error. After converting, you can filter for those new NULLs to find the specific rows that had bad data.

Before and after converting revenue from VARCHAR to DOUBLE:

| deal_id | revenue (VARCHAR - before) | revenue (DOUBLE - after) |
|---------|----------------------------|--------------------------|
| D-10041 | "24500" | 24500.0 |
| D-10042 | "8900" | 8900.0 |
| D-10043 | "$4,200.00" | NULL |
| D-10044 | "TBD" | NULL |

TRY_CAST converts cleanly formatted values and turns bad values (dollar signs, "TBD") into NULL. Now filter for NULL to find every row that needs cleanup.
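The TRY_CAST semantics are easy to mimic in plain Python. This sketch is illustrative only (the real cast runs inside DuckDB), but it shows the same behavior: bad values become None instead of raising an error, and the new NULLs mark the rows to clean up.

```python
def try_cast_double(value):
    """Mimic DuckDB's TRY_CAST(x AS DOUBLE): return a float on
    success, or None instead of raising on failure. Sketch only."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

before = ["24500", "8900", "$4,200.00", "TBD", None]
after = [try_cast_double(v) for v in before]
print(after)  # [24500.0, 8900.0, None, None, None]

# Filtering for the new NULLs pinpoints the rows that need cleanup.
needs_cleanup = [v for v, cast in zip(before, after) if cast is None]
print(needs_cleanup)  # ['$4,200.00', 'TBD', None]
```

Note that "$4,200.00" fails the cast even though it is a perfectly good number to a human; stripping currency formatting before converting would rescue rows like that.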

What a profile tells you

After five minutes of clicking through columns, you know things that would take much longer to discover in a spreadsheet or a script:

  • Which columns have missing data, and how much
  • Whether types match what you expect
  • The distribution of numeric values (including outliers)
  • How many distinct values exist in categorical columns
  • The date range of time-based data

This isn't glamorous work, but it's the work that prevents you from building a report on bad data and having to redo everything later. Profile first, analyze second.

Profile your data now →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData