
What to Do First When You Open a New CSV File

Someone sends you a CSV and says "can you look at this?" You open it and see 50,000 rows and 25 columns. The column names are abbreviations that don't mean much. There's no data dictionary. Where do you even start?

Most people scroll around aimlessly for a few minutes, maybe sort a column or two, then go back to the sender with questions they could have answered themselves. There's a better way. Here's a five-minute routine that gives you a solid understanding of any data file before you start asking questions or doing analysis.

Step 1: Check the basics

Before anything else, look at the numbers in the status bar. How many rows? How many columns? This tells you the scale of what you're dealing with. A 500-row file with 8 columns is a different animal than 500,000 rows with 60 columns.

In ExploreMyData, the row count shows immediately after the file loads. Scroll the grid horizontally to see every column. Read the column headers. Even cryptic ones like amt_due_net or cust_seg_cd start to make sense once you see a few values beneath them.

| #  | order_id | cust_seg_cd | amt_due_net | order_dt   | status  | region  |
|----|----------|-------------|-------------|------------|---------|---------|
|    | BIGINT   | VARCHAR     | DOUBLE      | DATE       | VARCHAR | VARCHAR |
| 1  | 10041    | ENT         | 1,240.00    | 2025-01-03 | Active  | West    |
| 2  | 10042    | SMB         | 89.99       | 2025-01-03 | Closed  | East    |
| 3  | 10043    | MID         | 430.50      | 2025-01-04 | Pending | Central |
| 4  | 10044    | ENT         | 5,820.00    | 2025-01-04 | Active  | West    |
| 5  | 10045    | SMB         | 212.00      | 2025-01-05 | active  | East    |

49,832 rows

Column types are shown below each header. Row 5 has "active" (lowercase) - a casing inconsistency to watch for.
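If you'd rather script this first check, the same idea takes a few lines of plain Python. The inline sample below is made-up data mirroring the table above (in practice you'd pass a real file to `csv.reader`):

```python
import csv
import io

# Hypothetical inline sample standing in for the real file; for an actual
# CSV, replace io.StringIO(SAMPLE) with open("orders.csv", newline="").
SAMPLE = """order_id,cust_seg_cd,amt_due_net,order_dt,status,region
10041,ENT,1240.00,2025-01-03,Active,West
10042,SMB,89.99,2025-01-03,Closed,East
10043,MID,430.50,2025-01-04,Pending,Central
10044,ENT,5820.00,2025-01-04,Active,West
10045,SMB,212.00,2025-01-05,active,East
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, data = rows[0], rows[1:]

# The two numbers to look at before anything else: scale and shape
print(f"{len(data)} rows x {len(header)} columns")
print("columns:", ", ".join(header))
```

Five rows here, 49,832 in the real file - the point is that the check is the same either way.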

Step 2: Check column types

Each column header shows its detected type: text, number, date, boolean. This matters more than you'd think. A "zip_code" column detected as a number means leading zeros got stripped (00501 became 501). A "price" column detected as text means it has dollar signs or commas that need cleaning before you can do math.

Scan the types quickly. Are dates actually dates, or strings? Are numbers actually numbers? Type mismatches are one of the most common problems in CSV files, and catching them early saves a lot of confusion later.
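Type detection in any tool boils down to "what's the narrowest type every value fits?" Here's a rough sketch of that logic in stdlib Python (the sample data is hypothetical, and real detectors handle many more formats):

```python
import csv
import io
from datetime import date

SAMPLE = """order_id,amt_due_net,order_dt,status
10041,1240.00,2025-01-03,Active
10042,89.99,2025-01-03,Closed
10043,430.50,2025-01-04,Pending
"""

def detect_type(values):
    """Return the narrowest type that every non-empty value parses as."""
    def fits(parse):
        try:
            for v in values:
                if v:  # skip empty strings (treated as nulls)
                    parse(v)
            return True
        except ValueError:
            return False
    if fits(int):
        return "integer"
    if fits(float):
        return "number"
    if fits(date.fromisoformat):
        return "date"
    return "text"

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, data = rows[0], rows[1:]
columns = {name: [r[i] for r in data] for i, name in enumerate(header)}
types = {name: detect_type(vals) for name, vals in columns.items()}
```

Note the order: an integer also parses as a float, so the check tries the strictest type first. This is also why a zip code full of digits gets detected as a number unless the detector knows better.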

Step 3: Explore a few text columns

Click on a text column header to open the Column Explorer. For categorical columns, this shows every unique value and how many times it appears. This is where you start to understand what the data actually represents.

Say you open a "status" column and see: Active (31,204), Closed (14,891), Pending (3,712), active (193). Now you know three things: this is probably a customer or account file, most records are active, and there's an inconsistent casing issue to deal with.

The search bar in the categorical explorer is useful when a column has hundreds of unique values. Type a few characters to narrow down results. Click any value to instantly filter the grid to just those rows.

| status  | Count  | Share |
|---------|--------|-------|
| Active  | 31,204 | 62.6% |
| Closed  | 14,891 | 29.9% |
| Pending | 3,712  | 7.4%  |
| active  | 193    | 0.4%  |
| NULL    | 47     | 0.1%  |

5 distinct values - 4 unique statuses plus NULL. The lowercase "active" (193 rows) is the same category as "Active" entered inconsistently.
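Under the hood, this view is just a frequency count. A scaled-down sketch with made-up values shows both the count and how collapsing case exposes the duplicate category:

```python
from collections import Counter

# Hypothetical status values, scaled down from the article's example
statuses = ["Active", "Active", "Active", "Closed", "Closed", "Pending", "active"]

counts = Counter(statuses)
total = len(statuses)
for value, n in counts.most_common():
    print(f"{value:<8} {n}  {n / total:.1%}")

# Normalizing case merges "active" into "Active" - one category, not two
normalized = Counter(s.strip().lower() for s in statuses)
```

The normalized counter is the quick way to confirm a casing issue: if `counts` has more keys than `normalized`, some values differ only by case or whitespace.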

Step 4: Check numeric columns for range and distribution

Click on a numeric column to see its stats: min, max, average, sum, standard deviation, and null count. These six numbers tell you a lot.

For example, suppose an "order_total" column shows min = -$450, max = $847,000, avg = $312. That negative value is probably a refund. The $847K outlier needs investigation. And the average suggests most orders are modest.

Below the stats, the histogram shows the full distribution. Most numeric data is heavily skewed: a few extreme values with the bulk clustered at one end. The histogram makes that immediately visible. You can adjust the number of bins to get more or less granularity.

| Stat    | Value      | Notes                                    |
|---------|------------|------------------------------------------|
| Min     | -450.00    | Likely a refund - worth checking         |
| Max     | 847,000.00 | ~2,700x the average - investigate        |
| Average | 312.40     | Typical order size                       |
| Std Dev | 2,841.17   | High relative to mean - outliers present |
| Nulls   | 214        | 0.4% of rows - likely draft orders       |

Column: amt_due_net - histogram shows 99% of orders clustered between $0–$1,000 with a few extreme values stretching the scale.
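These stats, and a crude histogram, are easy to reproduce with the `statistics` module. The values below are invented to echo the example (a negative "refund" and one big outlier):

```python
import statistics

# Hypothetical order totals: the negative entry plays the refund role,
# the large one the outlier
totals = [-450.0, 120.0, 89.99, 430.5, 212.0, 5820.0]

stats = {
    "min": min(totals),
    "max": max(totals),
    "avg": round(statistics.mean(totals), 2),
    "sum": round(sum(totals), 2),
    "stdev": round(statistics.stdev(totals), 2),
}

# A crude 3-bin histogram over the min..max range; skew shows up as
# most values landing in the first bin
lo, hi, bins = min(totals), max(totals), 3
width = (hi - lo) / bins
hist = [0] * bins
for t in totals:
    hist[min(int((t - lo) / width), bins - 1)] += 1
```

Even at this tiny scale the skew is visible: five of six values land in the first bin, and the outlier sits alone at the far end, exactly the shape the real histogram shows.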

Step 5: Check date columns for range and gaps

Date columns get their own explorer with a time-series chart. This answers questions you didn't even know you had. Does the data cover a full year or just three months? Is there a gap in July where no records exist? Did the volume spike in December?

Switch the granularity between day, week, month, and year to see patterns at different scales. Daily granularity shows weekday vs. weekend patterns. Monthly smooths things out and shows seasonal trends.
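The gap check is just "bucket by month, look for empty buckets." A minimal sketch with made-up dates (the expected month list is an assumption you'd derive from the actual date range):

```python
from collections import Counter
from datetime import date

# Hypothetical order dates with no records in February
dates = [date(2025, 1, 3), date(2025, 1, 3), date(2025, 1, 4),
         date(2025, 3, 2), date(2025, 3, 15)]

first, last = min(dates), max(dates)
by_month = Counter(f"{d.year}-{d.month:02d}" for d in dates)

# Months the data should cover (assumed here); zero rows means a gap
expected = ["2025-01", "2025-02", "2025-03"]
gaps = [m for m in expected if by_month[m] == 0]
```

Switching granularity is the same trick with a different bucket key: day (`d.isoformat()`), week (`d.isocalendar().week`), or year (`d.year`).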

Step 6: Check for nulls

Every Column Explorer shows the null count, but do a quick mental pass: which columns have missing data, and how much? A column that's 80% null is probably not useful. A column that's 99.9% filled but has 50 nulls might be important to investigate.

The numeric explorer puts the null count right in the stats panel. For text columns, NULL shows up as a value in the categorical list (usually at the bottom). Pay attention to the distinction between NULL and empty string, as they can behave differently in filters and aggregations.
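The NULL-vs-empty-string distinction is easy to see in code. CSV has no true null type, so a tool has to pick a convention; this sketch (with hypothetical values) counts the two cases separately:

```python
# Hypothetical raw column values: "" is an empty string, "NULL" is a
# literal token - two different conventions for "missing" in a CSV
region = ["West", "", "East", "NULL", "West", ""]

empty_count = sum(1 for v in region if v == "")
null_token_count = sum(1 for v in region if v == "NULL")
filled = len(region) - empty_count - null_token_count
```

A filter on "not null" may keep the empty strings, and a filter on "not empty" may keep the NULL tokens - which is exactly why the distinction matters in aggregations.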

Why this matters

This whole routine takes about five minutes. At the end, you know:

  • How big the dataset is
  • What each column represents (roughly)
  • Whether types are correct
  • The range and distribution of key numeric fields
  • The time period the data covers
  • Where the data quality problems are

That's enough to have an informed conversation with whoever sent you the file, or to start your actual analysis with confidence instead of guesswork.

The alternative is jumping straight into analysis and discovering twenty minutes later that the "revenue" column is text, half the dates are from a test environment, and the status column has six different spellings of "cancelled." Do the five-minute check first. It always pays off.


Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData