Understanding Category Distributions in Your Data
Your orders table has a "category" column. You assume sales are spread fairly evenly across categories. But are they? Maybe "Electronics" accounts for 60% of all orders and "Garden Supplies" is less than 1%. Maybe there are 47 categories when you expected 12, because someone entered "electronics", "Electronics", and "ELECTRONICS" as three different things.
You won't know until you look at the distribution. And scrolling through rows won't show you the distribution. You need counts by category, sorted, all in one place.
Open the categorical explorer
In ExploreMyData, click any text column header to open the Column Explorer. For categorical columns, this shows every unique value alongside its count, sorted by frequency.
This single view answers the three most common questions about categorical data:
- How many distinct categories exist?
- Which ones dominate?
- Are there values that shouldn't be there?
| category | Count | Share |
|---|---|---|
| Electronics | 24,840 | 59.1% |
| Clothing | 9,203 | 21.9% |
| Home & Kitchen | 4,911 | 11.7% |
| Books | 2,187 | 5.2% |
| Sports | 611 | 1.5% |
| Garden Supplies | 248 | 0.6% |
6 distinct values, sorted by frequency. Electronics dominates at 59% - good to know before building any category-level report.
Spot the long tail
Most categorical columns follow a power law: a few values account for the bulk of the data, and a long tail of rare values makes up the rest. The explorer makes this immediately visible because values are sorted by count.
Consider a "country" column in a customer table. You might see: United States (84,201), United Kingdom (12,433), Canada (9,871), Germany (4,502)... and then 60 more countries with counts under 100. That long tail matters. If you're building a report grouped by country, those 60 countries will each get their own row, making the report unusable. You might want to group them into an "Other" bucket.
The explorer shows you this problem before you waste time building the report.
Find dirty data hiding in categories
Categorical columns are magnets for data quality issues. Common problems that jump out in the explorer:
-
Inconsistent casing:
Shipped (14,201)andshipped (83)side by side. That's one category, not two. -
Leading/trailing whitespace:
New YorkandNew York(with a space before it). Hard to see in a spreadsheet, obvious in the explorer because they show up as separate entries. -
Abbreviation mix:
California,CA, andCalif.are the same state but three different values. - Unexpected values: A "payment_method" column with "Credit Card", "Debit Card", "Wire Transfer", and "asdf". That last one is probably a test row.
The search bar helps here. Type "new" and you'll see all the New York variants at once. Type "ship" and you'll catch "Shipped", "shipped", "SHIPPED", and "Ship" in one view.
Search results for "ship" in the status column:
| status (matched) | Count |
|---|---|
| Shipped | 14,201 |
| shipped | 83 |
| SHIPPED | 12 |
| Ship | 5 |
Four variants of the same status - all lowercase, uppercase, and mixed. Searching for "ship" reveals them all at once, which scrolling through rows never would.
Click to filter, then investigate
When you see a suspicious value in the explorer, click it. The grid immediately filters to show only rows with that value. Now you can see the full context: when were these rows created? By whom? What do the other columns look like?
This is the fastest path from "that looks weird" to "I understand what happened." A category with 3 records out of 50,000 might be test data, a migration artifact, or a legitimate edge case. The surrounding columns tell you which.
Get a formal breakdown with Top/Bottom
The Column Explorer gives you a quick, visual distribution. If you need a more structured view, use the Top/Bottom operation to show the top N categories by count. This gives you a clean, sorted view in the main grid.
For a full summary table, use Pivot with COUNT as the aggregation. Set the category column as the row grouping. This produces a table with one row per category and a count column, which you can sort, filter, or export.
| category | count |
|---|---|
| Electronics | 24,840 |
| Clothing | 9,203 |
| Home & Kitchen | 4,911 |
| Books | 2,187 |
| Sports | 611 |
| Garden Supplies | 248 |
Pivot with GROUP BY category, COUNT(*). Clean summary table you can sort, filter, or export directly.
What to look for in any categorical column
Keep a mental checklist when you open the categorical explorer:
- Count of unique values: Does it match expectations? A "status" column with 200 unique values is suspicious. A "product_name" column with 3 unique values might mean data is filtered already.
- Dominant value: If one category has 95% of the data, any analysis grouped by this column will be skewed. That's not wrong, but you should know it going in.
- Null or blank entries: How many records have no category? Are they evenly distributed or concentrated in a specific time period?
- Near-duplicates: Two values that look almost the same are usually the same, entered differently. Search for partial strings to find them.
Five minutes with the categorical explorer tells you whether a column is clean enough to use for grouping and filtering, or whether it needs cleanup first.