Quick Top-10 Analysis: Finding the Biggest Items in Your Data
"Who are our top 10 customers?" It's the kind of question that comes up in every meeting. Variations include: what are the most common support ticket categories, which products generate the most revenue, which error codes appear most often. Same pattern, different data.
If your data is in a spreadsheet, you sort a column, squint at the first few rows, and hope you didn't miss anything. If it's in a database, you write a query. If it's in a CSV file on your desktop, you need something in between.
The fast way: Top/Bottom
Open your file in ExploreMyData and select the Top/Bottom operation. Pick the column you want to rank by (e.g., "revenue"), choose "Top", and set N to 10.
That's it. You get the 10 rows with the highest revenue, sorted descending. The whole thing takes about five seconds.
Result: Top 10 rows by revenue, sorted descending:
| Rank | customer_name | product | revenue |
|---|---|---|---|
| 1 | Acme Corp | Enterprise License - Annual | 52,400.00 |
| 2 | GlobalTech | Enterprise License - Annual | 48,800.00 |
| 3 | Sunrise Media | Pro Plan - 50 seats | 18,500.00 |
| 4 | Blue Ridge LLC | Pro Plan - 30 seats | 11,100.00 |
| 5 | Harbor Finance | Enterprise License - Annual | 9,750.00 |
Five seconds to get this. The top 2 rows are enterprise annual deals; the rest are multi-seat pro plans. Context visible at a glance.
This works when your data already has the value you want to rank by. Customer-level revenue, product-level sales count, error code frequency - if the number is already in a column, Top/Bottom gives you the answer directly.
When you need to aggregate first
Sometimes the data isn't pre-aggregated. You have an orders table with one row per order, and you need top customers by total spend. The revenue-per-customer number doesn't exist yet - you need to calculate it.
The workflow is:
- Use Group By on customer_id (or customer_name), with SUM of order_amount.
- Apply Top/Bottom on the resulting sum column.
Two steps, no code. The pipeline shows both operations, and you can adjust either one independently - change the aggregation to AVG, or bump the top N from 10 to 20.
After Group By customer_name + SUM(order_amount), then Top 10:
| customer_name | sum(order_amount) |
|---|---|
| Acme Corp | 142,300.00 |
| GlobalTech | 118,050.00 |
| Sunrise Media | 76,400.00 |
| Blue Ridge LLC | 54,900.00 |
| Harbor Finance | 41,250.00 |
Pipeline: Group By → Top 10. Each customer's orders are summed across all transactions. Two operations, no code.
Using Limit for simple cases
If your data is already sorted (or you've just applied a sort), the Limit operation is even simpler. It just takes the first N rows from whatever you're looking at.
Sort by revenue descending, then Limit to 10. Same result as Top/Bottom for straightforward cases. The difference is that Limit doesn't sort for you - it just truncates. So make sure the sort step comes first in your pipeline.
Limit is useful when you want the first N rows after a complex set of filters and sorts, not just the top N by a single column.
Handling ties with RANK
Here's where it gets tricky. Say your top 10 customers by revenue includes two customers tied at position 10 with exactly $45,230. Top/Bottom picks one arbitrarily. If you're presenting this to someone, that's a problem.
Use the Window operation to add a rank column.
Select RANK() as the
function, order by revenue descending, and name the new column
revenue_rank.
RANK() assigns the same
rank to tied values and skips the next number. So two customers tied at 10th get rank 10,
and the next one gets rank 12. This is transparent - anyone looking at the data can see the tie.
Then filter to revenue_rank <= 10.
You might get 11 rows instead of 10, but that's the correct answer when there's a tie.
| customer_name | total_revenue | revenue_rank |
|---|---|---|
| Acme Corp | 142,300.00 | 1 |
| GlobalTech | 118,050.00 | 2 |
| ... | ... | ... |
| Magnolia Retail | 45,230.00 | 10 |
| Northfield Co | 45,230.00 | 10 |
| Delta Systems | 41,100.00 | 12 |
Two customers share rank 10 (both $45,230). RANK() skips 11 and assigns 12 to the next. Filter to revenue_rank <= 10 returns 11 rows - the correct answer.
If you want exactly N rows with no ties, use ROW_NUMBER()
instead of RANK(). It assigns
a unique number to every row, breaking ties arbitrarily. Use this when you need a strict cutoff
and don't care which tied row gets excluded.
Common top-N scenarios
A few patterns you'll run into repeatedly:
- Top customers by revenue: Group By customer, SUM(amount), Top 10. The 80/20 rule usually holds - a small number of customers drive most revenue.
- Most common categories: Group By category, COUNT(*), Top 5. Works for support ticket types, product categories, error codes, or any categorical column.
- Biggest orders: No aggregation needed. Top/Bottom directly on order_amount. Good for spotting outliers or unusual transactions.
- Bottom performers: Same operation, just flip to "Bottom". Lowest-revenue products, least common categories, smallest orders.
- Top N per group: Window with RANK(), partitioned by group column, ordered by value. Then filter to rank <= N. This gives you the top 3 products in each category, for example.
Which approach to choose
Use Top/Bottom when you want a quick answer from a single column. Use Limit after a sort when you've already set up the ordering you want. Use Window + RANK when ties matter or when you need top N within groups.
For the "who are our top 10?" question in a meeting, Top/Bottom gets you there in five seconds. For a report where accuracy and tie-handling matter, take the extra minute to use RANK.