← All posts
4 min read

Quick Top-10 Analysis: Finding the Biggest Items in Your Data

"Who are our top 10 customers?" It's the kind of question that comes up in every meeting. Variations include: what are the most common support ticket categories, which products generate the most revenue, which error codes appear most often. Same pattern, different data.

If your data is in a spreadsheet, you sort a column, squint at the first few rows, and hope you didn't miss anything. If it's in a database, you write a query. If it's in a CSV file on your desktop, you need something in between.

The fast way: Top/Bottom

Open your file in ExploreMyData and select the Top/Bottom operation. Pick the column you want to rank by (e.g., "revenue"), choose "Top", and set N to 10.

That's it. You get the 10 rows with the highest revenue, sorted descending. The whole thing takes about five seconds.

Result: Top 10 rows by revenue, sorted descending:

Rankcustomer_nameproductrevenue
1Acme CorpEnterprise License - Annual52,400.00
2GlobalTechEnterprise License - Annual48,800.00
3Sunrise MediaPro Plan - 50 seats18,500.00
4Blue Ridge LLCPro Plan - 30 seats11,100.00
5Harbor FinanceEnterprise License - Annual9,750.00

Five seconds to get this. The top 2 rows are enterprise annual deals; the rest are multi-seat pro plans. Context visible at a glance.

This works when your data already has the value you want to rank by. Customer-level revenue, product-level sales count, error code frequency - if the number is already in a column, Top/Bottom gives you the answer directly.

When you need to aggregate first

Sometimes the data isn't pre-aggregated. You have an orders table with one row per order, and you need top customers by total spend. The revenue-per-customer number doesn't exist yet - you need to calculate it.

The workflow is:

  1. Use Group By on customer_id (or customer_name), with SUM of order_amount.
  2. Apply Top/Bottom on the resulting sum column.

Two steps, no code. The pipeline shows both operations, and you can adjust either one independently - change the aggregation to AVG, or bump the top N from 10 to 20.

After Group By customer_name + SUM(order_amount), then Top 10:

customer_namesum(order_amount)
Acme Corp142,300.00
GlobalTech118,050.00
Sunrise Media76,400.00
Blue Ridge LLC54,900.00
Harbor Finance41,250.00

Pipeline: Group By → Top 10. Each customer's orders are summed across all transactions. Two operations, no code.

Using Limit for simple cases

If your data is already sorted (or you've just applied a sort), the Limit operation is even simpler. It just takes the first N rows from whatever you're looking at.

Sort by revenue descending, then Limit to 10. Same result as Top/Bottom for straightforward cases. The difference is that Limit doesn't sort for you - it just truncates. So make sure the sort step comes first in your pipeline.

Limit is useful when you want the first N rows after a complex set of filters and sorts, not just the top N by a single column.

Handling ties with RANK

Here's where it gets tricky. Say your top 10 customers by revenue includes two customers tied at position 10 with exactly $45,230. Top/Bottom picks one arbitrarily. If you're presenting this to someone, that's a problem.

Use the Window operation to add a rank column. Select RANK() as the function, order by revenue descending, and name the new column revenue_rank.

RANK() assigns the same rank to tied values and skips the next number. So two customers tied at 10th get rank 10, and the next one gets rank 12. This is transparent - anyone looking at the data can see the tie.

Then filter to revenue_rank <= 10. You might get 11 rows instead of 10, but that's the correct answer when there's a tie.

customer_nametotal_revenuerevenue_rank
Acme Corp142,300.001
GlobalTech118,050.002
.........
Magnolia Retail45,230.0010
Northfield Co45,230.0010
Delta Systems41,100.0012

Two customers share rank 10 (both $45,230). RANK() skips 11 and assigns 12 to the next. Filter to revenue_rank <= 10 returns 11 rows - the correct answer.

If you want exactly N rows with no ties, use ROW_NUMBER() instead of RANK(). It assigns a unique number to every row, breaking ties arbitrarily. Use this when you need a strict cutoff and don't care which tied row gets excluded.

Common top-N scenarios

A few patterns you'll run into repeatedly:

  • Top customers by revenue: Group By customer, SUM(amount), Top 10. The 80/20 rule usually holds - a small number of customers drive most revenue.
  • Most common categories: Group By category, COUNT(*), Top 5. Works for support ticket types, product categories, error codes, or any categorical column.
  • Biggest orders: No aggregation needed. Top/Bottom directly on order_amount. Good for spotting outliers or unusual transactions.
  • Bottom performers: Same operation, just flip to "Bottom". Lowest-revenue products, least common categories, smallest orders.
  • Top N per group: Window with RANK(), partitioned by group column, ordered by value. Then filter to rank <= N. This gives you the top 3 products in each category, for example.

Which approach to choose

Use Top/Bottom when you want a quick answer from a single column. Use Limit after a sort when you've already set up the ordering you want. Use Window + RANK when ties matter or when you need top N within groups.

For the "who are our top 10?" question in a meeting, Top/Bottom gets you there in five seconds. For a report where accuracy and tie-handling matter, take the extra minute to use RANK.

Find your top items now →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData