
Finding the Nth Largest or Smallest Values

"What's the 3rd highest revenue order?" "What's the 5th cheapest product?" "What salary sits at the 10th position from the top?" These are the questions that MAX and MIN can't answer. You don't want the extreme value. You want a specific rank.

In SQL, this typically means writing a subquery with ORDER BY and OFFSET, or reaching for window functions. In a spreadsheet, you'd use LARGE() or SMALL() and hope you got the syntax right. In ExploreMyData, it's two clicks.

The Smallest and Largest operations

ExploreMyData has two dedicated operations for Nth-value lookups: Smallest and Largest. Both work the same way. Pick a numeric column, specify N, and the operation adds a new column containing the Nth smallest or Nth largest value, respectively.

The result is a constant column -- every row gets the same value. That might seem odd at first, but it's deliberate. It gives you a reference threshold you can use in subsequent pipeline steps.

Example: Find the 3rd cheapest product

Suppose you have a product catalog with a price column. You want to know what the 3rd lowest price is.

  1. Open the toolbar and select Smallest.
  2. Choose the price column.
  3. Set N to 3.
  4. Click Apply.

A new column appears (e.g., price_smallest_3) with the same value in every row: the 3rd lowest price in the dataset.

product_name    price    price_smallest_3
USB Hub          12.99    19.99
Mouse Pad         8.49    19.99
Webcam Stand     19.99    19.99
Keyboard         49.00    19.99
Monitor         299.00    19.99

Every row gets the same value: the 3rd lowest price ($19.99). The constant column can then be used in a Filter step.
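Outside the app, the Nth-smallest logic is just a sort plus an index. This standalone Python sketch (not ExploreMyData code) reproduces the table above:

```python
# Prices from the example catalog above
prices = [12.99, 8.49, 19.99, 49.00, 299.00]

def nth_smallest(values, n):
    """Return the nth smallest value (1-based n), or None if n exceeds the count."""
    ordered = sorted(values)
    return ordered[n - 1] if n <= len(ordered) else None

print(nth_smallest(prices, 3))  # -> 19.99
```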

Example: Find the 3rd highest revenue order

Same idea, other direction. You have an orders table with a revenue column and want the 3rd largest value.

  1. Select Largest from the toolbar.
  2. Choose revenue.
  3. Set N to 3.
  4. Apply.

Every row now has a revenue_largest_3 column showing the 3rd highest revenue value.

What SQL is generated

Under the hood, ExploreMyData uses a scalar subquery:

(SELECT price FROM source ORDER BY price ASC LIMIT 1 OFFSET 2) AS price_smallest_3

For Largest, it's the same pattern but with DESC ordering. The OFFSET is N-1 because SQL offsets are zero-based. So N=3 means OFFSET 2 -- skip the first two, take the third.

Because this is a scalar subquery, it evaluates once and returns a single value. That's why every row gets the same number. The query engine doesn't re-run it per row.
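You don't need DuckDB to see this behavior. SQLite, via Python's built-in sqlite3 module, evaluates the same scalar-subquery pattern; the table and column names below mirror the example but are otherwise illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (price REAL)")
conn.executemany("INSERT INTO source VALUES (?)",
                 [(12.99,), (8.49,), (19.99,), (49.00,), (299.00,)])

# Scalar subquery: evaluated once, and the single result is
# attached to every row of the outer query.
rows = conn.execute("""
    SELECT price,
           (SELECT price FROM source ORDER BY price ASC LIMIT 1 OFFSET 2)
               AS price_smallest_3
    FROM source
""").fetchall()

for price, third_lowest in rows:
    print(price, third_lowest)  # third_lowest is 19.99 on every row
```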

Combining with Filter for real analysis

A constant column on its own is mildly interesting. It gets powerful when you chain it with a Filter step.

Say you've added revenue_largest_10 (the 10th highest revenue). Now add a Filter: revenue >= revenue_largest_10. You've just isolated your top 10 revenue orders without hardcoding a threshold.

This is better than manually checking the value and typing it into a filter, because the threshold updates automatically when your data changes. Load a new month's data, and the pipeline recalculates.

Pipeline - isolate top 10 revenue orders

1 - Largest applied

Column: revenue - N: 10 - Output: revenue_largest_10

(SELECT revenue FROM source ORDER BY revenue DESC LIMIT 1 OFFSET 9) AS revenue_largest_10

2 - Filter applied

Keep rows where revenue >= revenue_largest_10

WHERE revenue >= revenue_largest_10

Result: only the top 10 revenue orders remain. The threshold auto-updates when the data changes.
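The two pipeline steps collapse into a single query once the scalar subquery moves into the WHERE clause. This sqlite3 sketch uses made-up revenue figures to show the shape of the combined query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (revenue REAL)")
# 15 illustrative orders; the 10th highest revenue is 600.
conn.executemany("INSERT INTO orders VALUES (?)",
                 [(v,) for v in [100, 200, 300, 400, 500, 600, 700, 800,
                                 900, 1000, 1100, 1200, 1300, 1400, 1500]])

# Threshold subquery (10th highest) feeds the filter directly.
top10 = conn.execute("""
    SELECT revenue FROM orders
    WHERE revenue >= (SELECT revenue FROM orders
                      ORDER BY revenue DESC LIMIT 1 OFFSET 9)
    ORDER BY revenue DESC
""").fetchall()

print(len(top10))  # 10 rows survive the filter
```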

Practical use cases

Benchmarking. "Our 10th largest order is $4,200. That's our top-10 threshold." Report this number to stakeholders as a performance benchmark.

Median approximation. If you have 100 rows and set N=50 on Smallest, you get a rough median. It's not statistically precise for even-numbered datasets, but it's fast and often close enough for exploratory work.
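To see how close the approximation gets, this sketch compares the 50th smallest of 100 values against the true median from Python's statistics module:

```python
import statistics

# 100 illustrative values; N = 50 picks the lower of the two middle values.
values = list(range(1, 101))  # 1..100

def nth_smallest(vals, n):
    return sorted(vals)[n - 1]

approx = nth_smallest(values, 50)   # 50
exact = statistics.median(values)   # 50.5 (average of the 50th and 51st)
print(approx, exact)
```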

Threshold setting. "Flag every employee whose salary is at or below the 5th lowest." Use Smallest with N=5, then Filter where salary <= salary_smallest_5.

Outlier boundaries. Find the 3rd largest and 3rd smallest values in a dataset, then filter to rows between those bounds. Quick way to trim extremes without computing percentiles.

Things to watch out for

Ties. If multiple rows share the same value at position N, the operation returns that value. But filtering with >= or <= will include all tied rows. You might get 12 rows from a "top 10" filter if positions 9-12 share the same value.
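A quick sqlite3 sketch makes the tie behavior concrete. Here a "top 3" filter keeps 5 rows because positions 3-5 share the same value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (revenue REAL)")
# Positions 3, 4, and 5 are tied at 500.
conn.executemany("INSERT INTO orders VALUES (?)",
                 [(900,), (800,), (500,), (500,), (500,), (100,)])

# 3rd highest is 500; the >= filter pulls in every tied row.
kept = conn.execute("""
    SELECT revenue FROM orders
    WHERE revenue >= (SELECT revenue FROM orders
                      ORDER BY revenue DESC LIMIT 1 OFFSET 2)
""").fetchall()

print(len(kept))  # 5, not 3 -- the tie pulls in extra rows
```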

NULLs. NULL values are sorted last in ascending order and first in descending order by DuckDB. If your column has NULLs, the Largest operation might return NULL for small values of N. Clean NULLs first with Fill Missing or a Filter that removes them.

N exceeds row count. If you ask for the 500th smallest value in a 200-row dataset, the result will be NULL. No error, just NULL.
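SQLite shows the same quiet behavior: when OFFSET runs past the end of the table, the scalar subquery yields SQL NULL (None in Python), not an error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (price REAL)")
conn.executemany("INSERT INTO source VALUES (?)", [(1.0,), (2.0,)])

# Ask for the 500th smallest in a 2-row table: OFFSET 499 is past the end.
(result,) = conn.execute(
    "SELECT (SELECT price FROM source ORDER BY price LIMIT 1 OFFSET 499)"
).fetchone()

print(result)  # None -- SQL NULL, no error raised
```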


Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData