Finding the Nth Largest or Smallest Values
"What's the 3rd highest revenue order?" "What's the 5th cheapest product?" "What salary sits at the 10th position from the top?" These are the questions that MAX and MIN can't answer. You don't want the extreme value. You want a specific rank.
In SQL, this typically means writing a subquery with ORDER BY and OFFSET, or reaching for window functions. In a spreadsheet, you'd use LARGE() or SMALL() and hope you got the syntax right. In ExploreMyData, it's two clicks.
The Smallest and Largest operations
ExploreMyData has two dedicated operations for Nth-value lookups: Smallest and Largest. Both work the same way. Pick a numeric column, specify N, and the operation adds a new column containing the Nth value.
The result is a constant column -- every row gets the same value. That might seem odd at first, but it's deliberate. It gives you a reference threshold you can use in subsequent pipeline steps.
Example: Find the 3rd cheapest product
Suppose you have a product catalog with a price
column. You want to know what the 3rd lowest price is.
- Open the toolbar and select Smallest.
- Choose the
pricecolumn. - Set N to 3.
- Click Apply.
A new column appears (e.g., price_smallest_3)
with the same value in every row: the 3rd lowest price in the dataset.
| product_name | price | price_smallest_3 |
|---|---|---|
| USB Hub | 12.99 | 19.99 |
| Mouse Pad | 8.49 | 19.99 |
| Webcam Stand | 19.99 | 19.99 |
| Keyboard | 49.00 | 19.99 |
| Monitor | 299.00 | 19.99 |
Every row gets the same value: the 3rd lowest price ($19.99). The constant column can then be used in a Filter step.
Example: Find the 3rd highest revenue order
Same idea, other direction. You have an orders table with a
revenue column and want the
3rd largest value.
- Select Largest from the toolbar.
- Choose
revenue. - Set N to 3.
- Apply.
Every row now has a revenue_largest_3
column showing the 3rd highest revenue value.
What SQL is generated
Under the hood, ExploreMyData uses a scalar subquery:
(SELECT price FROM source ORDER BY price ASC LIMIT 1 OFFSET 2) AS price_smallest_3
For Largest, it's the same pattern but with DESC
ordering. The OFFSET is N-1 because SQL offsets are zero-based. So N=3 means
OFFSET 2 --
skip the first two, take the third.
Because this is a scalar subquery, it evaluates once and returns a single value. That's why every row gets the same number. The query engine doesn't re-run it per row.
Combining with Filter for real analysis
A constant column on its own is mildly interesting. It gets powerful when you chain it with a Filter step.
Say you've added revenue_largest_10
(the 10th highest revenue). Now add a Filter:
revenue >= revenue_largest_10.
You've just isolated your top 10 revenue orders without hardcoding a threshold.
This is better than manually checking the value and typing it into a filter, because the threshold updates automatically when your data changes. Load a new month's data, and the pipeline recalculates.
Pipeline - isolate top 10 revenue orders
Column: revenue - N: 10 - Output: revenue_largest_10
(SELECT revenue FROM source ORDER BY revenue DESC LIMIT 1 OFFSET 9) AS revenue_largest_10
Keep rows where revenue >= revenue_largest_10
WHERE revenue >= revenue_largest_10
Result: only the top 10 revenue orders remain. The threshold auto-updates when the data changes.
Practical use cases
Benchmarking. "Our 10th largest order is $4,200. That's our top-10 threshold." Report this number to stakeholders as a performance benchmark.
Median approximation. If you have 100 rows and set N=50 on Smallest, you get a rough median. It's not statistically precise for even-numbered datasets, but it's fast and often close enough for exploratory work.
Threshold setting. "Flag every employee whose salary is
below the 5th lowest." Use Smallest with N=5, then Filter where
salary <= salary_smallest_5.
Outlier boundaries. Find the 3rd largest and 3rd smallest values in a dataset, then filter to rows between those bounds. Quick way to trim extremes without computing percentiles.
Things to watch out for
Ties. If multiple rows share the same value at position N,
the operation returns that value. But filtering with
>= or
<= will include all tied rows.
You might get 12 rows from a "top 10" filter if positions 9-12 share the same value.
NULLs. NULL values are sorted last in ascending order and first in descending order by DuckDB. If your column has NULLs, the Largest operation might return NULL for small values of N. Clean NULLs first with Fill Missing or a Filter that removes them.
N exceeds row count. If you ask for the 500th smallest value in a 200-row dataset, the result will be NULL. No error, just NULL.