Using AI to Generate SQL Queries for Your Data
You know exactly what you want. "Show me orders where the total is above the 90th percentile." "Flag rows where the email domain doesn't match the company name." "Calculate the rolling 7-day average of daily revenue." The logic is clear in your head. The DuckDB syntax to express it is not.
ExploreMyData has an AI generation feature that takes a plain English description and returns the SQL you need. It works inside condition inputs, expression fields, and the SQL operation. You describe what you want, and it writes the query.
How it works
Anywhere the app accepts a SQL condition or expression, there's an AI generation button. Click it, type what you want in natural language, and the app sends your description to the Claude API along with context about your data: the table schema, column names and types, and any previous pipeline steps you've already applied.
That context is what makes it work. The AI doesn't just generate generic SQL. It generates SQL that references your actual column names, respects your data types, and accounts for transformations you've already made. The result slots directly into the input field, ready to preview and apply.
You'll need to configure an API key in Settings. The app supports Claude from Anthropic. The provider and model are configurable.
AI generation - natural language to SQL
You type:
"rows where created_at is in the last 30 days"
AI generates:
created_at >= CURRENT_DATE - INTERVAL '30 days'
Example 1: Generate a WHERE condition
Say you have a sales dataset with columns order_date,
amount,
region, and
status. You want to
filter to orders above the 90th percentile of amount.
In the Filter operation's condition field, click the AI button and type:
"orders where amount is above the 90th percentile"
The AI returns something like:
amount > (SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY amount) FROM data)
Writing that subquery by hand means knowing that DuckDB uses
PERCENTILE_CONT with the
WITHIN GROUP syntax. The
AI knows. You just describe the intent.
Another common use: "rows where created_at is in the last 30 days" generates
created_at >= CURRENT_DATE - INTERVAL '30 days'.
Much faster than looking up DuckDB's date arithmetic syntax.
Example 2: Generate an expression for Add Column
The Add Column operation lets you create a new column from a SQL expression. This is where the AI generation really shines, because expressions can get complicated.
Suppose you have email and
company_name columns. You
want to flag rows where the email domain doesn't match the company name. In the expression
field, click the AI button and type:
"flag true/false if the domain part of email matches the company_name (case insensitive)"
The AI generates:
LOWER(SPLIT_PART(SPLIT_PART(email, '@', 2), '.', 1)) = LOWER(REPLACE(REPLACE(LOWER(company_name), ' inc', ''), ' ', ''))
That expression extracts the domain from the email, strips common suffixes from the company name, and compares them case-insensitively. Constructing that by hand is tedious and error-prone. Describing it in English takes ten seconds.
AI generation - Add Column expression
Prompt:
"flag true/false if the domain part of email matches the company_name (case insensitive)"
Generated expression:
LOWER(SPLIT_PART(SPLIT_PART(email, '@', 2), '.', 1)) = LOWER(REPLACE(REPLACE(LOWER(company_name), ' inc', ''), ' ', ''))
New column preview (first 3 rows):
| company_name | domain_match | |
|---|---|---|
| alice@acme.com | Acme Inc | true |
| bob@gmail.com | Globex Corp | false |
| carol@globex.com | Globex Corp | true |
Example 3: Generate a full SQL query
The SQL operation gives you a free-form editor for writing SELECT queries. When you need something complex that the visual operations can't express, this is where you go.
For example: "Show the top 5 regions by total revenue, with a column for their percentage of overall revenue"
The AI returns a full query:
SELECT
region,
SUM(amount) AS total_revenue,
ROUND(SUM(amount) * 100.0 / (SELECT SUM(amount) FROM data), 2) AS pct_of_total
FROM data
GROUP BY region
ORDER BY total_revenue DESC
LIMIT 5
The live preview shows the first 5 rows immediately so you can verify it's correct before applying it as a pipeline step.
What context does the AI see?
When you trigger AI generation, the app sends the following to the API:
- Your table schema: column names and their data types
- The type of SQL expected (condition, expression, or full query)
- Previous pipeline steps, so the AI knows what transformations have already been applied
- Your natural language description
It does not send your actual data rows. The AI only sees the structure of your data, not the contents. Your data stays in the browser.
When to use it (and when not to)
The AI generation is most useful for:
- Complex expressions involving multiple functions (string manipulation, date math, window functions)
- DuckDB-specific syntax you don't remember (PERCENTILE_CONT, STRPTIME, REGEXP_EXTRACT)
- Multi-step logic that's easier to describe than to write (nested CASE statements, correlated subqueries)
For simple things like status = 'active'
or amount > 100, just type
the SQL directly. It's faster.
Always check the generated SQL before applying it. The preview is there for a reason. The AI is good, but it's not infallible, and a quick glance at the output catches mistakes before they propagate through your pipeline.