
Using AI to Generate SQL Queries for Your Data

You know exactly what you want. "Show me orders where the total is above the 90th percentile." "Flag rows where the email domain doesn't match the company name." "Calculate the rolling 7-day average of daily revenue." The logic is clear in your head. The DuckDB syntax to express it is not.

ExploreMyData has an AI generation feature that takes a plain English description and returns the SQL you need. It works inside condition inputs, expression fields, and the SQL operation. You describe what you want, and it writes the query.

How it works

Anywhere the app accepts a SQL condition or expression, there's an AI generation button. Click it, type what you want in natural language, and the app sends your description to the Claude API along with context about your data: the table schema, column names and types, and any previous pipeline steps you've already applied.

That context is what makes it work. The AI doesn't just generate generic SQL. It generates SQL that references your actual column names, respects your data types, and accounts for transformations you've already made. The result slots directly into the input field, ready to preview and apply.

You'll need to configure an API key in Settings. The app uses Claude from Anthropic, and both the provider and model are configurable there.

AI generation - natural language to SQL

You type:

"rows where created_at is in the last 30 days"

AI generates:

created_at >= CURRENT_DATE - INTERVAL '30 days'

Context sent to AI: table schema (column names + types) and pipeline steps already applied. Your data rows are never sent.

Example 1: Generate a WHERE condition

Say you have a sales dataset with columns order_date, amount, region, and status. You want to filter to orders above the 90th percentile of amount.

In the Filter operation's condition field, click the AI button and type:

"orders where amount is above the 90th percentile"

The AI returns something like:

amount > (SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY amount) FROM data)

Writing that subquery by hand means knowing that DuckDB uses PERCENTILE_CONT with the WITHIN GROUP syntax. The AI knows. You just describe the intent.

Another common use: "rows where created_at is in the last 30 days" generates created_at >= CURRENT_DATE - INTERVAL '30 days'. Much faster than looking up DuckDB's date arithmetic syntax.
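Date parsing is another frequent case. If your dates arrived as text, a prompt like "parse order_date from MM/DD/YYYY format into a real date" might come back as something along these lines (assuming a text column named order_date):

STRPTIME(order_date, '%m/%d/%Y')

STRPTIME is DuckDB's string-to-timestamp parser, and its format specifiers follow the usual strftime conventions — another bit of syntax you'd otherwise have to look up.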

Example 2: Generate an expression for Add Column

The Add Column operation lets you create a new column from a SQL expression. This is where the AI generation really shines, because expressions can get complicated.

Suppose you have email and company_name columns. You want to flag rows where the email domain doesn't match the company name. In the expression field, click the AI button and type:

"flag true/false if the domain part of email matches the company_name (case insensitive)"

The AI generates:

LOWER(SPLIT_PART(SPLIT_PART(email, '@', 2), '.', 1)) = REPLACE(REPLACE(REPLACE(LOWER(company_name), ' inc', ''), ' corp', ''), ' ', '')

That expression extracts the domain from the email, strips common suffixes from the company name, and compares them case-insensitively. Constructing that by hand is tedious and error-prone. Describing it in English takes ten seconds.

AI generation - Add Column expression

Prompt:

"flag true/false if the domain part of email matches the company_name (case insensitive)"

Generated expression:

LOWER(SPLIT_PART(SPLIT_PART(email, '@', 2), '.', 1)) = REPLACE(REPLACE(REPLACE(LOWER(company_name), ' inc', ''), ' corp', ''), ' ', '')

New column preview (first 3 rows):

email             company_name  domain_match
alice@acme.com    Acme Inc      true
bob@gmail.com     Globex Corp   false
carol@globex.com  Globex Corp   true
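If you'd rather compare against the full domain instead of just its first label, rephrasing the prompt to mention a regex tends to steer the output that way. A sketch of what "extract the full domain from email using a regex" might produce:

REGEXP_EXTRACT(email, '@(.+)$', 1)

DuckDB's REGEXP_EXTRACT takes the string, a pattern, and a capture-group index — one of those function signatures that's easier to describe than to remember.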

Example 3: Generate a full SQL query

The SQL operation gives you a free-form editor for writing SELECT queries. When you need something complex that the visual operations can't express, this is where you go.

For example: "Show the top 5 regions by total revenue, with a column for their percentage of overall revenue"

The AI returns a full query:

SELECT
  region,
  SUM(amount) AS total_revenue,
  ROUND(SUM(amount) * 100.0 / (SELECT SUM(amount) FROM data), 2) AS pct_of_total
FROM data
GROUP BY region
ORDER BY total_revenue DESC
LIMIT 5

The live preview shows the first 5 rows immediately so you can verify it's correct before applying it as a pipeline step.
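The rolling 7-day average from the introduction is another good fit for the SQL operation. A sketch of what the AI might return for "rolling 7-day average of daily revenue" (assuming order_date and amount columns):

SELECT
  order_date,
  SUM(amount) AS daily_revenue,
  AVG(SUM(amount)) OVER (
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7d_avg
FROM data
GROUP BY order_date
ORDER BY order_date

One thing worth checking in the preview: ROWS BETWEEN 6 PRECEDING counts rows, not calendar days, so gaps in the data will widen the window's real time span.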

What context does the AI see?

When you trigger AI generation, the app sends the following to the API:

  • Your table schema: column names and their data types
  • The type of SQL expected (condition, expression, or full query)
  • Previous pipeline steps, so the AI knows what transformations have already been applied
  • Your natural language description

It does not send your actual data rows. The AI only sees the structure of your data, not the contents. Your data stays in the browser.

When to use it (and when not to)

The AI generation is most useful for:

  • Complex expressions involving multiple functions (string manipulation, date math, window functions)
  • DuckDB-specific syntax you don't remember (PERCENTILE_CONT, STRPTIME, REGEXP_EXTRACT)
  • Multi-step logic that's easier to describe than to write (nested CASE statements, correlated subqueries)
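As an example of that last case, a prompt like "bucket amount into small, medium, and large" might come back as a nested CASE along these lines (the thresholds here are hypothetical):

CASE
  WHEN amount < 100 THEN 'small'
  WHEN amount < 1000 THEN 'medium'
  ELSE 'large'
END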

For simple things like status = 'active' or amount > 100, just type the SQL directly. It's faster.

Always check the generated SQL before applying it. The preview is there for a reason. The AI is good, but it's not infallible, and a quick glance at the output catches mistakes before they propagate through your pipeline.


Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData