Opening and Exploring JSON Files as Tables
You just downloaded an API response. Maybe it's a list of users from Stripe, events from Mixpanel, or records from your company's internal API. The file is JSON - an array of objects, some with nested fields, some with arrays inside arrays. You open it in a text editor and see 4,000 lines of curly braces.
What you actually want is a table. Rows and columns. Something you can scan visually, sort, filter, and make sense of without parsing nested structures in your head. That's what this guide covers.
Loading JSON and JSONL files
Open ExploreMyData and drag your JSON file onto the page (or click to browse). The tool handles two formats:
- JSON: A single array of objects, like [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
- JSONL (JSON Lines): One JSON object per line, no wrapping array. Common in log files and streaming exports.
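The difference between the two formats can be sketched with Python's standard json module (a conceptual illustration, not the tool's internals):

```python
import json

def load_rows(text: str) -> list[dict]:
    """Parse either a JSON array of objects or JSONL (one object per line)."""
    stripped = text.strip()
    if stripped.startswith("["):
        # JSON: one top-level array of objects
        return json.loads(stripped)
    # JSONL: each non-empty line is its own JSON object
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]

json_text = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
jsonl_text = '{"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}'

# Both formats parse to the same list of row objects
assert load_rows(json_text) == load_rows(jsonl_text)
```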
DuckDB reads both formats natively. Each top-level key in the objects becomes a column, and each object becomes a row. If your file has 500 objects with keys "id", "name", "email", and "address", you get a table with 500 rows and 4 columns.
| id | name | email | tags | address |
|---|---|---|---|---|
| BIGINT | VARCHAR | VARCHAR | VARCHAR[] | STRUCT |
| 1 | Alice Chen | alice@example.com | ["premium","enterprise"] | {"street":"123 Main","city":"Portland","state":"OR"} |
| 2 | Bob Lund | bob@example.com | ["starter"] | {"street":"40 Oak Ave","city":"Seattle","state":"WA"} |
| 3 | Priya Shah | priya@example.com | ["premium","early-adopter"] | {"street":"7 Pine St","city":"Austin","state":"TX"} |
JSON loaded as a table. Top-level keys become columns. The tags column is a VARCHAR[] (array) and address is a STRUCT - both need further processing.
The nesting problem
The table loads, but some columns look wrong. Instead of plain values, you see things like {"street": "123 Main", "city": "Portland", "state": "OR"} in the address column. That's because JSON objects can nest, and DuckDB preserves nested structures as STRUCT or JSON types.
This is technically correct but not useful when you want to filter by city or sort by state. You need to flatten the nested fields into their own columns.
Flattening nested objects with JSON Extract
Use the JSON Extract operation in dictionary mode. Select the nested column (e.g., "address") and choose the keys you want to extract. Each key becomes a new column.
For example, extracting "street", "city", and "state" from the address column gives you three new columns: address_street, address_city, and address_state.
After JSON Extract (dictionary mode) on the address column, extracting street, city, state:
| id | name | address_street | address_city | address_state |
|---|---|---|---|---|
| 1 | Alice Chen | 123 Main St | Portland | OR |
| 2 | Bob Lund | 40 Oak Ave | Seattle | WA |
| 3 | Priya Shah | 7 Pine St | Austin | TX |
The nested STRUCT is flattened into three plain columns. Now you can filter by city, group by state, or sort by street - just like any other column.
Under the hood, this generates SQL like json_extract_string(address, '$.city') for each key. You don't need to write this - just pick the keys from the list.
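Conceptually, dictionary-mode extraction pulls named keys out of each nested object into new prefixed columns. A rough Python sketch, assuming the column_key naming scheme shown above (this models the behavior, not the tool's actual implementation):

```python
def extract_keys(rows: list[dict], column: str, keys: list[str]) -> list[dict]:
    """Flatten selected keys of a nested-object column into new top-level columns."""
    out = []
    for row in rows:
        # Keep every column except the nested one being flattened
        flat = {k: v for k, v in row.items() if k != column}
        nested = row.get(column) or {}
        for key in keys:
            # Missing keys become None (NULL in the table view)
            flat[f"{column}_{key}"] = nested.get(key)
        out.append(flat)
    return out

rows = [{"id": 1, "address": {"street": "123 Main", "city": "Portland", "state": "OR"}}]
print(extract_keys(rows, "address", ["street", "city", "state"]))
# [{'id': 1, 'address_street': '123 Main', 'address_city': 'Portland', 'address_state': 'OR'}]
```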
Expanding arrays
Some JSON fields contain arrays. A user might have a "tags" field like ["premium", "enterprise", "early-adopter"], or an "orders" field that's a list of order objects.
Use JSON Extract in array mode for these columns. This unnests the array, creating one row per array element. A user with 3 tags becomes 3 rows (one per tag), with all the other columns duplicated.
This is the JSON equivalent of a SQL UNNEST().
It's the right approach when you need to analyze the individual elements - count how many users
have each tag, or look at order-level details.
Be careful with large arrays. If each row has 50 array elements and you have 1,000 rows, unnesting gives you 50,000 rows. That's usually fine for analysis, but it's worth knowing the row count will jump.
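The row-multiplying behavior is easy to see in a small sketch (assumed semantics: one output row per array element, all other columns copied; rows with empty arrays drop out, matching default UNNEST behavior):

```python
def unnest(rows: list[dict], column: str) -> list[dict]:
    """Expand an array column into one row per element, duplicating other columns."""
    out = []
    for row in rows:
        for element in row.get(column) or []:
            new_row = dict(row)
            new_row[column] = element  # array replaced by a single element
            out.append(new_row)
    return out

rows = [{"id": 1, "tags": ["premium", "enterprise", "early-adopter"]}]
print(unnest(rows, "tags"))
# One input row with 3 tags -> 3 output rows:
# [{'id': 1, 'tags': 'premium'}, {'id': 1, 'tags': 'enterprise'}, {'id': 1, 'tags': 'early-adopter'}]
```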
Exploring the structure with Column Explorer
After loading a JSON file, open the Column Explorer to understand what you're working with. It shows you:
- The data type of each column (VARCHAR, INTEGER, STRUCT, JSON, etc.)
- How many NULL values exist (missing keys in the source JSON)
- Value distributions for categorical columns
- Min/max for numeric columns
Columns with type STRUCT or JSON are the ones that need flattening. Columns with type VARCHAR[] or JSON[] are arrays that might need expanding. The Column Explorer helps you identify which columns need further processing before you can analyze them.
| Column | Type | Nulls | Action needed |
|---|---|---|---|
| id | BIGINT | 0 | None |
| name | VARCHAR | 0 | None |
| email | VARCHAR | 12 | None |
| tags | VARCHAR[] | 8 | JSON Extract (array mode) |
| address | STRUCT | 3 | JSON Extract (dictionary mode) |
Column Explorer view. STRUCT and VARCHAR[] types flag the columns that need flattening or expansion before analysis.
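The NULL counts in that view include missing keys, not just explicit nulls. A minimal sketch of that counting over a list of row dicts (an illustration of the idea, not the tool's code):

```python
def null_counts(rows: list[dict]) -> dict[str, int]:
    """Count NULLs per column; a missing key counts the same as an explicit null."""
    columns = {key for row in rows for key in row}
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

rows = [
    {"id": 1, "email": "alice@example.com"},
    {"id": 2, "email": None},   # explicit null
    {"id": 3},                  # email key missing entirely
]
print(null_counts(rows))
# {'id': 0, 'email': 2} (key order may vary)
```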
Trimming to the columns you need
API responses tend to include everything. A Stripe customer object has dozens of fields, but you probably only care about five or six. After loading and flattening, use Select Columns to keep only what you need.
Pick the columns you want, in the order you want them. Everything else is dropped from the view. This makes the table manageable and focused on the data you actually need to analyze.
The pipeline preserves the full data - Select Columns just filters the output. If you realize later you need another field, delete the Select Columns step and the original columns reappear.
Common JSON exploration workflows
Here are a few patterns that come up regularly:
- API response audit: Load the JSON, check Column Explorer for structure, extract nested fields, Select Columns to focus on what matters. Good for understanding what an API actually returns vs. what the docs say.
- Log file analysis: Load a JSONL log file, filter by timestamp range or log level, group by error type. Each log entry becomes a row.
- Config comparison: Load two JSON config files as tables, join on the key name, compare values side by side.
- Nested data flattening: Extract all nested objects, drop the original nested columns, export as CSV. Useful when downstream tools can't handle JSON nesting.
JSON vs. CSV: when each makes sense
JSON is better at representing complex, hierarchical data. CSV is better for flat, tabular data. When you load JSON into ExploreMyData, you're essentially converting from one to the other - flattening a hierarchy into a table.
This works well when the data is mostly tabular with a few nested fields. It works less well when the structure is deeply nested or highly variable (different objects have completely different keys). For those cases, you might need to extract specific paths rather than trying to flatten everything.
The point is to get from "I have a JSON file" to "I can see and explore this data" as fast as possible. Load, flatten what needs flattening, trim to the columns you need, and start asking questions.