Extracting Fields from JSON Columns into Regular Columns
You exported your products table and there it is: a column called metadata
containing values like {"weight_g": 85, "color": "black", "warranty_months": 12}.
All the information you need is in there, but it's trapped inside a JSON string. You can't filter by color, sort by weight,
or build a chart of warranty periods. The data exists but it's not usable.
This happens constantly with data exported from APIs, e-commerce platforms, and any system that stores flexible attributes as JSON blobs. Here's how to break those fields out into proper, typed columns in ExploreMyData.
What JSON columns look like in practice
JSON columns show up in CSVs more often than you'd expect. Shopify product exports, Stripe payment
metadata, survey tools that store custom fields, CRM systems with flexible attributes. The JSON
is usually valid, but every row might have a different set of keys. One product has a
warranty_months field, another doesn't. One has nested objects, another is flat.
A typical table might look like this:
| product | price | metadata |
|---|---|---|
| Wireless Mouse | 29.99 | {"weight_g": 85, "color": "black", "warranty_months": 12} |
| USB-C Hub | 49.99 | {"weight_g": 120, "color": "silver", "ports": 7} |
| Webcam | 79.99 | {"weight_g": 162, "color": "black", "resolution": "1080p"} |
You want weight_g and
color as their own columns
so you can sort products by weight or filter by color.
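To make concrete why a text metadata column is awkward: even a simple color filter means parsing every row's JSON before you can compare anything. A minimal sketch using Python's standard json module (the row data is the sample table above):

```python
import json

# Sketch: filtering on a key trapped in a JSON string column means
# parsing every row's metadata before you can compare anything.
rows = [
    ("Wireless Mouse", 29.99, '{"weight_g": 85, "color": "black", "warranty_months": 12}'),
    ("USB-C Hub", 49.99, '{"weight_g": 120, "color": "silver", "ports": 7}'),
    ("Webcam", 79.99, '{"weight_g": 162, "color": "black", "resolution": "1080p"}'),
]
black = [name for name, _, meta in rows if json.loads(meta).get("color") == "black"]
print(black)  # ['Wireless Mouse', 'Webcam']
```

Once the fields are real columns, the same filter is a plain comparison with no parsing step.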
Using JSON Extract in dictionary mode
Open the toolbar and select JSON Extract from the Transform group. The operation works in dictionary mode: you pick the JSON column, then specify which keys to extract.
- Select the metadata column as the source.
- Add the keys you want to extract: weight_g, color, warranty_months.
- For each key, optionally set a type cast. Leave color as VARCHAR, but cast weight_g and warranty_months to INTEGER so you can sort and aggregate them as numbers.
- Choose whether to keep or remove the original metadata column.
- Click Apply.
JSON Extract - key configuration:
| Key path | Output column name | Cast to type |
|---|---|---|
| $.color | color | VARCHAR |
| $.weight_g | weight_g | INTEGER |
| $.warranty_months | warranty_months | INTEGER |
Each key generates a TRY_CAST(json_extract_string(metadata, '$.key') AS TYPE) expression. Missing keys return NULL rather than an error.
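Each row of the configuration table above is effectively a (path, name, type) triple. As a sketch of how such a configuration expands into expressions, here is a hypothetical build_exprs helper (not part of ExploreMyData) that assembles them:

```python
# Sketch: build the per-key extraction expressions that the dialog
# generates. build_exprs is a hypothetical helper, not ExploreMyData's API.
def build_exprs(source_col, keys):
    exprs = []
    for path, name, cast in keys:
        expr = f"json_extract_string({source_col}, '{path}')"
        if cast != "VARCHAR":
            expr = f"TRY_CAST({expr} AS {cast})"  # safe cast: bad values -> NULL
        exprs.append(f"{expr} AS {name}")
    return exprs

keys = [("$.color", "color", "VARCHAR"),
        ("$.weight_g", "weight_g", "INTEGER"),
        ("$.warranty_months", "warranty_months", "INTEGER")]
for e in build_exprs("metadata", keys):
    print(e)
```

VARCHAR keys skip the cast because json_extract_string already returns text.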
The SQL under the hood
Each key extraction uses DuckDB's json_extract_string()
function, which pulls a value from a JSON object by key path and returns it as a string.
For numeric fields, the result is wrapped in TRY_CAST()
to convert it to the target type safely.
The generated SQL looks like this:

```sql
json_extract_string(metadata, '$.color') AS color
TRY_CAST(json_extract_string(metadata, '$.weight_g') AS INTEGER) AS weight_g
TRY_CAST(json_extract_string(metadata, '$.warranty_months') AS INTEGER) AS warranty_months
```
TRY_CAST is important here. If a row's JSON doesn't contain
the key you're extracting, or the value can't be converted to the target type, you get NULL instead of
an error. The USB-C Hub row has no warranty_months key,
so that cell becomes NULL rather than crashing your entire query.
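The same NULL-on-missing-or-uncastable behaviour can be sketched in plain Python, with json.loads plus a guarded int() standing in for json_extract_string and TRY_CAST:

```python
import json

# Sketch of the NULL-on-missing-or-bad-value behaviour: .get() returns
# None for a missing key, and the guarded int() swallows bad values.
def extract_int(json_text, key):
    value = json.loads(json_text).get(key)   # missing key -> None (NULL)
    try:
        return int(value)
    except (TypeError, ValueError):
        return None                          # uncastable value -> None (NULL)

row = '{"weight_g": 120, "color": "silver", "ports": 7}'  # the USB-C Hub row
print(extract_int(row, "weight_g"))         # 120
print(extract_int(row, "warranty_months"))  # None, not an error
```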
The result
After extraction, your table looks like this:
| product | price | color | weight_g | warranty_months |
|---|---|---|---|---|
| Wireless Mouse | 29.99 | black | 85 | 12 |
| USB-C Hub | 49.99 | silver | 120 | NULL |
| Webcam | 79.99 | black | 162 | NULL |
Now you can sort by weight_g,
filter products where color = 'black',
or calculate the average warranty period. These are regular columns that work with every other operation in the tool.
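Aggregates over the new columns skip NULLs the way SQL's AVG() does, so the rows missing warranty_months simply don't count. A minimal sketch of that behaviour, using the result table above:

```python
# Sketch: NULL-aware averaging over the extracted warranty_months column.
# SQL's AVG() ignores NULLs; filtering out None mirrors that.
rows = [("Wireless Mouse", 12), ("USB-C Hub", None), ("Webcam", None)]
warranties = [w for _, w in rows if w is not None]
avg = sum(warranties) / len(warranties) if warranties else None
print(avg)  # 12.0 -- only the one non-NULL value counts
```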
Dealing with inconsistent keys across rows
Real JSON columns rarely have the same keys in every row. That's fine. When you extract a key that doesn't exist in a particular row's JSON, the result is NULL for that cell. You don't need every row to have every key.
If you're not sure which keys exist across your data, scroll through a few rows of the JSON column first or use the column summary to see unique values. The JSON Extract dialog also scans sample rows to suggest available keys.
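The key-suggestion scan is conceptually simple: parse a handful of sample rows and take the union of their top-level keys. A sketch of that idea (discover_keys is illustrative, not ExploreMyData's actual implementation):

```python
import json

# Sketch of key discovery: scan sample rows and collect the union
# of top-level keys, since no single row is guaranteed to have them all.
def discover_keys(json_rows):
    keys = set()
    for text in json_rows:
        keys.update(json.loads(text).keys())
    return sorted(keys)

sample = [
    '{"weight_g": 85, "color": "black", "warranty_months": 12}',
    '{"weight_g": 120, "color": "silver", "ports": 7}',
]
print(discover_keys(sample))  # ['color', 'ports', 'warranty_months', 'weight_g']
```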
Keep or remove the source column
The JSON Extract operation gives you the option to keep the original JSON column alongside the new extracted columns, or to remove it. If you're extracting all the fields you need, removing the source column keeps your table clean. If you might need to extract more fields later, keep it. You can always add a Drop Columns step later.
Nested JSON
If your JSON has nested objects like {"dimensions": {"width": 10, "height": 5}},
you can reach into them using dot-path notation in the key field:
dimensions.width. The extraction uses
json_extract_string(metadata, '$.dimensions.width')
under the hood. This works for any depth of nesting.
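Dot-path traversal amounts to walking the parsed object one key at a time, returning NULL as soon as any level is missing. A sketch of that logic in Python:

```python
import json

# Sketch of dot-path traversal like '$.dimensions.width': walk the parsed
# object one key at a time, returning None if any level is missing or
# isn't an object.
def extract_path(json_text, dot_path):
    value = json.loads(json_text)
    for key in dot_path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value

doc = '{"dimensions": {"width": 10, "height": 5}}'
print(extract_path(doc, "dimensions.width"))  # 10
print(extract_path(doc, "dimensions.depth"))  # None
```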