
Extracting Fields from JSON Columns into Regular Columns

You exported your products table and there it is: a column called metadata containing values like {"weight_g": 85, "color": "black", "warranty_months": 12}. All the information you need is in there, but it's trapped inside a JSON string. You can't filter by color, sort by weight, or build a chart of warranty periods. The data exists, but it's not usable.

This happens constantly with data exported from APIs, e-commerce platforms, and any system that stores flexible attributes as JSON blobs. Here's how to break those fields out into proper, typed columns in ExploreMyData.

What JSON columns look like in practice

JSON columns show up in CSVs more often than you'd expect. Shopify product exports, Stripe payment metadata, survey tools that store custom fields, CRM systems with flexible attributes. The JSON is usually valid, but every row might have a different set of keys. One product has a warranty_months field, another doesn't. One has nested objects, another is flat.

A typical table might look like this:

product         price   metadata
Wireless Mouse  29.99   {"weight_g": 85, "color": "black", "warranty_months": 12}
USB-C Hub       49.99   {"weight_g": 120, "color": "silver", "ports": 7}
Webcam          79.99   {"weight_g": 162, "color": "black", "resolution": "1080p"}

You want weight_g and color as their own columns so you can sort products by weight or filter by color.

Using JSON Extract in dictionary mode

Open the toolbar and select JSON Extract from the Transform group. The operation works in dictionary mode: you pick the JSON column, then specify which keys to extract.

  1. Select the metadata column as the source.
  2. Add the keys you want to extract: weight_g, color, warranty_months.
  3. For each key, optionally set a type cast. Leave color as VARCHAR, but cast weight_g and warranty_months to INTEGER so you can sort and aggregate them as numbers.
  4. Choose whether to keep or remove the original metadata column.
  5. Click Apply.

JSON Extract - key configuration:

Key path           Output column name  Cast to type
$.color            color               VARCHAR
$.weight_g         weight_g            INTEGER
$.warranty_months  warranty_months     INTEGER

Each key generates a TRY_CAST(json_extract_string(metadata, '$.key') AS TYPE) expression. Missing keys return NULL rather than an error.
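The mapping from dictionary-mode configuration to SQL expressions can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual code; the helper name extract_expr is hypothetical.

```python
# Sketch (not ExploreMyData's actual code): build one SQL expression
# per configured key, mirroring what JSON Extract generates.
def extract_expr(column: str, key: str, cast_to: str = "VARCHAR") -> str:
    expr = f"json_extract_string({column}, '$.{key}')"
    if cast_to.upper() != "VARCHAR":
        # Non-string targets get a safe cast that yields NULL on failure.
        expr = f"TRY_CAST({expr} AS {cast_to})"
    return f"{expr} AS {key}"

keys = {"color": "VARCHAR", "weight_g": "INTEGER", "warranty_months": "INTEGER"}
exprs = [extract_expr("metadata", k, t) for k, t in keys.items()]
print(exprs[0])  # json_extract_string(metadata, '$.color') AS color
```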

The SQL under the hood

Each key extraction uses DuckDB's json_extract_string() function, which pulls a value from a JSON object by key path and returns it as a string. For numeric fields, the result is wrapped in TRY_CAST() to convert it to the target type safely.

The generated SQL looks like this:

json_extract_string(metadata, '$.color') AS color
TRY_CAST(json_extract_string(metadata, '$.weight_g') AS INTEGER) AS weight_g
TRY_CAST(json_extract_string(metadata, '$.warranty_months') AS INTEGER) AS warranty_months

TRY_CAST is important here. If a row's JSON doesn't contain the key you're extracting, or the value can't be converted to the target type, you get NULL instead of an error. The USB-C Hub row has no warranty_months key, so that cell becomes NULL rather than crashing your entire query.
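You can see the same null-not-error semantics in a small Python sketch using only the standard library's json module. The helper try_extract_int is hypothetical; it mimics TRY_CAST(json_extract_string(...) AS INTEGER), not DuckDB itself.

```python
import json

def try_extract_int(json_text: str, key: str):
    """Mimic TRY_CAST(json_extract_string(col, '$.key') AS INTEGER):
    a missing key or an unconvertible value yields None, never an error."""
    value = json.loads(json_text).get(key)  # missing key -> None
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

mouse = '{"weight_g": 85, "color": "black", "warranty_months": 12}'
hub = '{"weight_g": 120, "color": "silver", "ports": 7}'

print(try_extract_int(mouse, "warranty_months"))  # 12
print(try_extract_int(hub, "warranty_months"))    # None (key absent)
```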

The result

After extraction, your table looks like this:

product         price   color   weight_g  warranty_months
Wireless Mouse  29.99   black   85        12
USB-C Hub       49.99   silver  120       NULL
Webcam          79.99   black   162       NULL

Now you can sort by weight_g, filter products where color = 'black', or calculate the average warranty period. These are regular columns that work with every other operation in the tool.


Dealing with inconsistent keys across rows

Real JSON columns rarely have the same keys in every row. That's fine. When you extract a key that doesn't exist in a particular row's JSON, the result is NULL for that cell. You don't need every row to have every key.

If you're not sure which keys exist across your data, scroll through a few rows of the JSON column first or use the column summary to see unique values. The JSON Extract dialog also scans sample rows to suggest available keys.
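A key scan like the one the dialog performs can be approximated with stdlib Python: parse a sample of rows and collect the union of keys. This is a sketch of the idea, not the tool's implementation.

```python
import json

# A sample of rows from the metadata column.
rows = [
    '{"weight_g": 85, "color": "black", "warranty_months": 12}',
    '{"weight_g": 120, "color": "silver", "ports": 7}',
    '{"weight_g": 162, "color": "black", "resolution": "1080p"}',
]

# Union of keys across sampled rows, in first-seen order.
seen = {}
for row in rows:
    for key in json.loads(row):
        seen.setdefault(key, None)

print(list(seen))  # ['weight_g', 'color', 'warranty_months', 'ports', 'resolution']
```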

Keep or remove the source column

The JSON Extract operation gives you the option to keep the original JSON column alongside the new extracted columns, or to remove it. If you're extracting all the fields you need, removing the source column keeps your table clean. If you might need to extract more fields later, keep it. You can always add a Drop Columns step later.

Nested JSON

If your JSON has nested objects like {"dimensions": {"width": 10, "height": 5}}, you can reach into them using dot-path notation in the key field: dimensions.width. The extraction uses json_extract_string(metadata, '$.dimensions.width') under the hood. This works for any depth of nesting.
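Dot-path traversal over nested objects boils down to a loop: split the path on dots and step into each level, returning NULL if any step is missing. A minimal Python sketch (the helper extract_path is hypothetical, not part of any library):

```python
import json

def extract_path(json_text: str, path: str):
    """Follow a dot path like 'dimensions.width' into nested objects,
    returning None if any step is missing -- analogous to extracting
    with a '$.dimensions.width' JSON path."""
    obj = json.loads(json_text)
    for part in path.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(part)
    return obj

doc = '{"dimensions": {"width": 10, "height": 5}}'
print(extract_path(doc, "dimensions.width"))  # 10
print(extract_path(doc, "dimensions.depth"))  # None
```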

Extract JSON fields in ExploreMyData →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData