← All posts
4 min read

Splitting Multi-Value Cells into Separate Rows

Your product data has a "tags" column. Each cell contains something like "electronics, sale, featured". Three values crammed into one cell, separated by commas.

This is fine for display. It's terrible for analysis. You can't count how many products are tagged "sale" because "sale" isn't a distinct value - it's buried inside a string. You can't filter by a single tag, group by tag, or join on tag. The data is locked up.

What you need is one row per tag, where the product appears three times (once for each tag) with all its other columns duplicated. This operation is called unnesting, and it's one of the most useful transformations for anyone working with real-world data.

What multi-value cells look like

Here's the starting point:

product_id name tags
101Wireless Mouseelectronics, sale, featured
102Running Shoessports, new-arrival
103Coffee Makerkitchen, sale, bestseller, featured

Three rows, but the tags column is doing too much. You can't answer "how many products are on sale?" without parsing strings.

Unnest: one row per value

In ExploreMyData, open the toolbar and select Unnest from the Transform group. Choose the column with delimited values (e.g., "tags") and set the delimiter - in this case, a comma (,).

Click Apply. The three rows become nine:

product_id name tags
101Wireless Mouseelectronics
101Wireless Mouse sale
101Wireless Mouse featured
102Running Shoessports
102Running Shoes new-arrival
103Coffee Makerkitchen
103Coffee Maker sale
103Coffee Maker bestseller
103Coffee Maker featured

Each tag is now its own row. The product_id and name columns are duplicated for each tag - which is exactly what you want.

Before Unnest (3 rows)

product_id tags
101electronics, sale, featured
102sports, new-arrival
103kitchen, sale, bestseller, featured

After Unnest (9 rows)

product_id tags
101electronics
101sale
101featured
102sports
102new-arrival
103kitchen
103sale
103bestseller
103featured

The SQL behind it

Open the pipeline to see what's happening:

SELECT product_id, name,
       UNNEST(STRING_SPLIT(tags, ',')) AS tags
FROM data

STRING_SPLIT(tags, ',') turns "electronics, sale, featured" into a list: ['electronics', ' sale', ' featured']. UNNEST() takes that list and expands it into separate rows. DuckDB handles the duplication of the other columns automatically.

Trim the whitespace

Look at the result table again. The split values have leading spaces: " sale" instead of "sale". That's because the original data was "electronics, sale, featured" with spaces after the commas. The split kept the spaces.

Fix this by adding a Text Transform step right after the unnest. Select the tags column and choose "trim". This strips leading and trailing whitespace from every value.

Pipeline steps to split and clean tags:

  1. 1Unnest on tags - delimiter: comma (,). Expands 3 rows into 9.
  2. 2Text Transform on tags - trim. Removes leading spaces left by ", " separators.

After trim: "electronics", "sale", "featured" - clean values ready for group-by or filter.

Now your tags are clean: "electronics", "sale", "featured" - no stray spaces.

What you can do after unnesting

With tags in separate rows, analysis becomes straightforward:

Count tag frequency: Group by tags, count rows. You'll see that "sale" appears in 2 products, "featured" in 2, and everything else in 1.

Filter by tag: Add a filter where tags = 'sale' to see only products on sale.

Pivot by tag: Create a cross-tabulation of product categories by tag to see which tags appear in which categories.

None of this was possible when the tags were stuck together in one cell.

Other delimiters

Commas are the most common delimiter, but the same technique works for any separator:

  • Semicolons: "skill1; skill2; skill3" - set delimiter to ;
  • Pipes: "red|blue|green" - set delimiter to |
  • Slashes: "US/CA/MX" - set delimiter to /
  • Spaces: "tag1 tag2 tag3" - set delimiter to a space

Understanding the row count increase

Unnesting increases your row count. If you started with 1,000 products and each has an average of 3 tags, you'll end up with roughly 3,000 rows. This is expected and correct - you're not duplicating data, you're normalizing it.

Keep the original row count in mind when doing aggregations after unnesting. If you sum revenue after unnesting tags, a product with 3 tags will contribute its revenue 3 times. Group by product_id first if you need accurate totals, or do your revenue calculations before the unnest step.

When not to unnest

Unnest is the right tool when you need to analyze or filter by individual values within a multi-value cell. It's overkill if you just need to check whether a cell contains a specific value - for that, a filter with CONTAINS(tags, 'sale') is simpler and doesn't change your row count.

Use unnest when you need to count, group, or join on individual values. Use string matching when you just need to filter.

Try unnesting your data →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData