# Working with XML Data: Import, Explore, and Convert
XML remains a common export format across enterprise systems, SOAP APIs, RSS feeds, and government open data portals. If you have ever tried to open an XML file in a spreadsheet, you know the result is usually a mess of nested tags rather than usable rows and columns.
ExploreMyData can import XML files directly and flatten them into a table you can filter, transform, and export. The entire process runs in your browser. Nothing is uploaded to a server.
## How the import works
When you load an XML file, the parser walks the document tree and converts it into a JSON intermediate representation. That JSON is then loaded into DuckDB as a table. Each repeating element becomes a row, and its child elements become columns. Attributes are preserved as separate columns with an `@` prefix.
For example, an element like `<item id="42">` produces a column called `@id` alongside any child element columns. Nested structures are flattened using dot notation: `author.name`, `author.email`.
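The flattening convention can be sketched in a few lines of Python. This is an illustration of the rules described above, not ExploreMyData's actual implementation; the `flatten` helper and the use of `xml.etree.ElementTree` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def flatten(elem, prefix=""):
    """Flatten one XML element into a {column: value} row.

    Attributes get an @ prefix; nested children use dot notation.
    """
    row = {}
    for name, value in elem.attrib.items():
        row[prefix + "@" + name] = value
    for child in elem:
        if len(child) == 0 and not child.attrib:
            # Leaf element: its text becomes a plain column value.
            row[prefix + child.tag] = (child.text or "").strip()
        else:
            # Nested element: recurse, extending the dotted prefix.
            row.update(flatten(child, prefix + child.tag + "."))
    return row

xml = '<item id="42"><title>Hello</title><author><name>Ada</name></author></item>'
row = flatten(ET.fromstring(xml))
# row == {"@id": "42", "title": "Hello", "author.name": "Ada"}
```

Each repeating element in the source document would be passed through a function like this, and the union of the resulting keys becomes the table's column set.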
## Example: importing an RSS feed
Consider a standard RSS feed exported from a blog or news site. The XML contains a `<channel>` element with multiple `<item>` children:
```xml
<rss version="2.0">
  <channel>
    <title>Engineering Blog</title>
    <item>
      <title>Deploying to Production</title>
      <link>https://example.com/deploy</link>
      <pubDate>Mon, 01 Apr 2026 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Database Migration Patterns</title>
      <link>https://example.com/migrations</link>
      <pubDate>Fri, 28 Mar 2026 10:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```
After import, the table looks like this:
| title | link | pubDate |
|---|---|---|
| Deploying to Production | https://example.com/deploy | Mon, 01 Apr 2026 08:00:00 GMT |
| Database Migration Patterns | https://example.com/migrations | Fri, 28 Mar 2026 10:30:00 GMT |
Each `<item>` becomes a row, and its child elements become columns. From here you can sort by date, filter by title keyword, or convert the `pubDate` string to a proper date type for time-based analysis.
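RSS dates use the RFC 822 format shown above, which Python's standard library parses directly. A sketch of the string-to-date conversion and a newest-first sort, with the imported rows represented as plain dicts for the example:

```python
from email.utils import parsedate_to_datetime

rows = [
    {"title": "Deploying to Production", "pubDate": "Mon, 01 Apr 2026 08:00:00 GMT"},
    {"title": "Database Migration Patterns", "pubDate": "Fri, 28 Mar 2026 10:30:00 GMT"},
]

# Cast the pubDate strings to real datetimes, then sort newest first.
for row in rows:
    row["pubDate"] = parsedate_to_datetime(row["pubDate"])

rows.sort(key=lambda r: r["pubDate"], reverse=True)
print([r["title"] for r in rows])
# → ['Deploying to Production', 'Database Migration Patterns']
```

Once the values are real datetimes rather than strings, range filters ("last 30 days") and chronological sorts behave correctly.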
## Common XML sources
The same import process works for several common XML formats:
- RSS and Atom feeds - blog posts, podcast episodes, news articles. Each entry becomes a row with title, link, date, and description columns.
- SOAP API responses - many legacy enterprise APIs return XML. Save the response body as a file, then import it to inspect the payload as a table.
- CRM and ERP exports - Salesforce, SAP, and similar systems often provide XML export options. These tend to be deeply nested, but the flattening logic handles multiple levels.
- Government open data - regulatory filings, public records, and statistical datasets are frequently published as XML. The U.S. SEC EDGAR system and EU open data portals are typical examples.
## After import: transform and export
Once your XML data is loaded as a table, you have access to the full pipeline. Common next steps:
- Rename columns - XML element names are often verbose. Rename `publicationDate` to `date` for clarity.
- Convert types - dates arrive as strings. Cast them to a date type so you can sort and filter by date range.
- Filter rows - remove entries that don't match your criteria. Keep only items from the last 30 days, or only records with a specific status.
- Aggregate - count entries by category, sum values by group, or compute averages. The grouping and aggregation guide covers this in detail.
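In SQL terms, the rename, filter, and aggregate steps above often collapse into a single query. ExploreMyData's engine is DuckDB, but the statement below is standard enough to run anywhere; the table name `feed` and its columns are assumptions, and the sketch uses Python's built-in `sqlite3` so it is self-contained:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE feed (title TEXT, category TEXT, publicationDate TEXT)")
con.executemany(
    "INSERT INTO feed VALUES (?, ?, ?)",
    [
        ("Deploying to Production", "ops", "2026-04-01"),
        ("Database Migration Patterns", "data", "2026-03-28"),
        ("Old Post", "ops", "2024-01-15"),
    ],
)

# Rename (AS), filter by date range (WHERE), and aggregate (GROUP BY) in one pass.
rows = con.execute(
    """
    SELECT category, COUNT(*) AS n, MAX(publicationDate) AS latest_date
    FROM feed
    WHERE publicationDate >= '2026-01-01'
    GROUP BY category
    ORDER BY category
    """
).fetchall()
print(rows)
# → [('data', 1, '2026-03-28'), ('ops', 1, '2026-04-01')]
```

The same query shape works whether the grouping key is a category, a status field, or a truncated date.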
When you are done, export the result in the format your downstream system expects:
- XML to CSV for spreadsheet or database import
- XML to Excel for sharing with colleagues who use Excel
- XML to JSON for modern APIs and web applications