February 15, 2026 by Arif Aslam 5 min read

Analyzing Data Across Multiple Files

You have three files. orders.csv has order IDs, customer IDs, product names, and amounts. customers.csv has customer IDs, names, and tier levels. products.csv has product names, categories, and margins.

The question you need to answer: "What's the total revenue by customer tier for each product category?" No single file has all the columns. You need to combine them.

In a traditional workflow, you'd load everything into a database or write a Python script. In ExploreMyData, you load all three files as tabs, join them together, and build your analysis on the combined result. Everything stays in the browser.

Step 1: Load all your files

Open ExploreMyData and load your first file. Then load the second and third. Each file gets its own tab at the top of the workspace. You can switch between tabs to preview each file independently.

With all three files loaded, you can see:

orders - 10,000 rows: order_id, customer_id, product_name, amount, order_date
customers - 500 rows: customer_id, name, email, tier, signup_date
products - 50 rows: product_name, category, margin_pct, supplier

Tab	File	Rows	Columns
1	orders.csv	10,000	order_id, customer_id, product_name, amount, order_date
2	customers.csv	500	customer_id, name, email, tier, signup_date
3	products.csv	50	product_name, category, margin_pct, supplier

All three files are available as separate tabs. Each can be previewed and cleaned independently before joining.

Step 2: Join orders with customers

On the orders tab, open the toolbar and select Join. You'll see a dialog that lets you pick the other table and the join columns.

Select customers as the table to join with.
Set the join column: customer_id from orders matches customer_id from customers.
Choose the join type. For this analysis, LEFT JOIN makes sense - keep all orders, even if a customer record is missing.
Click Apply.

The result is a new view with all the order columns plus the customer columns (name, email, tier, signup_date). The pipeline now shows the join as a step. Under the hood, it's running:

SELECT *
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.customer_id

Join configuration

Join with: customers
Join type: LEFT JOIN
Left key (orders): customer_id
Right key (customers): customer_id

Generated SQL: SELECT * FROM orders LEFT JOIN customers ON orders.customer_id = customers.customer_id

Result: 10,000 rows with all order columns plus name, email, tier, signup_date from customers.

Step 3: Join the result with products

Now that orders and customers are combined, you can add the product data. Add another Join step:

Select products as the table to join with.
Set the join column: product_name from the current view matches product_name from products.
LEFT JOIN again.
Apply.

The pipeline now has two join steps. The view contains columns from all three files: order details, customer tier, and product category. You're ready to analyze.

Step 4: Build the analysis

With the combined data, answering "total revenue by customer tier and product category" is straightforward. Add a Pivot operation:

Group by: tier and category
Aggregate: SUM(amount) as total_revenue
Apply, then add a Sort step to order by total_revenue descending.

The result is a clean table showing exactly what was asked. The pipeline has four steps: join customers, join products, aggregate, sort. Each step is visible, editable, and deletable.

tier	category	total_revenue
Enterprise	Electronics	98,400
Enterprise	Software	76,200
Growth	Electronics	54,100
Growth	Software	38,900
Starter	Electronics	21,600
Starter	Software	14,300

Total revenue by customer tier and product category, combining data from all three files. Sorted by total_revenue descending.

How joins work in the pipeline

A join is just another pipeline step. It combines two tables into one view. Every operation after the join works on the combined data. This means you can:

Filter the joined data (e.g., only "premium" tier customers)
Add calculated columns using fields from both tables (e.g., amount * margin_pct)
Aggregate across dimensions that came from different files
Chain multiple joins to bring in three, four, or more files

If you delete a join step, subsequent steps that reference columns from the joined table will break. The pipeline will show an error, and you can fix it by re-adding the join or adjusting the downstream steps.

Multi-file tabs vs. multi-view tabs

There's an important distinction. Each loaded file gets its own tab. You can apply independent pipelines to each file. When you join, the result appears in the current tab's pipeline.

This means you can clean each file independently before joining. For example:

On the customers tab: trim whitespace from names, fill missing tier values with "standard".
On the products tab: lowercase all category names for consistency.
On the orders tab: filter out cancelled orders, then join with the cleaned customers and products.

The join references the other file's current state, including any pipeline steps applied to it. Clean first, then join.

Tips for multi-file analysis

Check join keys first. Before joining, switch to each tab and look at the join column. Are the values consistent? Does one file use "US" and the other "United States"? Clean that before joining.
Watch for duplicate columns. If both files have a column called "name", the join will create two columns. Use Select Columns to drop the one you don't need.
Start with the largest table. Join the smaller lookup tables into the main fact table, not the other way around. It's more intuitive and easier to verify.
Use LEFT JOIN by default. It preserves all rows from your main table even if there's no match in the lookup table. INNER JOIN drops unmatched rows, which can silently lose data.

Load your files and start joining →

Analyzing Data Across Multiple Files

Step 1: Load all your files

Step 2: Join orders with customers

Step 3: Join the result with products

Step 4: Build the analysis

How joins work in the pipeline

Multi-file tabs vs. multi-view tabs

Tips for multi-file analysis

Try it yourself

Related tools

More from the blog

How to Join Multiple CSV Files in Your Browser

Compare Two Files Side by Side

E-commerce Cohort Analysis Workflow