How to Remove Test Data and Junk Rows from a CSV
Someone hands you an export from the production database. You open it, check the row count: 48,000 rows. Great. Then you start scrolling and see this:
- Order ID: TEST-001, customer: Test User, email: test@test.com, revenue: $0.00
- Order ID: DEMO-2024-05, customer: Demo Account, email: admin@example.com, revenue: $99,999.00
Test orders. Demo accounts. QA entries. Placeholder transactions with $0 revenue. Internal accounts with inflated dollar amounts. They're all mixed in with real customer data, and they're wrecking your numbers.
Your average order value is wrong because of the $0 test transactions. Your revenue total is inflated by the $99,999 demo entries. Your customer count includes fake accounts. You need to get rid of all of it before you can trust any calculation.
Here's how to clean it up in ExploreMyData.
Identify the patterns
Test data usually follows predictable patterns. Before you start filtering, spend a minute figuring out what yours looks like. Sort by the order ID column - test IDs often cluster together alphabetically because they start with "TEST" or "DEMO". Sort by email to spot the @test.com and @example.com addresses. Sort by revenue to find the $0.00 entries at the bottom.
In most exports, test data falls into a few categories:
- Prefixed IDs: order IDs starting with "TEST", "DEMO", "QA", or "SANDBOX"
- Fake emails: addresses containing "test.com", "example.com", or "@yourcompany.com"
- Zero-value transactions: $0.00 revenue, 0 quantity
- Placeholder names: "Test User", "Jane Doe", "Foo Bar"
- Internal accounts: employee email domains, specific account IDs
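These categories boil down to a handful of predicate checks. As a rough sketch outside the tool, here's what the same pattern detection could look like in plain Python (the column names order_id, email, customer_name, and revenue are assumptions based on the example export, and the pattern lists should be adapted to your own data):

```python
# Assumed junk-data patterns, based on the categories above.
TEST_ID_PREFIXES = ("TEST", "DEMO", "QA", "SANDBOX")
FAKE_EMAIL_DOMAINS = ("test.com", "example.com")
PLACEHOLDER_NAMES = {"Test User", "Jane Doe", "Foo Bar"}

def is_junk(row: dict) -> bool:
    """Return True if the row matches any test-data pattern."""
    if row["order_id"].startswith(TEST_ID_PREFIXES):
        return True
    if any(domain in row["email"] for domain in FAKE_EMAIL_DOMAINS):
        return True
    if row["customer_name"] in PLACEHOLDER_NAMES:
        return True
    if float(row["revenue"]) == 0.0:  # zero-value transaction
        return True
    return False
```

Running is_junk over a sample of rows before filtering is a quick way to confirm your pattern list actually catches the junk you spotted while sorting.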
Orders export sorted by order_id - test and demo rows cluster near the top alphabetically:
| order_id | customer_name | email | revenue |
|---|---|---|---|
| DEMO-2024-05 | Demo Account | demo@example.com | $99,999.00 |
| ORD-10042 | Sarah Chen | sarah@techwave.io | $249.00 |
| ORD-10043 | Marcus Webb | mwebb@acme.com | $89.00 |
| TEST-001 | Test User | test@test.com | $0.00 |
| TEST-002 | Test User | test@test.com | $0.00 |
| ORD-10044 | Priya Sharma | priya@dataflow.com | $399.00 |
The TEST-* and DEMO-* rows are test/demo entries to be removed. They're easy to spot once sorted.
Build your filter with the Condition Builder
Open Filter from the Filter & Sort group. This opens the Condition Builder, where you can combine multiple conditions with AND/OR logic.
For a typical cleanup, you want to keep rows that pass all of these conditions. Set the logic to AND and add:
- order_id does NOT start with TEST
- order_id does NOT start with DEMO
- email does NOT contain test.com
- email does NOT contain example.com
- revenue is not equal to 0
Click Apply. The generated SQL is a single WHERE clause:
WHERE NOT (order_id LIKE 'TEST%') AND NOT (order_id LIKE 'DEMO%') AND NOT (email LIKE '%test.com%') AND NOT (email LIKE '%example.com%') AND revenue != 0
One pipeline step. All five conditions evaluated together. The row count drops from 48,000 to 45,200 - meaning about 6% of your "data" was junk.
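If you wanted to reproduce the same five-condition AND filter outside the tool, it maps directly to a single boolean expression. A minimal Python sketch (column names are assumptions from the example export):

```python
def keep_row(row: dict) -> bool:
    """Mirror the generated WHERE clause: all five conditions must hold."""
    return (
        not row["order_id"].startswith("TEST")
        and not row["order_id"].startswith("DEMO")
        and "test.com" not in row["email"]
        and "example.com" not in row["email"]
        and float(row["revenue"]) != 0
    )

def filter_rows(rows: list[dict]) -> list[dict]:
    """Keep only rows that pass every condition (AND logic)."""
    return [r for r in rows if keep_row(r)]
```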
Five filter conditions combined with AND logic to exclude junk rows:
| # | Column | Condition | Value |
|---|---|---|---|
| 1 | order_id | does not start with | TEST |
| 2 | order_id | does not start with | DEMO |
| 3 | email | does not contain | test.com |
| 4 | email | does not contain | example.com |
| 5 | revenue | is not equal to | 0 |
All five conditions must be true for a row to be kept (AND logic). Row count: 48,000 → 45,200.
Clean up duplicates while you're at it
Test data has a way of creating duplicates too. A QA engineer runs the same test scenario five times, generating five rows with slightly different timestamps but otherwise identical data. Even after filtering out the obvious test rows, you might have duplicates in the real data.
After your filter step, add Remove Duplicates from the Filter & Sort group.
Choose the columns that define a unique record - maybe order_id alone, or a combination of email + order_date + product if order IDs aren't reliable. This generates SELECT DISTINCT ON (order_id) * and keeps the first occurrence of each.
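The keep-the-first-occurrence behavior is easy to sketch outside the tool as well. A minimal Python version, parameterized on whichever key columns you chose (the column names here are illustrative):

```python
def remove_duplicates(rows: list[dict], key_columns: tuple[str, ...]) -> list[dict]:
    """Keep the first occurrence of each unique key, dropping later repeats."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result
```

Because the set tracks keys in input order, a QA scenario run five times with different timestamps collapses to the earliest row.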
Using OR logic for complex cases
Sometimes the patterns overlap in ways that need OR logic. Say you want to remove rows where the email contains "test.com" OR the customer name is exactly "Test User" OR the order ID starts with "QA". Any one of those conditions is enough to mark a row as junk.
In the Condition Builder, you can nest groups. Create an OR group with the three conditions above, then wrap the whole thing in a NOT. The result:
WHERE NOT (email LIKE '%test.com%' OR customer_name = 'Test User' OR order_id LIKE 'QA%')
This removes any row that matches at least one of the test patterns.
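The NOT-around-an-OR-group structure translates directly to code: one predicate where any match marks the row as junk, negated when deciding what to keep. A hedged sketch (column names assumed from the example):

```python
def matches_any_pattern(row: dict) -> bool:
    """OR group: any single matching condition marks the row as junk."""
    return (
        "test.com" in row["email"]
        or row["customer_name"] == "Test User"
        or row["order_id"].startswith("QA")
    )

def keep_clean(rows: list[dict]) -> list[dict]:
    """Wrap the OR group in NOT: keep rows that match none of the patterns."""
    return [r for r in rows if not matches_any_pattern(r)]
```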
Check your work
After filtering, do a quick sanity check. Sort by revenue descending - are there still suspiciously large values that look like test data? Sort by email - any remaining @yourcompany.com addresses that should be excluded? Check the row count: does the number make sense for the time period?
The pipeline makes this easy to iterate on. If you spot more junk, add another filter condition. If you removed too much, click into the filter step and adjust the conditions. Every change rebuilds the results instantly.
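The same three sanity checks can be scripted if you'd rather automate them between exports. A small sketch (the internal domain and column names are placeholders for your own):

```python
def sanity_report(rows: list[dict], internal_domain: str = "yourcompany.com") -> dict:
    """Post-filter checks: largest revenues, internal emails, row count."""
    revenues = sorted((float(r["revenue"]) for r in rows), reverse=True)
    internal = [r for r in rows if internal_domain in r["email"]]
    return {
        "row_count": len(rows),          # does this match the period?
        "top_revenues": revenues[:5],    # any suspiciously large values left?
        "internal_emails": len(internal),# should usually be zero after cleanup
    }
```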
Pipeline steps and their effect on row count:
| Step | Operation | Rows remaining | Rows removed |
|---|---|---|---|
| - | Original data | 48,000 | - |
| 1 | Filter (5 conditions, AND) | 45,200 | 2,800 |
| 2 | Remove Duplicates (on order_id) | 44,890 | 310 |
About 6.5% of the original file was junk or duplicate data.
Make it a habit
If you get regular exports from the same system, the test data patterns will be the same every time. The order of operations is always: filter out junk first, then analyze. Fifteen minutes of cleaning saves hours of second-guessing numbers that look off because a demo account with $99,999 in revenue is hiding in the data.