← All posts
5 min read

5 Ways to Clean Messy Data Without Writing Code

Real-world data is messy. Names are inconsistently capitalized, dates are in three different formats, columns have blank cells, and somehow there are 47 duplicate rows. Before you can analyze anything, you need to clean it up.

Here are five common data quality issues and how to fix them in ExploreMyData without writing a single line of code.

1. Remove duplicate rows

Duplicates are the most common data quality issue. They inflate counts, skew averages, and break joins.

How to fix:

  1. Open the toolbar and select Remove Duplicates from the Filter & Sort group.
  2. Choose which columns to check for duplicates (or leave empty to check all columns).
  3. Click Apply. Duplicate rows are removed, keeping the first occurrence.

The pipeline shows the step as SELECT DISTINCT ON (...) or SELECT DISTINCT * depending on your selection.

2. Fix inconsistent text

"New York", "new york", "NEW YORK", " New York ". these are all the same city but they'll be treated as four different values in any group-by or filter.

How to fix:

  1. Select Text Transform from the Transform group.
  2. Choose the column (e.g., "city").
  3. Select "trim" to remove leading/trailing whitespace.
  4. Apply, then apply again with "lowercase" (or "capitalize") to normalize case.

For more specific fixes, use Find & Replace. Select the column, enter the value to find (e.g., "NY") and the replacement (e.g., "New York"). This uses DuckDB's REPLACE() function.

3. Fill missing values

Blank cells (NULLs) can break calculations and create gaps in charts. ExploreMyData offers three fill strategies.

How to fix:

  1. Select Fill Missing from the Data group.
  2. Choose the column with missing values.
  3. Pick a method:
    • Literal: replace NULLs with a specific value (e.g., "Unknown" or "0")
    • Forward fill: use the previous row's value
    • Backward fill: use the next row's value

Forward and backward fill use DuckDB's window functions (LAG() and LEAD()) under the hood.

4. Convert column types

A "price" column imported as text can't be summed or averaged. A "date" column stored as a string won't sort chronologically.

How to fix:

  1. Select Convert Type from the Transform group.
  2. Choose the column.
  3. Select the target type: VARCHAR, INTEGER, BIGINT, DOUBLE, BOOLEAN, DATE, or TIMESTAMP.

ExploreMyData uses TRY_CAST(), which returns NULL for values that can't be converted instead of throwing an error. This is safer than a hard CAST.

5. Split or combine columns

A "full_name" column that should be "first_name" and "last_name". Or "city" and "state" columns that should be a single "location".

To split:

  1. Select Split Column from the Columns group.
  2. Choose the column and the delimiter (e.g., " " for space, "," for comma).
  3. Set the number of parts. The result creates new columns named column_1, column_2, etc.

To combine:

  1. Select Combine Columns from the Columns group.
  2. Pick two or more columns and set a separator (e.g., ", ").
  3. Give the new column a name. NULL values in any column are treated as empty strings.

Everything is a pipeline step

Every cleaning operation becomes a step in your pipeline. You can see the exact SQL generated, delete steps you don't need, and the pipeline rebuilds automatically. This means your data cleaning is documented and reproducible.

Start cleaning your data →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData