
Preparing Data for Migration to a New System

You are migrating from System A to System B. You have an export from the old system -- a CSV or Excel file with thousands of rows. System B has an import template, and it expects columns in a specific order, with specific names, and specific types. Your export does not match.

You could do this in Excel with a lot of manual column dragging and formula writing. Or you could script it in Python. But if you want something faster and visual, you can build a transformation pipeline in ExploreMyData that reshapes the data to match the target schema. Here is the workflow.

A concrete example

Say you are migrating a customer database. The old system exports these columns:

cust_id, fname, lname, email_addr, phone, addr_line1, addr_line2, city, st, zip, created, status

The new system expects:

customer_id, full_name, email, phone_number, address, city, state, postal_code, created_date, is_active

The mismatches:

  • different column names and a different column order
  • fname and lname need to be merged into full_name
  • addr_line1 and addr_line2 need to be combined into address
  • status is a text field ("active"/"inactive"), but the new system expects a boolean is_active
  • created is a string that needs to be a proper DATE type

Step 1: Select and reorder columns

Open the toolbar and select Select Columns from the Columns group. Pick the columns you need and arrange them in the order the target system expects. Drop any columns the new system does not need.

In our example, select: cust_id, fname, lname, email_addr, phone, addr_line1, addr_line2, city, st, zip, created, status. Drag them into the target order.

Source schema (old system export)

cust_id | fname | lname | email_addr        | st | zip   | status
10041   | Ana   | Silva | a.silva@lumos.io  | CA | 94103 | active
10042   | Ben   | Marsh | bmarsh@hartco.com | TX | 78201 | inactive

Target schema (new system expects)

customer_id | full_name | email | state | postal_code | is_active
- waiting for transformation pipeline -

Six mismatches to resolve: column renames, type conversions, two fields to combine, and a boolean cast.
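To make the select-and-reorder step concrete, here is a minimal Python sketch of what it does, assuming the export has been read into a list of dicts. The sample row below is made up to match the example; ExploreMyData does this for you in the UI.

```python
import csv
from io import StringIO

# A one-row sample shaped like the old system's export (invented values).
export = StringIO(
    "cust_id,fname,lname,email_addr,phone,addr_line1,addr_line2,"
    "city,st,zip,created,status\n"
    "10041,Ana,Silva,a.silva@lumos.io,555-0101,123 Main St,Apt 4,"
    "San Francisco,CA,94103,2023-04-15,active\n"
)
rows = list(csv.DictReader(export))

# Keep only the columns we need, in the order we want them.
keep = ["cust_id", "fname", "lname", "email_addr", "phone", "addr_line1",
        "addr_line2", "city", "st", "zip", "created", "status"]
selected = [{col: row[col] for col in keep} for row in rows]
```

Columns absent from the keep list are simply dropped, which is exactly what a Select Columns step does.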

Step 2: Rename columns to match the target

Use Copy Column from the Columns group. For each column that needs renaming, copy it with the new name, then drop the old one in a later Select Columns step.

Map the names:

  • cust_id → customer_id
  • email_addr → email
  • phone → phone_number
  • st → state
  • zip → postal_code

Each Copy Column step appears as column AS new_name in the generated SQL query.
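Outside the tool, the rename pass is just a key mapping. A minimal Python sketch, using the column names from the example (this is an illustration, not ExploreMyData's actual implementation):

```python
rename = {
    "cust_id": "customer_id",
    "email_addr": "email",
    "phone": "phone_number",
    "st": "state",
    "zip": "postal_code",
}

def rename_columns(row: dict) -> dict:
    # Columns not in the mapping keep their old names.
    return {rename.get(col, col): value for col, value in row.items()}

row = {"cust_id": "10041", "email_addr": "a.silva@lumos.io", "st": "CA"}
renamed = rename_columns(row)
```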

Step 3: Convert types

Select Convert Type from the Transform group.

  1. Convert created from VARCHAR to DATE. ExploreMyData uses TRY_CAST(), so if any dates are malformed, they become NULL instead of crashing the whole import.
  2. Convert postal_code to VARCHAR if it was detected as INTEGER (zip codes with leading zeros get truncated when stored as numbers).

To turn status into is_active, use Add Column with the expression status = 'active' to produce a boolean, then name it is_active.

customer_id | full_name | email             | state | postal_code | created_date | is_active
10041       | Ana Silva | a.silva@lumos.io  | CA    | 94103       | 2023-04-15   | true
10042       | Ben Marsh | bmarsh@hartco.com | TX    | 78201       | 2022-11-02   | false

After Convert Type: created is now DATE, postal_code is VARCHAR (preserves leading zeros), and status has been cast to boolean is_active.
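The TRY_CAST behavior (malformed values become NULL instead of aborting the query) and the boolean expression can be sketched in Python like this, with None standing in for SQL NULL:

```python
from datetime import datetime

def try_cast_date(value):
    # Mimic TRY_CAST: malformed input becomes None instead of raising.
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (ValueError, TypeError):
        return None

def to_is_active(status):
    # The Add Column expression from above: status = 'active'
    return status == "active"
```

Keeping postal_code as a string is what preserves leading zeros: str("02134") stays "02134", while an integer cast would store 2134.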

Step 4: Combine split fields

The old system stores first and last name separately. The new system wants a single full_name field. Select Combine Columns from the Columns group:

  1. Select fname and lname.
  2. Set the separator to a space.
  3. Name the result full_name.

Do the same for addr_line1 and addr_line2, combining them with a comma-space separator into address. NULL values in either field are treated as empty strings, so you will not get "123 Main St, null" in the output.
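A sketch of the combine step in Python, with the same treat-NULL-as-empty behavior described above (None stands in for SQL NULL; not the tool's actual code):

```python
def combine(parts, sep):
    # Drop NULL/empty parts so the output never contains "None"
    # or a dangling separator.
    return sep.join(p for p in parts if p)

full_name = combine(["Ana", "Silva"], " ")       # "Ana Silva"
address = combine(["123 Main St", None], ", ")   # "123 Main St"
```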

Step 5: Split combined fields (when needed)

Sometimes the migration goes the other direction. The source system has a combined location field ("Austin, TX") but the target wants separate city and state columns.

Use Split Column from the Columns group:

  1. Select the column to split.
  2. Set the delimiter (e.g., ", " for comma-space).
  3. Set the number of parts to 2.

The split produces location_1 and location_2. You can then use Copy Column to rename them to city and state.
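The two-part split from the example can be sketched in Python like this, with missing parts coming back as None, the way absent pieces surface as NULL:

```python
def split_column(value, delimiter, parts):
    # Split into at most `parts` pieces and pad with None
    # when the value has fewer pieces than expected.
    pieces = value.split(delimiter, parts - 1)
    pieces += [None] * (parts - len(pieces))
    return pieces

city, state = split_column("Austin, TX", ", ", 2)      # "Austin", "TX"
only_city, missing = split_column("Austin", ", ", 2)   # "Austin", None
```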

Final cleanup and export

Add one more Select Columns step at the end to pick only the final columns in the exact order the target system expects. This drops any intermediate columns (the old names, the original split fields) and gives you a clean output.

Export the result as CSV or Parquet. The file is now ready to import into System B without any further transformation.

The full pipeline

  1. Select Columns - pick and reorder to approximate target schema
  2. Copy Column - rename columns to target names (repeat for each)
  3. Convert Type - cast dates, fix zip codes, create booleans
  4. Combine Columns - merge fname + lname, addr_line1 + addr_line2
  5. Split Column - break apart any combined source fields
  6. Select Columns - final column selection in exact target order

The pipeline is reusable. If you get a second export from the old system next week, drop it in and the same pipeline transforms it to match.
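As a rough end-to-end sketch, the whole pipeline collapses to one row transformation. The Python below uses the column names from the example; a pipeline built in ExploreMyData would express the same thing as generated SQL.

```python
from datetime import datetime

def migrate_row(old: dict) -> dict:
    """Transform one row of the System A export into the System B schema."""
    def try_date(value):
        # Malformed dates become None (NULL), as with TRY_CAST.
        try:
            return datetime.strptime(value, "%Y-%m-%d").date().isoformat()
        except (ValueError, TypeError):
            return None

    def combine(parts, sep):
        return sep.join(p for p in parts if p)

    return {
        "customer_id": old["cust_id"],
        "full_name": combine([old["fname"], old["lname"]], " "),
        "email": old["email_addr"],
        "phone_number": old["phone"],
        "address": combine([old["addr_line1"], old["addr_line2"]], ", "),
        "city": old["city"],
        "state": old["st"],
        "postal_code": str(old["zip"]),  # string, so leading zeros survive
        "created_date": try_date(old["created"]),
        "is_active": old["status"] == "active",
    }
```

The dict literal doubles as the final Select Columns step: its keys are exactly the target columns, in the target order.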

Start preparing your migration →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData