Preparing Data for Migration to a New System
You are migrating from System A to System B. You have an export from the old system -- a CSV or Excel file with thousands of rows. System B has an import template, and it expects columns in a specific order, with specific names, and specific types. Your export does not match.
You could do this in Excel with a lot of manual column dragging and formula writing. Or you could script it in Python. But if you want something faster and visual, you can build a transformation pipeline in ExploreMyData that reshapes the data to match the target schema. Here is the workflow.
A concrete example
Say you are migrating a customer database. The old system exports these columns:
cust_id, fname, lname, email_addr, phone, addr_line1, addr_line2, city, st, zip, created, status
The new system expects:
customer_id, full_name, email, phone_number, address, city, state, postal_code, created_date, is_active
The mismatches: different column names, different column order, fname and lname need to be merged into full_name, addr_line1 and addr_line2 need to be combined, status is a text field ("active"/"inactive") but the new system expects a boolean is_active, and created is a string that needs to be a proper DATE type.
Step 1: Select and reorder columns
Open the toolbar and select Select Columns from the Columns group. Pick the columns you need and arrange them in the order the target system expects. Drop any columns the new system does not need.
In our example, select: cust_id, fname, lname, email_addr, phone, addr_line1, addr_line2, city, st, zip, created, status. Drag them into the target order.
Source schema (old system export)
| cust_id | fname | lname | email_addr | st | zip | status |
|---|---|---|---|---|---|---|
| 10041 | Ana | Silva | a.silva@lumos.io | CA | 94103 | active |
| 10042 | Ben | Marsh | bmarsh@hartco.com | TX | 78201 | inactive |
Target schema (new system expects)
| customer_id | full_name | state | postal_code | is_active |
|---|---|---|---|---|
| - waiting for transformation pipeline - | | | | |
Six mismatches to resolve: renamed columns, a different column order, two pairs of fields to merge, a text-to-boolean cast, and a string-to-date conversion.
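Outside the tool, the select-and-reorder step can be sketched in plain Python. The column names come from the example above; the row values and the extra legacy_flag column are hypothetical, standing in for a row of the CSV export:

```python
# Columns to keep, in a first approximation of the target order.
KEEP = ["cust_id", "fname", "lname", "email_addr", "phone",
        "addr_line1", "addr_line2", "city", "st", "zip", "created", "status"]

def select_columns(row, keep=KEEP):
    """Return a new row containing only the kept columns, in order."""
    return {col: row.get(col) for col in keep}

row = {"cust_id": "10041", "fname": "Ana", "lname": "Silva",
       "email_addr": "a.silva@lumos.io", "phone": "555-0101",
       "addr_line1": "123 Main St", "addr_line2": "Apt 4",
       "city": "San Francisco", "st": "CA", "zip": "94103",
       "created": "2023-04-15", "status": "active",
       "legacy_flag": "x"}           # a column the new system does not need

trimmed = select_columns(row)        # legacy_flag is dropped
```

Dropping unneeded columns early keeps every later step smaller and easier to check.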
Step 2: Rename columns to match the target
Use Copy Column from the Columns group. For each column that needs renaming, copy it with the new name, then drop the old one in a later Select Columns step.
Map the names:
- cust_id → customer_id
- email_addr → email
- phone → phone_number
- st → state
- zip → postal_code
Each Copy Column step becomes column AS new_name in the generated SQL query.
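The same rename map can be sketched as a plain Python dict; this illustrates the mapping, not the tool's generated SQL:

```python
# Old-system name -> new-system name, from the mapping above.
RENAMES = {"cust_id": "customer_id", "email_addr": "email",
           "phone": "phone_number", "st": "state", "zip": "postal_code"}

def rename_columns(row, renames=RENAMES):
    """Copy each value under its new name; unmapped columns keep their name."""
    return {renames.get(col, col): val for col, val in row.items()}

renamed = rename_columns({"cust_id": "10041", "st": "CA", "city": "San Francisco"})
```

Keeping the map in one place makes it easy to review against the target schema before running the migration.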
Step 3: Convert types
Select Convert Type from the Transform group.
- Convert created from VARCHAR to DATE. ExploreMyData uses TRY_CAST(), so if any dates are malformed, they become NULL instead of crashing the whole import.
- Convert postal_code to VARCHAR if it was detected as INTEGER (zip codes with leading zeros get truncated when stored as numbers).
For the status to is_active conversion, use Add Column with the expression status = 'active' to produce a boolean, then name it is_active.
| customer_id | full_name | email | state | postal_code | created_date | is_active |
|---|---|---|---|---|---|---|
| 10041 | Ana Silva | a.silva@lumos.io | CA | 94103 | 2023-04-15 | true |
| 10042 | Ben Marsh | bmarsh@hartco.com | TX | 78201 | 2022-11-02 | false |
After Convert Type: created is now DATE, postal_code is VARCHAR (preserves leading zeros), and status has been cast to boolean is_active.
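The TRY_CAST behavior can be imitated in plain Python: parse the date and fall back to None on failure, keep zip codes as strings, and compare status to get the boolean. A sketch under the assumption that dates use the YYYY-MM-DD format shown in the example, not the tool's generated SQL:

```python
from datetime import date, datetime

def try_cast_date(value):
    """Like SQL TRY_CAST: return a date, or None if the string is malformed."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None

def convert_row(row):
    return {
        **row,
        "created_date": try_cast_date(row.get("created")),
        "postal_code": str(row.get("postal_code", "")),  # keep leading zeros
        "is_active": row.get("status") == "active",
    }

good = convert_row({"created": "2023-04-15", "postal_code": "02134", "status": "active"})
bad  = convert_row({"created": "15/04/2023", "postal_code": "94103", "status": "inactive"})
```

The malformed date in the second row becomes None rather than aborting the run, mirroring the TRY_CAST() behavior described above.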
Step 4: Combine split fields
The old system stores first and last name separately. The new system wants a single full_name field. Select Combine Columns from the Columns group:
- Select fname and lname.
- Set the separator to a space.
- Name the result full_name.
Do the same for addr_line1 and addr_line2, combining them with a comma-space separator into address. NULL values in either field are treated as empty strings, so you will not get "123 Main St, null" in the output.
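The NULL-as-empty-string behavior can be sketched like this (plain Python, hypothetical helper name):

```python
def combine_columns(row, cols, sep, new_name):
    """Join the named columns with sep, treating None as an empty string."""
    parts = [row.get(c) or "" for c in cols]
    joined = sep.join(p for p in parts if p)   # skip empties: no trailing ", "
    out = dict(row)
    out[new_name] = joined
    return out

with_apt = combine_columns(
    {"addr_line1": "123 Main St", "addr_line2": "Apt 4"},
    ["addr_line1", "addr_line2"], ", ", "address")

no_apt = combine_columns(
    {"addr_line1": "500 Oak Ave", "addr_line2": None},
    ["addr_line1", "addr_line2"], ", ", "address")
```

Filtering out empty parts before joining is what prevents both "123 Main St, null" and a dangling ", " when addr_line2 is missing.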
Step 5: Split combined fields (when needed)
Sometimes the migration goes the other direction. The source system has a combined location field ("Austin, TX") but the target wants separate city and state columns.
Use Split Column from the Columns group:
- Select the column to split.
- Set the delimiter (e.g., ", " for comma-space).
- Set the number of parts to 2.
The result creates location_1 and location_2. You can then use Copy Column to rename them to city and state.
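The split step maps naturally onto Python's str.split with a part limit; the numbered output names follow the pattern described above (a sketch, not the tool's implementation):

```python
def split_column(row, col, delim, parts):
    """Split row[col] into numbered columns col_1 .. col_<parts>."""
    pieces = (row.get(col) or "").split(delim, parts - 1)
    pieces += [None] * (parts - len(pieces))    # pad rows with fewer parts
    out = dict(row)
    for i, piece in enumerate(pieces, start=1):
        out[f"{col}_{i}"] = piece
    return out

split_row = split_column({"location": "Austin, TX"}, "location", ", ", 2)
```

Padding short rows with None keeps the output schema uniform even when a source value lacks the delimiter.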
Final cleanup and export
Add one more Select Columns step at the end to pick only the final columns in the exact order the target system expects. This drops any intermediate columns (the old names, the original split fields) and gives you a clean output.
Export the result as CSV or Parquet. The file is now ready to import into System B without any further transformation.
The full pipeline
- Select Columns - pick and reorder to approximate target schema
- Copy Column - rename columns to target names (repeat for each)
- Convert Type - cast dates, fix zip codes, create booleans
- Combine Columns - merge fname + lname, addr_line1 + addr_line2
- Split Column - break apart any combined source fields
- Select Columns - final column selection in exact target order
The pipeline is reusable. If you get a second export from the old system next week, drop it in and the same pipeline transforms it to match.
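Putting the steps together, the whole pipeline for one row can be sketched as a single function. This is a plain-Python analogue of the six steps, not the tool's generated SQL; the column names come from the example above, and the sample row values are hypothetical:

```python
from datetime import datetime

# Target schema, in the exact order System B expects.
TARGET = ["customer_id", "full_name", "email", "phone_number",
          "address", "city", "state", "postal_code", "created_date", "is_active"]

def migrate_row(old):
    """Transform one old-system row into the new-system schema."""
    def cast_date(s):
        try:
            return datetime.strptime(s, "%Y-%m-%d").date().isoformat()
        except (TypeError, ValueError):
            return None                                   # TRY_CAST-style fallback
    name = " ".join(p for p in (old.get("fname"), old.get("lname")) if p)
    addr = ", ".join(p for p in (old.get("addr_line1"), old.get("addr_line2")) if p)
    new = {"customer_id": old.get("cust_id"),
           "full_name": name,
           "email": old.get("email_addr"),
           "phone_number": old.get("phone"),
           "address": addr,
           "city": old.get("city"),
           "state": old.get("st"),
           "postal_code": str(old.get("zip") or ""),      # keep leading zeros
           "created_date": cast_date(old.get("created")),
           "is_active": old.get("status") == "active"}
    return {col: new[col] for col in TARGET}              # exact target order

migrated = migrate_row({"cust_id": "10041", "fname": "Ana", "lname": "Silva",
                        "email_addr": "a.silva@lumos.io", "phone": "555-0101",
                        "addr_line1": "123 Main St", "addr_line2": None,
                        "city": "San Francisco", "st": "CA", "zip": "94103",
                        "created": "2023-04-15", "status": "active"})
```

Like the visual pipeline, this function is reusable: map it over next week's export and the output matches the same target schema.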