← All posts
4 min read

Converting Yes/No/True/False/1/0 to a Consistent Boolean

You're looking at an is_active column. It should be simple - either the account is active or it isn't. But the values tell a different story: "Yes", "Y", "TRUE", "true", "1", "yes", "True". And that's just the truthy side. The false values are equally chaotic: "No", "N", "FALSE", "0", "false", "no".

This happens when data comes from multiple sources, or when different people entered it manually, or when someone exported from a system that uses "Y/N" and another that uses "true/false". The result: you can't filter for active accounts reliably. A filter for "TRUE" misses the rows that say "Yes". A filter for "1" misses the rows that say "true".

Here's how to collapse all of that into a clean boolean column in ExploreMyData.

What the data looks like

Here's a sample from a user accounts export:

user_id name is_active email_verified
1001Alice OkonkwoYesTRUE
1002Ben TorresY1
1003Clara Johanssontruefalse
1004David KimNoFALSE
1005Elena Petrov10
1006Farhan AliFALSETrue
1007Grace Nakamura0yes

Two columns, both supposed to be boolean, both a mess. The is_active column alone has seven distinct values that represent just two states.

Step 1: Map all values with Bulk Replace

The fastest approach is Bulk Replace. This operation lets you define groups of values and what each group should become. Open the toolbar, select Bulk Replace from the Transform group, and choose the is_active column.

Define two groups:

  • Group 1: "Yes", "Y", "TRUE", "true", "True", "1", "yes" → replace with true
  • Group 2: "No", "N", "FALSE", "false", "False", "0", "no" → replace with false

The SQL that ExploreMyData generates is a CASE WHEN with every variation mapped:

CASE WHEN is_active IN ('Yes', 'Y', 'TRUE', 'true', 'True', '1', 'yes') THEN 'true' WHEN is_active IN ('No', 'N', 'FALSE', 'false', 'False', '0', 'no') THEN 'false' ELSE is_active END

That ELSE is_active END at the end is a safety net. If there's a value you didn't account for, it stays as-is rather than silently becoming NULL. You'll spot it when you check the results.

Bulk Replace groups - all truthy variations map to "true", all falsy to "false":

Source values (is_active) Target
Yes, Y, TRUE, true, True, 1, yestrue
No, N, FALSE, false, False, 0, nofalse

After applying, all rows show either "true" or "false" as a string. Convert Type (BOOLEAN) in the next step makes them real booleans.

Result after Bulk Replace:

user_id is_active (before) is_active (after)
1001Yestrue
1002Ytrue
1003truetrue
1004Nofalse
10051true
1006FALSEfalse
10070false

Step 2: Convert to actual BOOLEAN type

After the Bulk Replace, every row in is_active says either "true" or "false". But it's still a text column. The values are strings, not booleans.

Select Convert Type from the Transform group. Choose the is_active column and set the target type to BOOLEAN.

The SQL:

TRY_CAST(is_active AS BOOLEAN)

TRY_CAST is important here. If there's a stray value that didn't get caught by the Bulk Replace - say someone typed "active" instead of "true" - TRY_CAST returns NULL for that row instead of throwing an error. A regular CAST would fail the entire operation.

Convert Type step - column type changes from VARCHAR to BOOLEAN:

Column Type before Type after SQL
is_activeVARCHARBOOLEANTRY_CAST(is_active AS BOOLEAN)
email_verifiedVARCHARBOOLEANTRY_CAST(email_verified AS BOOLEAN)

TRY_CAST returns NULL for any value that can't convert rather than crashing the query.

Now it works like a boolean

With a real BOOLEAN column, everything just works:

  • Filtering - filter where is_active = true catches every active user, not just the ones who happened to have "TRUE" in uppercase.
  • Counting - group by is_active and you get two rows: true and false. Not seven.
  • Logic - you can use it in conditions with other boolean columns, like finding users where is_active = true AND email_verified = false.

Handling multiple boolean columns

In the sample data, email_verified has the same problem. Repeat the Bulk Replace and Convert Type steps for that column. If you have five or six boolean columns in your dataset - which is common in user exports - you'll do this for each one.

Each pass adds two pipeline steps. For three boolean columns, that's six steps total. It takes about a minute. The alternative - writing the SQL by hand, testing it, handling edge cases - takes significantly longer.

Watch for edge cases

A few things to check after converting:

  • NULLs. If some rows had no value to begin with, they'll remain NULL after conversion. That's correct - a missing value isn't true or false, it's unknown.
  • Unexpected values. Check Column Explorer after the Bulk Replace. If you see any value besides "true" and "false", you missed a variant. Add it to the appropriate group.
  • Numeric booleans. "1" and "0" are common but so are "2" or "-1" in some systems. Decide if "2" means true or if it's an error.

Column Explorer for is_active after cleanup - exactly two values, clean and ready for analysis:

value count % of total
true8,34283.4%
false1,54115.4%
NULL1171.2%

Before: 7 distinct values for a yes/no field. After: 2 values plus NULL. Filters and group-bys now work correctly.

Fix your boolean columns →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData