SQL UNION, INTERSECT, and EXCEPT
In data science, raw information is rarely stored in a single, perfect table. You often need to merge customer lists from different regions, compare today's inventory against yesterday's, or find records present in one dataset but not another. Manually combining or comparing these result sets is inefficient and error-prone. This is where SQL's set operators—UNION, INTERSECT, and EXCEPT—become essential. They allow you to logically combine the results of two or more SELECT queries into a single result set, enabling powerful data consolidation, comparison, and cleaning operations directly within your database.
Foundational Concepts: The Rules of the Set
Before diving into each operator, you must understand the non-negotiable rules that govern all SQL set operations. These are critical for avoiding syntax errors and obtaining meaningful results.
The primary rule is column compatibility. The corresponding columns in each SELECT statement must be compatible in data type. The column names need not be identical, but the data types of the first column in Query 1, the first column in Query 2, and so on, must be reconcilable by the database. For example, you can typically combine a VARCHAR column with a TEXT column. Combining a text column with an INTEGER column is rejected by strict engines such as PostgreSQL unless you add an explicit CAST, while more permissive engines such as MySQL and SQLite accept it and either convert the values or return mixed types, which can hide mistakes. The final result set uses the column names from the first SELECT statement.
Secondly, the number of columns in each query must be identical. If your first query selects three columns (id, name, department), every subsequent query in the set operation must also select exactly three columns.
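Finally, ORDER BY applies to the entire combined result and appears only once, at the very end of the statement. You cannot attach a bare ORDER BY to an individual query inside the set operation. Where a branch genuinely needs its own sort (for example, together with LIMIT), it has to be wrapped in parentheses or a subquery, and only the final ORDER BY determines the order of the combined output.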
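The three rules above can be seen together in one small sketch. The tables employees and contractors, their columns, and the sample rows are hypothetical and exist only for illustration; the syntax is written for SQLite and PostgreSQL (MySQL, for instance, would need CAST(id AS CHAR) instead of CAST(id AS TEXT)).

```sql
-- Hypothetical tables and data, for illustration only.
CREATE TABLE employees   (id INTEGER, name TEXT, department TEXT);
CREATE TABLE contractors (contractor_code TEXT, full_name TEXT, agency TEXT);

INSERT INTO employees   VALUES (1, 'Ada', 'Research'), (2, 'Grace', 'Platform');
INSERT INTO contractors VALUES ('C-7', 'Linus', 'Acme Staffing');

-- Both queries project exactly three columns. The integer id is cast to
-- text so that position 1 has a compatible type on both sides.
SELECT CAST(id AS TEXT) AS person_id, name, department AS org
FROM employees
UNION ALL
SELECT contractor_code, full_name, agency
FROM contractors
ORDER BY name;  -- single ORDER BY at the end, using a name from the first SELECT
```

The output columns are named person_id, name, and org, because the names come from the first SELECT; the names contractor_code, full_name, and agency do not appear in the result.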
Merging Results with UNION and UNION ALL
The UNION operator combines the results of two or more queries and returns only distinct rows, removing any duplicates from the final set. Think of it as asking for "all unique items from both lists."
-- Find all distinct cities where we have either customers or suppliers
SELECT city FROM customers
UNION
SELECT city FROM suppliers
ORDER BY city;
In this example, if 'Chicago' appears in both the customers and suppliers tables, it will appear only once in the final output. The database performs a deduplication step, which has a computational cost.
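To make the deduplication concrete, here is a minimal runnable sketch of the same query. The table definitions and sample rows are assumptions made for illustration; real customers and suppliers tables would carry more columns.

```sql
-- Hypothetical minimal versions of the two tables.
CREATE TABLE customers (customer_id INTEGER, city TEXT);
CREATE TABLE suppliers (supplier_id INTEGER, city TEXT);

INSERT INTO customers VALUES (1, 'Chicago'), (2, 'Boston'), (3, 'Chicago');
INSERT INTO suppliers VALUES (10, 'Chicago'), (11, 'Denver');

SELECT city FROM customers
UNION
SELECT city FROM suppliers
ORDER BY city;
-- Returns three rows: Boston, Chicago, Denver.
```

Note that UNION removes duplicates within each input as well as across inputs: 'Chicago' occurs twice in customers and once in suppliers, yet it is returned once.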
When you need to preserve all rows, including duplicates, you use UNION ALL. This operator simply concatenates the result sets. It is typically faster than UNION because it skips the deduplication step. Use UNION ALL when you know the source queries are disjoint or when duplicates are meaningful for your analysis (e.g., counting total transactions from multiple days).
-- Combine all sales records from two monthly tables (duplicates are actual separate sales)
SELECT sale_id, sale_date, amount FROM sales_january
UNION ALL
SELECT sale_id, sale_date, amount FROM sales_february;
A key practical application is consolidating partitioned or historical data. For instance, you might have monthly sales tables; UNION ALL is the correct tool to stitch them together for a quarterly or yearly view.
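When stitching monthly tables together, it often helps to tag each row with its source and aggregate over the combined set. The sketch below assumes a simple three-column layout for sales_january and sales_february; the source_month label and the sample rows are hypothetical additions for illustration.

```sql
-- Hypothetical table layouts and data, for illustration only.
CREATE TABLE sales_january  (sale_id INTEGER, sale_date DATE, amount NUMERIC(10, 2));
CREATE TABLE sales_february (sale_id INTEGER, sale_date DATE, amount NUMERIC(10, 2));

INSERT INTO sales_january  VALUES (1, '2024-01-05', 20.00), (2, '2024-01-19', 35.50);
INSERT INTO sales_february VALUES (1, '2024-02-02', 20.00), (2, '2024-02-14', 12.25);

-- UNION ALL keeps every sale. The literal in the first column records
-- which monthly table each row came from.
SELECT source_month, COUNT(*) AS sale_count, SUM(amount) AS total_amount
FROM (
    SELECT 'january' AS source_month, sale_id, sale_date, amount FROM sales_january
    UNION ALL
    SELECT 'february', sale_id, sale_date, amount FROM sales_february
) AS all_sales
GROUP BY source_month
ORDER BY source_month;
```

With this data each month reports two sales. Because sale_id restarts in each monthly table, the source label is also what keeps rows from different months distinguishable.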
Finding Common Rows with INTERSECT
The INTERSECT operator returns only the rows that are present in both result sets. It finds the logical overlap or commonality. This is incredibly useful for data validation and finding shared attributes.
-- Find products that have been both ordered and restocked this week
SELECT product_id FROM orders_this_week
INTERSECT
SELECT product_id FROM restocks_this_week;
This query will return only those product_id values that exist in both the orders_this_week and restocks_this_week result sets. Each row in the final output is distinct. A common data science use case is identifying users who performed two specific actions, or items that appear on two different recommended lists. It provides a precise method for finding intersections between datasets.
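The "users who performed two specific actions" use case can be sketched as follows. The events table, its columns, and the action names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical event log, for illustration only.
CREATE TABLE events (user_id INTEGER, action TEXT);

INSERT INTO events VALUES
    (1, 'signup'), (1, 'purchase'),
    (2, 'signup'),
    (3, 'purchase'), (3, 'signup'), (3, 'purchase');

-- Users who have both signed up and purchased.
SELECT user_id FROM events WHERE action = 'signup'
INTERSECT
SELECT user_id FROM events WHERE action = 'purchase'
ORDER BY user_id;
-- Returns users 1 and 3; user 3 appears once despite having two purchases.
```

Both branches may read from the same table, as here; INTERSECT only cares that the two result sets have matching column shapes.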
Identifying Differences with EXCEPT (or MINUS)
The EXCEPT operator (called MINUS in some database systems like Oracle) returns rows from the first query result that are not present in the second query result. It performs a set difference operation. The order of queries matters profoundly: Query1 EXCEPT Query2 is not the same as Query2 EXCEPT Query1.
-- Find customers who have made inquiries but have never placed an order
SELECT customer_id FROM inquiries
EXCEPT
SELECT customer_id FROM orders;
This query tells you who is interested but hasn't converted, a key business insight. Conversely, flipping the queries would show you customers who have ordered but never made an inquiry (perhaps they ordered as guests). Another critical application is in data cleaning and change detection. You can compare yesterday's full dataset with today's to find rows that were deleted or modified since yesterday (YESTERDAY EXCEPT TODAY) and rows that are new or modified today (TODAY EXCEPT YESTERDAY). Modified rows show up in both directions because the comparison covers every selected column.
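The change-detection pattern can be sketched with two snapshot tables. The names inventory_yesterday and inventory_today, their columns, and the rows are hypothetical; on Oracle the keyword would be MINUS instead of EXCEPT.

```sql
-- Hypothetical snapshot tables, for illustration only.
CREATE TABLE inventory_yesterday (sku TEXT, quantity INTEGER);
CREATE TABLE inventory_today     (sku TEXT, quantity INTEGER);

INSERT INTO inventory_yesterday VALUES ('A1', 10), ('B2', 5), ('C3', 7);
INSERT INTO inventory_today     VALUES ('A1', 10), ('B2', 4), ('D4', 2);

-- Rows present yesterday but not today (deleted, or changed since yesterday).
SELECT sku, quantity FROM inventory_yesterday
EXCEPT
SELECT sku, quantity FROM inventory_today
ORDER BY sku;
-- Returns ('B2', 5) and ('C3', 7).

-- Rows present today but not yesterday (new, or changed since yesterday).
SELECT sku, quantity FROM inventory_today
EXCEPT
SELECT sku, quantity FROM inventory_yesterday
ORDER BY sku;
-- Returns ('B2', 4) and ('D4', 2).
```

The two queries differ only in operand order and return different rows, which illustrates why the order matters. SKU B2 appears in both outputs because its quantity changed; selecting only sku in each branch would instead report pure deletions (C3) and pure additions (D4).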
Common Pitfalls
- Ignoring the Performance Cost of UNION vs. UNION ALL: A frequent mistake is using UNION as a default when UNION ALL is actually needed. If you are certain there are no duplicates, or if duplicates should be counted, using UNION forces the database to perform an expensive sort/distinct operation for no benefit. Always ask: "Do I need to remove duplicates?" If the answer is no, use UNION ALL.
- Mismatched Column Counts or Data Types: This is a very common error. Writing SELECT id, name FROM table_a UNION SELECT id FROM table_b will fail because the column counts differ. Similarly, SELECT text_col FROM table_a UNION SELECT numeric_col FROM table_b fails on strict engines such as PostgreSQL due to incompatible types, and on more permissive engines it may succeed with implicit conversion that you did not intend. Always double-check that each SELECT statement projects the same number of columns in compatible data types.
- Misunderstanding EXCEPT Order: Treating EXCEPT as a commutative operation (where order doesn't matter) leads to incorrect logic. A EXCEPT B finds what's in A but not B. B EXCEPT A finds what's in B but not A. These answer fundamentally different questions. Always consciously define which set is your starting pool and which is your subtracting filter.
- Applying ORDER BY to Individual Queries: You cannot write SELECT ... ORDER BY ... UNION SELECT .... The ORDER BY clause can only appear once at the very end of the entire compound query, and it sorts the final, combined result set. Place your ORDER BY last, referencing the column names/aliases from the first SELECT.
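When a branch genuinely needs its own sort, typically to pick a top row before combining, one workaround is to move that sort into a subquery so the compound query itself still ends with a single ORDER BY. The sketch below is one way to do this, not the only one. The tables north_orders and south_orders and their data are hypothetical, and LIMIT is the SQLite, PostgreSQL, and MySQL spelling; other engines use different syntax for row limiting.

```sql
-- Hypothetical regional order tables, for illustration only.
CREATE TABLE north_orders (order_id INTEGER, amount INTEGER);
CREATE TABLE south_orders (order_id INTEGER, amount INTEGER);

INSERT INTO north_orders VALUES (1, 50), (2, 90), (3, 20);
INSERT INTO south_orders VALUES (4, 70), (5, 30);

-- Largest order per region: each branch sorts and limits inside its own
-- subquery, and the compound query has exactly one ORDER BY at the end.
SELECT region, order_id, amount
FROM (SELECT 'north' AS region, order_id, amount
      FROM north_orders ORDER BY amount DESC LIMIT 1) AS top_north
UNION ALL
SELECT region, order_id, amount
FROM (SELECT 'south' AS region, order_id, amount
      FROM south_orders ORDER BY amount DESC LIMIT 1) AS top_south
ORDER BY amount DESC;
-- Returns ('north', 2, 90) then ('south', 4, 70).
```

The inner ORDER BY clauses only decide which row each LIMIT keeps; the order of the final output comes solely from the last ORDER BY.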
Summary
- UNION merges distinct rows from multiple queries, while UNION ALL merges all rows (including duplicates) and is more performant when deduplication is unnecessary.
- INTERSECT returns only the rows that are common to all involved query result sets, ideal for finding overlaps or shared records.
- EXCEPT (or MINUS) returns rows from the first result set that are not in the second, making it perfect for identifying differences, deletions, or missing data.
- All set operators require strict column compatibility: each query must have the same number of columns, and corresponding columns must have compatible data types.
- The final combined result set can be ordered using a single ORDER BY clause placed at the end of the entire statement.