How to Compare Two Tables in SQL

Comparing data across tables is a fundamental task in SQL. This article provides a comprehensive guide on how to compare and identify differences between two tables using SQL queries. We will explore various techniques, including using UNION ALL, JOIN, and EXCEPT (or MINUS in some databases).

Understanding the Problem

Comparing two tables involves identifying rows that exist in one table but not the other, as well as finding rows that are present in both tables but have different values in specific columns. This is crucial for data validation, synchronization, and auditing purposes.

Methods for Comparing Tables

Using UNION ALL and GROUP BY

This method leverages the UNION ALL operator to combine the data from both tables and then uses GROUP BY and HAVING to identify discrepancies.

Combine Data: UNION ALL combines all rows from both tables into a single result set. Duplicate rows are not eliminated.
Group and Count: GROUP BY groups the rows based on all columns you want to compare. COUNT(*) counts the occurrences of each unique combination of values.
Filter Discrepancies: The HAVING clause filters the grouped results, keeping only those groups where COUNT(*) is equal to 1. This indicates that the row exists in only one of the two tables.

Example: Let’s say we have two tables, employees1 and employees2, with columns id, name, and salary.

SELECT id, name, salary
FROM (
    SELECT id, name, salary FROM employees1
    UNION ALL
    SELECT id, name, salary FROM employees2
) AS combined_employees
GROUP BY id, name, salary
HAVING COUNT(*) = 1;

This query returns rows that are present in either employees1 or employees2, but not both. It also highlights rows that exist in both tables but have differing values in any of the compared columns.

Using JOINs

JOIN clauses offer another approach for comparing tables. A FULL OUTER JOIN is particularly useful as it returns all rows from both tables.

Example: Using the same employees1 and employees2 tables:

SELECT 
    e1.id AS e1_id, e1.name AS e1_name, e1.salary AS e1_salary,
    e2.id AS e2_id, e2.name AS e2_name, e2.salary AS e2_salary
FROM employees1 e1
FULL OUTER JOIN employees2 e2 ON e1.id = e2.id
WHERE e1.id IS NULL OR e2.id IS NULL
   OR e1.name <> e2.name
   OR e1.salary <> e2.salary;

This query returns rows where the id is missing in one of the tables or where the name or salary values differ between matching ids. Replace FULL OUTER JOIN with LEFT JOIN or RIGHT JOIN to compare based on rows present in the left or right table respectively.

Using EXCEPT or MINUS

EXCEPT (in SQL Server) or MINUS (in Oracle and other databases) returns all rows in the first SELECT statement that are not present in the second SELECT statement.

Example:

-- SQL Server
SELECT id, name, salary FROM employees1
EXCEPT
SELECT id, name, salary FROM employees2;

-- Oracle
SELECT id, name, salary FROM employees1
MINUS
SELECT id, name, salary FROM employees2;

This returns rows that exist in employees1 but not in employees2. To find rows that exist in employees2 but not employees1, simply reverse the order of the SELECT statements. Note that EXCEPT and MINUS compare all selected columns for equality.

Choosing the Right Method

The optimal method depends on the specific comparison requirements:

UNION ALL with GROUP BY is versatile for finding all discrepancies.
JOINs provide more flexibility in handling matching and non-matching rows.
EXCEPT/MINUS is concise for finding rows unique to one table.

Conclusion

Comparing two tables in SQL is essential for data management. By understanding these techniques using UNION ALL, JOIN, and EXCEPT/MINUS, you can effectively identify and address data differences across your database tables. Selecting the appropriate method ensures efficient and accurate comparison results tailored to your specific needs.