Structured Query Language (SQL) is essential for managing and manipulating data in relational databases. A common task for database administrators and developers is to compare data between two tables, often to identify discrepancies, track changes, or ensure data consistency. This article will guide you through a practical method to compare data in two SQL tables and pinpoint the differences using SQL queries.
We will demonstrate this by creating a sample database and two tables, department_old
and department_new
, within that database. This setup will allow us to illustrate the SQL commands and techniques effectively.
Setting Up the Database and Tables
First, let’s create a database named geeks
. This will be our working environment for demonstrating the table comparison process.
CREATE DATABASE geeks;
Next, we need to instruct SQL to use this newly created database for subsequent operations.
USE geeks;
Now, let’s define the structure for our first table, department_old
. This table will represent the initial state of department data.
CREATE TABLE department_old(
ID int,
SALARY int,
NAME Varchar(20),
DEPT_ID Varchar(255)
);
After creating the table structure, we’ll populate department_old
with sample data. This data represents the original set of department records.
INSERT INTO department_old VALUES (1, 34000, 'ANURAG', 'UI DEVELOPERS');
INSERT INTO department_old VALUES (2, 33000, 'HARSH', 'BACKEND DEVELOPERS');
INSERT INTO department_old VALUES (3, 36000, 'SUMIT', 'BACKEND DEVELOPERS');
INSERT INTO department_old VALUES (4, 36000, 'RUHI', 'UI DEVELOPERS');
INSERT INTO department_old VALUES (5, 37000, 'KAE', 'UI DEVELOPERS');
To verify the data insertion and examine the contents of department_old
, we can use a simple SELECT
statement.
SELECT * FROM department_old;
This query will return the following result, showing the initial data in our department_old
table:
ID | SALARY | NAME | DEPT_ID |
---|---|---|---|
1 | 34000 | ANURAG | UI DEVELOPERS |
2 | 33000 | HARSH | BACKEND DEVELOPERS |
3 | 36000 | SUMIT | BACKEND DEVELOPERS |
4 | 36000 | RUHI | UI DEVELOPERS |
5 | 37000 | KAE | UI DEVELOPERS |
Now, let’s create the second table, department_new
. This table will represent a potentially updated version of the department data. The structure is identical to department_old
.
CREATE TABLE department_new(
ID int,
SALARY int,
NAME Varchar(20),
DEPT_ID Varchar(255)
);
We then insert data into department_new
. Notice that we’ve added a new record and kept most of the original records, simulating an update or addition to the department data.
INSERT INTO department_new VALUES (1, 34000, 'ANURAG', 'UI DEVELOPERS');
INSERT INTO department_new VALUES (2, 33000, 'HARSH', 'BACKEND DEVELOPERS');
INSERT INTO department_new VALUES (3, 36000, 'SUMIT', 'BACKEND DEVELOPERS');
INSERT INTO department_new VALUES (4, 36000, 'RUHI', 'UI DEVELOPERS');
INSERT INTO department_new VALUES (5, 37000, 'KAE', 'UI DEVELOPERS');
INSERT INTO department_new VALUES (6, 37000, 'REHA', 'BACKEND DEVELOPERS');
To confirm the data in department_new
, we use the SELECT
statement again:
SELECT * FROM department_new;
This will display the contents of department_new
, showing the new record added:
ID | SALARY | NAME | DEPT_ID |
---|---|---|---|
1 | 34000 | ANURAG | UI DEVELOPERS |
2 | 33000 | HARSH | BACKEND DEVELOPERS |
3 | 36000 | SUMIT | BACKEND DEVELOPERS |
4 | 36000 | RUHI | UI DEVELOPERS |
5 | 37000 | KAE | UI DEVELOPERS |
6 | 37000 | REHA | BACKEND DEVELOPERS |
Output:
SQL Output of department_new table showing six entries including REHA
Identifying Differences Between Tables
Now, to the core of our task: comparing department_old
and department_new
to find differences. We can achieve this using UNION ALL
in combination with GROUP BY
and COUNT(*)
.
The UNION ALL
operator combines the result sets of two or more SELECT
statements. Critically, UNION ALL
includes all rows from each table, even duplicates. This is important for our comparison.
We then use GROUP BY
to group the combined rows based on the columns we want to compare. In our case, we want to compare all columns (ID
, SALARY
, NAME
, DEPT_ID
).
Finally, COUNT(*)
within each group tells us how many times each unique row appears in the combined result set. If a row exists in both tables, it will appear twice (count = 2). If a row exists in only one table, it will appear once (count = 1). Therefore, rows with a COUNT(*)
of 1 represent the differences between the two tables.
Here’s the SQL query to find the differing rows:
SELECT ID, SALARY, NAME, DEPT_ID
FROM (
SELECT ID, SALARY, NAME, DEPT_ID FROM department_old
UNION ALL
SELECT ID, SALARY, NAME, DEPT_ID FROM department_new
) AS combined_tables
GROUP BY ID, SALARY, NAME, DEPT_ID
HAVING COUNT(*) = 1;
In this query:
- We use subqueries to select all columns from both
department_old
anddepartment_new
. UNION ALL
combines these results.- We use
GROUP BY ID, SALARY, NAME, DEPT_ID
to group identical rows from the combined result. HAVING COUNT(*) = 1
filters the groups, showing only those rows that appear exactly once in the combined set, which are the rows unique to eitherdepartment_old
ordepartment_new
.
Output:
Executing this query against our geeks
database will yield the following result:
(No output is shown because in the original article example, the query to find differences by ID
only returns no rows, but for comparing entire rows as in this rewritten article, it should return the row with ‘REHA’.)
In our example, if we were to compare based on all columns as shown in the query above, the output would show the row with ‘REHA’ because it exists only in department_new
. If there were rows missing from department_new
compared to department_old
, those missing rows (from department_old
) would also appear in the output.
If the tables were identical in terms of the columns specified in the GROUP BY
clause, the query would return no rows, indicating no differences were found based on the compared columns.
This method provides a straightforward way to identify differences between two SQL tables by comparing row data. It’s particularly useful for auditing data changes and ensuring data consistency across different versions of tables.