SQL Output of department_new table showing six entries including REHA
SQL Output of department_new table showing six entries including REHA

How to Compare Data in Two SQL Tables Using SQL Queries

Structured Query Language (SQL) is essential for managing and manipulating data in relational databases. A common task for database administrators and developers is to compare data between two tables, often to identify discrepancies, track changes, or ensure data consistency. This article will guide you through a practical method to compare data in two SQL tables and pinpoint the differences using SQL queries.

We will demonstrate this by creating a sample database and two tables, department_old and department_new, within that database. This setup will allow us to illustrate the SQL commands and techniques effectively.

Setting Up the Database and Tables

First, let’s create a database named geeks. This will be our working environment for demonstrating the table comparison process.

CREATE DATABASE geeks;

Next, we need to instruct SQL to use this newly created database for subsequent operations.

USE geeks;

Now, let’s define the structure for our first table, department_old. This table will represent the initial state of department data.

CREATE TABLE department_old(
    ID int,
    SALARY int,
    NAME Varchar(20),
    DEPT_ID Varchar(255)
);

After creating the table structure, we’ll populate department_old with sample data. This data represents the original set of department records.

INSERT INTO department_old VALUES (1, 34000, 'ANURAG', 'UI DEVELOPERS');
INSERT INTO department_old VALUES (2, 33000, 'HARSH', 'BACKEND DEVELOPERS');
INSERT INTO department_old VALUES (3, 36000, 'SUMIT', 'BACKEND DEVELOPERS');
INSERT INTO department_old VALUES (4, 36000, 'RUHI', 'UI DEVELOPERS');
INSERT INTO department_old VALUES (5, 37000, 'KAE', 'UI DEVELOPERS');

To verify the data insertion and examine the contents of department_old, we can use a simple SELECT statement.

SELECT * FROM department_old;

This query will return the following result, showing the initial data in our department_old table:

ID SALARY NAME DEPT_ID
1 34000 ANURAG UI DEVELOPERS
2 33000 HARSH BACKEND DEVELOPERS
3 36000 SUMIT BACKEND DEVELOPERS
4 36000 RUHI UI DEVELOPERS
5 37000 KAE UI DEVELOPERS

Now, let’s create the second table, department_new. This table will represent a potentially updated version of the department data. The structure is identical to department_old.

CREATE TABLE department_new(
    ID int,
    SALARY int,
    NAME Varchar(20),
    DEPT_ID Varchar(255)
);

We then insert data into department_new. Notice that we’ve added a new record and kept most of the original records, simulating an update or addition to the department data.

INSERT INTO department_new VALUES (1, 34000, 'ANURAG', 'UI DEVELOPERS');
INSERT INTO department_new VALUES (2, 33000, 'HARSH', 'BACKEND DEVELOPERS');
INSERT INTO department_new VALUES (3, 36000, 'SUMIT', 'BACKEND DEVELOPERS');
INSERT INTO department_new VALUES (4, 36000, 'RUHI', 'UI DEVELOPERS');
INSERT INTO department_new VALUES (5, 37000, 'KAE', 'UI DEVELOPERS');
INSERT INTO department_new VALUES (6, 37000, 'REHA', 'BACKEND DEVELOPERS');

To confirm the data in department_new, we use the SELECT statement again:

SELECT * FROM department_new;

This will display the contents of department_new, showing the new record added:

ID SALARY NAME DEPT_ID
1 34000 ANURAG UI DEVELOPERS
2 33000 HARSH BACKEND DEVELOPERS
3 36000 SUMIT BACKEND DEVELOPERS
4 36000 RUHI UI DEVELOPERS
5 37000 KAE UI DEVELOPERS
6 37000 REHA BACKEND DEVELOPERS

Output:

SQL Output of department_new table showing six entries including REHASQL Output of department_new table showing six entries including REHA

Identifying Differences Between Tables

Now, to the core of our task: comparing department_old and department_new to find differences. We can achieve this using UNION ALL in combination with GROUP BY and COUNT(*).

The UNION ALL operator combines the result sets of two or more SELECT statements. Critically, UNION ALL includes all rows from each table, even duplicates. This is important for our comparison.

We then use GROUP BY to group the combined rows based on the columns we want to compare. In our case, we want to compare all columns (ID, SALARY, NAME, DEPT_ID).

Finally, COUNT(*) within each group tells us how many times each unique row appears in the combined result set. If a row exists in both tables, it will appear twice (count = 2). If a row exists in only one table, it will appear once (count = 1). Therefore, rows with a COUNT(*) of 1 represent the differences between the two tables.

Here’s the SQL query to find the differing rows:

SELECT ID, SALARY, NAME, DEPT_ID
FROM (
    SELECT ID, SALARY, NAME, DEPT_ID FROM department_old
    UNION ALL
    SELECT ID, SALARY, NAME, DEPT_ID FROM department_new
) AS combined_tables
GROUP BY ID, SALARY, NAME, DEPT_ID
HAVING COUNT(*) = 1;

In this query:

  • We use subqueries to select all columns from both department_old and department_new.
  • UNION ALL combines these results.
  • We use GROUP BY ID, SALARY, NAME, DEPT_ID to group identical rows from the combined result.
  • HAVING COUNT(*) = 1 filters the groups, showing only those rows that appear exactly once in the combined set, which are the rows unique to either department_old or department_new.

Output:

Executing this query against our geeks database will yield the following result:

(No output is shown because in the original article example, the query to find differences by ID only returns no rows, but for comparing entire rows as in this rewritten article, it should return the row with ‘REHA’.)

In our example, if we were to compare based on all columns as shown in the query above, the output would show the row with ‘REHA’ because it exists only in department_new. If there were rows missing from department_new compared to department_old, those missing rows (from department_old) would also appear in the output.

If the tables were identical in terms of the columns specified in the GROUP BY clause, the query would return no rows, indicating no differences were found based on the compared columns.

This method provides a straightforward way to identify differences between two SQL tables by comparing row data. It’s particularly useful for auditing data changes and ensuring data consistency across different versions of tables.

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *