Comparing files is a routine yet critical task in Linux system administration and development, whether you are tracking configuration changes, analyzing log files, or managing datasets. This article looks at colcmp.sh, a bash script that compares two files for changes in name-value pairs. It is particularly useful when you need to identify modifications in configuration files or data lists where each entry consists of a name and its corresponding value.
Understanding colcmp.sh
colcmp.sh is a command-line utility written in bash that pinpoints changes between two files formatted as name value pairs, one pair per line. It uses bash associative arrays (available in bash version 4 and above) to process and compare the files. The script prints the names of entries that were modified, added, or removed between the two input files, followed by the names that did not change.
Key Features and Benefits
- Focused Comparison: Unlike generic file comparison tools like diff or cmp, colcmp.sh is tailored for name-value pair comparisons. This lets it give more meaningful output where the order of lines is not significant but the association between names and values is.
- Change Detection: The script not only identifies differences but categorizes them as changes, additions, or removals, so the nature of the modifications between file versions is easy to see.
- Bash Script Simplicity: Being a bash script, colcmp.sh is portable across Linux environments and can be easily understood, modified, and integrated into larger shell scripts or automation workflows.
- Clear Output: The script prints human-readable output that names each changed entry and lists the unchanged ones. It also writes a file named Output_File containing a one-line change marker.
How to Use colcmp.sh for Linux File Comparison
To compare two files with colcmp.sh, follow these steps:
Prerequisites
- Bash v4+: Ensure your Linux environment is running bash version 4 or higher. You can check your bash version by running bash --version in the terminal.
- Executable Script: Save the script content (provided in the Source (colcmp.sh) section below) into a file named colcmp.sh, and make it executable using the command chmod +x colcmp.sh.
- Input Files: Prepare the two text files that you want to compare (File_1.txt and File_2.txt). Each file should contain name-value pairs in the format name value, with each pair on a new line.
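The prerequisites above can be checked from a script before colcmp.sh is run. The following is a minimal sketch, not part of colcmp.sh: the helper names have_bash4 and is_name_value_file are hypothetical, and the format check assumes that a usable line has at least two whitespace-separated fields.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; they are not part of colcmp.sh itself.

# Succeeds when the running shell is bash 4 or newer (associative arrays).
have_bash4() {
    [ -n "${BASH_VERSINFO:-}" ] && [ "${BASH_VERSINFO[0]}" -ge 4 ]
}

# Succeeds when no line of the file consists of a name without a value.
is_name_value_file() {
    awk 'NF == 1 { bad = 1 } END { exit bad }' "$1"
}

# Demo on throwaway files that are removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'User1 US\nUser2 CA\n' > "$tmpdir/good.txt"
printf 'User1 US\nUser2\n'    > "$tmpdir/bad.txt"

have_bash4 && echo "bash is new enough"
is_name_value_file "$tmpdir/good.txt" && echo "good.txt: format ok"
is_name_value_file "$tmpdir/bad.txt"  || echo "bad.txt: found a line without a value"
```

Blank lines pass the check because awk reports zero fields for them; only a lone name is flagged.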
Running the Script
Execute colcmp.sh from your terminal with the following syntax:
./colcmp.sh File_1.txt File_2.txt
Replace File_1.txt and File_2.txt with the actual paths to your input files.
Interpreting the Output
After running the script, you will see output directly in your terminal, and a file named Output_File will be created (or overwritten) in the current working directory.
Terminal Output:
The terminal output provides a summary of the comparison:
- Files are identical: If File_1.txt and File_2.txt are byte-for-byte identical, the output will be: files are identical.
- Changes detected: If differences are found, the output will list:
  - Changes in values: e.g., User3 changed from 'US' to 'NG'
  - Users removed from File_2.txt compared to File_1.txt: e.g., User4 was removed
  - Users added to File_2.txt compared to File_1.txt: e.g., User5 was added as 'CA'
  - Users with no changes: e.g., no change: User1,User2
Output_File Content:
Output_File holds at most one line of text:
- If the files are identical, or if they differ byte-for-byte but no name-value pair was added, removed, or modified, Output_File contains only an empty line (the script writes it with echo "").
- If any changes (additions, removals, or value modifications) are detected, Output_File contains: UserX has changed, where UserX is the last name the script reported as changed, added, or removed. Every write uses >, which overwrites the file, so earlier entries are replaced rather than accumulated.
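The single-line behaviour of Output_File comes from the redirection operator rather than from any special logic. The sketch below reproduces it on a temporary file; the variable names out, last_line, and count are illustrative, and the append variant at the end is an assumption about what one might want, not something colcmp.sh does.

```shell
#!/usr/bin/env bash
# ">" truncates the target on every write, so only the last message survives.
out=$(mktemp)
trap 'rm -f "$out"' EXIT

echo "" > "$out"                    # what the script does first: one empty line
echo "User4 has changed" > "$out"   # replaces the empty line
echo "User3 has changed" > "$out"   # replaces the previous message
last_line=$(cat "$out")
echo "$last_line"                   # User3 has changed

# Assumption: to keep every change, ">>" (append) is the usual tool.
: > "$out"                          # truncate to zero bytes
echo "User4 has changed" >> "$out"
echo "User3 has changed" >> "$out"
count=$(wc -l < "$out")
echo "lines kept with >>: $count"
```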
Example:
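Let’s assume File_1.txt contains:
User1 US
User2 CA
User3 US
User4 UK
And File_2.txt contains:
User1 US
User2 CA
User3 NG
User5 CA
Running ./colcmp.sh File_1.txt File_2.txt will produce terminal output like the following. Removals are always reported first. Bash does not guarantee the iteration order of associative-array keys, so the order of the changed and added lines, and of the names in the no change list, may vary:
User4 was removed
User3 changed from 'US' to 'NG'
User5 was added as 'CA'
no change: User1,User2
Output_File will contain the last change reported, which for the output above is:
User5 has changed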
Deep Dive into colcmp.sh Script Logic
To see how colcmp.sh works, let’s break down its source code step by step.
Basic File Comparison and Initial Setup
cmp -s "$1" "$2"
case "$?" in
    0)
        echo "" > Output_File
        echo "files are identical"
        ;;
    1)
        # Compare logic
        ;;
    *)
        echo "error: file not found, access denied, etc..."
        echo "usage: ./colcmp.sh File_1.txt File_2.txt"
        ;;
esac
This section starts by using the cmp -s command to perform a silent byte-by-byte comparison of the two input files ($1 and $2). cmp reports the result through its exit status, which the script reads from $?:
- 0: Files are identical. In this case, the script resets Output_File and outputs “files are identical”.
- 1: Files differ. This triggers the main comparison logic to identify name-value pair changes.
- 2 (or any other value): An error occurred (e.g., file not found). The script outputs an error message and usage instructions.
The case statement dispatches on these exit statuses.
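The three outcomes can be tried in isolation. In this sketch, classify_cmp is a hypothetical helper that wraps the same cmp -s call and case statement and prints a label instead of running the comparison logic; the file names are throwaway examples.

```shell
#!/usr/bin/env bash
# classify_cmp is a hypothetical helper that names the three cmp outcomes.
classify_cmp() {
    cmp -s "$1" "$2"
    case "$?" in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'User1 US\n' > "$tmpdir/a.txt"
printf 'User1 US\n' > "$tmpdir/b.txt"
printf 'User1 NG\n' > "$tmpdir/c.txt"

classify_cmp "$tmpdir/a.txt" "$tmpdir/b.txt"        # identical
classify_cmp "$tmpdir/a.txt" "$tmpdir/c.txt"        # different
classify_cmp "$tmpdir/a.txt" "$tmpdir/missing.txt"  # error
```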
Processing File 1 into an Associative Array A1
echo "" > Output_File
cp "$1" ~/.colcmp.array1.tmp.sh
# The sed expressions are single-quoted so the shell passes the backslashes
# and the inner double quotes through to sed unchanged.
sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' ~/.colcmp.array1.tmp.sh
sed -i -E 's/^(.*)$/#\1/' ~/.colcmp.array1.tmp.sh
sed -i -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/A1\[\1\]="\2"/' ~/.colcmp.array1.tmp.sh
chmod 755 ~/.colcmp.array1.tmp.sh
declare -A A1
source ~/.colcmp.array1.tmp.sh
This block processes the first input file ($1) to create a bash associative array named A1.
- Clear Output and Copy File: echo "" > Output_File resets the output file, and cp "$1" ~/.colcmp.array1.tmp.sh copies File_1.txt to a temporary script file in the user’s home directory.
- Escape Special Characters: sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' ~/.colcmp.array1.tmp.sh puts a backslash in front of every character that is not a letter, digit, or space. This is a safety measure to prevent unintended command execution when the file content is later sourced. The expression has to reach sed exactly as written, which is why it is single-quoted: inside double quotes the shell would collapse \\\1 to \\1, and sed would then insert a literal \1 instead of the escaped character.
- Comment Out All Lines: sed -i -E 's/^(.*)$/#\1/' ~/.colcmp.array1.tmp.sh prepends # to every line. Any line that the next step does not convert therefore stays a harmless comment when the file is sourced.
- Convert to Array Assignments: sed -i -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/A1\[\1\]="\2"/' ~/.colcmp.array1.tmp.sh is the core transformation step. It matches lines that start with #, followed by optional whitespace, and captures the first word as the name (\1) and the rest of the line as the value (\2). It then replaces each such line with a bash command that assigns the value to the associative array A1 using the name as the key: A1[name]="value".
- Make Executable (Redundant): chmod 755 ~/.colcmp.array1.tmp.sh makes the temporary script executable. source only needs to read the file, so this step is not required.
- Declare Associative Array: declare -A A1 explicitly declares A1 as an associative array, which is essential for using string keys.
- Source the Script: source ~/.colcmp.array1.tmp.sh executes the temporary script in the current shell environment. This populates the associative array A1 with the name-value pairs from File_1.txt.
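To see what the three sed steps produce for a single line, they can be run as a pipeline on standard input instead of editing a file in place. This is a sketch under assumptions: to_assignment is a hypothetical wrapper, GNU sed is assumed (for -E together with \s and \S), and User9 with its value is an invented input chosen to show the escaping.

```shell
#!/usr/bin/env bash
# to_assignment is a hypothetical wrapper around the three sed steps.
# It reads "name value" lines on stdin and prints assignments for the
# array named in $1. Assumes GNU sed.
to_assignment() {
    sed -E 's/([^A-Za-z0-9 ])/\\\1/g' |
    sed -E 's/^(.*)$/#\1/' |
    sed -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/'"$1"'\[\1\]="\2"/'
}

plain=$(echo 'User3 US' | to_assignment A1)
risky=$(echo 'User9 $(id)' | to_assignment A1)
echo "$plain"   # A1[User3]="US"
echo "$risky"   # A1[User9]="\$\(id\)"
```

The second line shows the point of the escaping step: the $ that would otherwise start a command substitution arrives in the generated assignment as \$, so sourcing the line stores text instead of running id.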
Processing File 2 into Associative Array A2
The script repeats the same process for the second input file ($2) to create another associative array named A2, storing its name-value pairs.
cp "$2" ~/.colcmp.array2.tmp.sh
# Single quotes again, so sed receives the expressions unchanged.
sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' ~/.colcmp.array2.tmp.sh
sed -i -E 's/^(.*)$/#\1/' ~/.colcmp.array2.tmp.sh
sed -i -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/A2\[\1\]="\2"/' ~/.colcmp.array2.tmp.sh
chmod 755 ~/.colcmp.array2.tmp.sh
declare -A A2
source ~/.colcmp.array2.tmp.sh
This section mirrors the previous one, but operates on File_2.txt and populates the associative array A2.
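Because both files go through identical steps, the loading logic is a natural candidate for a function. The sketch below is an alternative technique, not the script's own method: it swaps the sed and source approach for a plain read loop, so no generated code is executed at all. The name load_pairs is hypothetical, and the nameref (local -n) assumes bash 4.3 or newer.

```shell
#!/usr/bin/env bash
# load_pairs is a hypothetical alternative to the sed + source approach:
# it reads "name value" lines straight into the associative array whose
# name is passed as $1. Uses a nameref, so it assumes bash 4.3 or newer.
load_pairs() {
    local -n _pairs=$1
    local name value
    while read -r name value; do
        # Skip blank lines and lines that have a name but no value.
        [ -n "$name" ] && [ -n "$value" ] || continue
        _pairs[$name]=$value
    done < "$2"
}

tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
printf 'User1 US\nUser3   NG  \n\nUser9 $(id)\n' > "$tmpfile"

declare -A A1
load_pairs A1 "$tmpfile"
echo "${#A1[@]} entries"   # 3 entries
echo "${A1[User3]}"        # NG
echo "${A1[User9]}"        # $(id), stored as text and never executed
```

read trims the surrounding whitespace itself, which is why the padded User3 line ends up with the value NG.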
Comparing Arrays and Identifying Changes
USERSWHODIDNOTCHANGE=
# Names present in File 1 but missing from File 2 were removed.
for i in "${!A1[@]}"; do
    if [ "${A2[$i]+x}" = "" ]; then
        echo "$i was removed"
        echo "$i has changed" > Output_File
    fi
done
# Names in File 2 are either new, modified, or unchanged.
for i in "${!A2[@]}"; do
    if [ "${A1[$i]+x}" = "" ]; then
        echo "$i was added as '${A2[$i]}'"
        echo "$i has changed" > Output_File
    elif [ "${A1[$i]}" != "${A2[$i]}" ]; then
        echo "$i changed from '${A1[$i]}' to '${A2[$i]}'"
        echo "$i has changed" > Output_File
    else
        # Quoted -n test: safe even if a name contains glob characters.
        if [ -n "$USERSWHODIDNOTCHANGE" ]; then
            USERSWHODIDNOTCHANGE=",$USERSWHODIDNOTCHANGE"
        fi
        USERSWHODIDNOTCHANGE="$i$USERSWHODIDNOTCHANGE"
    fi
done
if [ -n "$USERSWHODIDNOTCHANGE" ]; then
    echo "no change: $USERSWHODIDNOTCHANGE"
fi
This part of the script compares the two associative arrays A1 and A2 to detect changes:
- Initialize USERSWHODIDNOTCHANGE: USERSWHODIDNOTCHANGE= initializes an empty variable to store names that have not changed.
- Detect Removals: The first for loop iterates through the keys of array A1 (names from File_1.txt). For each name $i, it checks whether the key exists in A2 using [ "${A2[$i]+x}" = "" ]. The expansion ${A2[$i]+x} yields x when the key is set and nothing when it is not, so an empty result means the entry was removed in File_2.txt. The script then outputs $i was removed and overwrites Output_File with $i has changed.
- Detect Additions, Modifications, and No Changes: The second for loop iterates through the keys of array A2 (names from File_2.txt). For each name $i:
  - Addition: It checks whether the key $i exists in A1 using [ "${A1[$i]+x}" = "" ]. If it is not present in A1, it is a new entry added in File_2.txt. The script outputs $i was added as '${A2[$i]}' and overwrites Output_File.
  - Modification: If the key exists in both A1 and A2, it compares the values using [ "${A1[$i]}" != "${A2[$i]}" ]. If the values are different, the entry was modified. The script outputs $i changed from '${A1[$i]}' to '${A2[$i]}' and overwrites Output_File.
  - No Change: If the key and value are the same in both arrays, nothing changed for this name. The script prepends the name $i to the USERSWHODIDNOTCHANGE variable to build a comma-separated list of unchanged names.
- Output Unchanged Users: Finally, if USERSWHODIDNOTCHANGE is not empty, the script outputs no change: $USERSWHODIDNOTCHANGE.
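The key-existence test is the least obvious part of these loops, so here it is on its own. This is a small sketch with invented data: has_key is a hypothetical helper, and User6 is given an empty value only to show that the test distinguishes an empty value from a missing key.

```shell
#!/usr/bin/env bash
# ${array[key]+x} expands to "x" when the key is set (even to an empty
# string) and to nothing when the key is unset.
declare -A A2=([User1]="US" [User6]="")

# Hypothetical helper: succeeds when the given key exists in A2.
has_key() {
    [ -n "${A2[$1]+x}" ]
}

has_key User1 && echo "User1 present"
has_key User6 && echo "User6 present (empty value)"
has_key User4 || echo "User4 absent"
```

Testing "${A2[$1]}" directly would treat User6 and User4 the same way, which is why the script uses the +x form.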
Enhancements and Alternatives for File Comparison in Linux
While colcmp.sh provides a specialized solution for comparing name-value pairs, it is worth considering enhancements and alternative Linux commands for broader file comparison needs.
Potential Enhancements to colcmp.sh
- Error Handling: Improve error handling for cases like incorrect file formats, missing files, or permission issues.
- More Robust Input Validation: Add checks to ensure input files adhere to the name value format.
- Function Refactoring: Encapsulate the array creation logic into functions to reduce code duplication and improve script readability.
- Output Formatting Options: Allow users to customize the output format, perhaps with command-line options to control the verbosity or output delimiters.
- Ignoring Case or Whitespace: Implement options to ignore case differences or leading/trailing whitespace in name or value comparisons.
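As an illustration of the error-handling item, the sketch below shows one way the arguments could be checked before cmp runs, so that a wrong argument count, a missing file, and an unreadable file each get their own message. The function name check_args, the messages, and the return code 2 are assumptions made for this example.

```shell
#!/usr/bin/env bash
# check_args is a hypothetical guard that could run before cmp.
# It prints a specific message to stderr and returns non-zero on bad input.
check_args() {
    if [ "$#" -ne 2 ]; then
        echo "usage: ./colcmp.sh File_1.txt File_2.txt" >&2
        return 2
    fi
    local f
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "error: '$f' is not a regular file" >&2
            return 2
        elif [ ! -r "$f" ]; then
            echo "error: '$f' is not readable" >&2
            return 2
        fi
    done
}

tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
check_args "$tmpfile" "$tmpfile" && echo "arguments ok"
check_args "$tmpfile" 2>/dev/null || echo "rejected: wrong argument count"
check_args "$tmpfile" "$tmpfile.missing" 2>/dev/null || echo "rejected: missing file"
```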
Alternative Linux Commands for File Comparison
For more general file comparison tasks, consider these standard command-line utilities:
- diff: A powerful tool for finding line-by-line differences between files. It is highly versatile and offers various output formats (e.g., unified diff, context diff) suitable for patching and code review. Example: diff File_1.txt File_2.txt
- comm: Compares two sorted files and outputs lines unique to each file and lines common to both. Useful for identifying common and distinct entries in lists. Example: comm File_1.txt File_2.txt (requires sorted input files)
- cmp: Performs a byte-by-byte comparison and is very efficient for quickly checking whether two files are identical. Example: cmp File_1.txt File_2.txt (as used in colcmp.sh for the initial file identity check)
- vimdiff (or gvimdiff): Opens the files side by side in the Vim text editor with the differences highlighted, making them easy to spot and navigate, especially in code or structured text files. vimdiff runs in the terminal, while gvimdiff uses Vim's graphical interface. Example: vimdiff File_1.txt File_2.txt
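The standard tools can also be combined to approximate a name-value comparison. The following sketch uses sort and join on the example data from earlier in the article; it is an alternative technique rather than what colcmp.sh does, it assumes single-word values, and MISSING is an arbitrary placeholder chosen for this example.

```shell
#!/usr/bin/env bash
# Side-by-side view of both files keyed on the name, using sort and join.
# "MISSING" is an arbitrary placeholder chosen for this sketch.
export LC_ALL=C   # keep sort and join agreeing on the collation order

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'User1 US\nUser2 CA\nUser3 US\nUser4 UK\n' > "$tmpdir/File_1.txt"
printf 'User1 US\nUser2 CA\nUser3 NG\nUser5 CA\n' > "$tmpdir/File_2.txt"

# -a 1 -a 2 keeps names found in only one file; -e fills the empty column.
merged=$(join -a 1 -a 2 -e MISSING -o 0,1.2,2.2 \
    <(sort "$tmpdir/File_1.txt") <(sort "$tmpdir/File_2.txt"))
echo "$merged"
# User1 US US
# User2 CA CA
# User3 US NG
# User4 UK MISSING
# User5 MISSING CA

# Keep only the names whose two columns differ.
changed=$(echo "$merged" | awk '$2 != $3 { print $1 }')
echo "$changed"
# User3
# User4
# User5
```

Unlike the associative-array loops, this output is always sorted by name, which can make results easier to compare across runs.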
Conclusion
colcmp.sh offers a specialized way to compare two files of name-value pairs. As a bash script built on associative arrays, it provides a clear and concise solution for detecting changes in configuration files, user lists, or similar datasets. While standard tools like diff, comm, and cmp offer broader file comparison capabilities, colcmp.sh is targeted at modifications within name-value structures. For more complex or visual comparisons, tools like vimdiff can complement the command-line utilities.
Source (colcmp.sh)
#!/usr/bin/env bash
# colcmp.sh: compare two files of "name value" pairs.
# Needs bash 4+ (associative arrays) and GNU sed (-i, \s, \S).
cmp -s "$1" "$2"
case "$?" in
    0)
        echo "" > Output_File
        echo "files are identical"
        ;;
    1)
        echo "" > Output_File
        # Turn File 1 into a sourceable list of A1[name]="value" assignments.
        # The sed expressions are single-quoted so the shell leaves the
        # backslashes and inner double quotes alone.
        cp "$1" ~/.colcmp.array1.tmp.sh
        sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' ~/.colcmp.array1.tmp.sh
        sed -i -E 's/^(.*)$/#\1/' ~/.colcmp.array1.tmp.sh
        sed -i -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/A1\[\1\]="\2"/' ~/.colcmp.array1.tmp.sh
        chmod 755 ~/.colcmp.array1.tmp.sh
        declare -A A1
        source ~/.colcmp.array1.tmp.sh
        # Same steps for File 2, filling A2.
        cp "$2" ~/.colcmp.array2.tmp.sh
        sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' ~/.colcmp.array2.tmp.sh
        sed -i -E 's/^(.*)$/#\1/' ~/.colcmp.array2.tmp.sh
        sed -i -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/A2\[\1\]="\2"/' ~/.colcmp.array2.tmp.sh
        chmod 755 ~/.colcmp.array2.tmp.sh
        declare -A A2
        source ~/.colcmp.array2.tmp.sh
        USERSWHODIDNOTCHANGE=
        # Names present in File 1 but missing from File 2 were removed.
        for i in "${!A1[@]}"; do
            if [ "${A2[$i]+x}" = "" ]; then
                echo "$i was removed"
                echo "$i has changed" > Output_File
            fi
        done
        # Names in File 2 are either new, modified, or unchanged.
        for i in "${!A2[@]}"; do
            if [ "${A1[$i]+x}" = "" ]; then
                echo "$i was added as '${A2[$i]}'"
                echo "$i has changed" > Output_File
            elif [ "${A1[$i]}" != "${A2[$i]}" ]; then
                echo "$i changed from '${A1[$i]}' to '${A2[$i]}'"
                echo "$i has changed" > Output_File
            else
                if [ -n "$USERSWHODIDNOTCHANGE" ]; then
                    USERSWHODIDNOTCHANGE=",$USERSWHODIDNOTCHANGE"
                fi
                USERSWHODIDNOTCHANGE="$i$USERSWHODIDNOTCHANGE"
            fi
        done
        if [ -n "$USERSWHODIDNOTCHANGE" ]; then
            echo "no change: $USERSWHODIDNOTCHANGE"
        fi
        ;;
    *)
        echo "error: file not found, access denied, etc..."
        echo "usage: ./colcmp.sh File_1.txt File_2.txt"
        ;;
esac