In Linux environments, comparing files is a common and crucial task for system administrators, developers, and anyone managing data. Whether you’re tracking changes in configuration files, verifying data integrity, or debugging scripts, knowing how to compare files effectively from the command line is essential. This article walks through a practical bash script, colcmp.sh, designed to compare two files containing name/value pairs and report the differences between them.
What is colcmp.sh and Why Use It?
The colcmp.sh script is a command-line tool for comparing files formatted as name/value pairs, where each line consists of a name, whitespace, and the value that belongs to that name. This format is frequently used in configuration files, data lists, and other text-based data storage. Generic file comparison tools like diff show line-by-line differences; colcmp.sh instead focuses on the value associated with each name. This makes it particularly useful when the order of entries might vary but you are primarily interested in whether the value associated with a specific name has changed.
Key benefits of using colcmp.sh include:
- Focused comparison: It specifically targets name/value pairs, making it ideal for configuration files and structured data.
- Change detection: It reports names whose values changed, names that were added, and names that were removed between the two files.
- Output to file: It writes the affected names to a file named Output_File, providing a concise list of modifications.
- Command-line efficiency: As a bash script, it is lightweight and integrates easily into Linux command-line workflows.
How to Use colcmp.sh
To use colcmp.sh, you need the script saved on your system with execute permission. Here’s a step-by-step guide:
- Save the script: Copy the script code into a file named colcmp.sh.
- Set execute permissions: Open your terminal, navigate to the directory where you saved colcmp.sh, and run chmod +x colcmp.sh to make the script executable.
- Run the script: Execute the script with the paths of the two files you want to compare as arguments:
./colcmp.sh File_1.txt File_2.txt
Replace File_1.txt and File_2.txt with the actual paths to your files.
Example:
Let’s assume you have two files, config_v1.txt and config_v2.txt, representing two versions of a configuration file.
File_1.txt (config_v1.txt):
User1 US
User2 US
User3 US
SettingA on
SettingB off
File_2.txt (config_v2.txt):
User1 US
User2 US
User3 NG
SettingA off
SettingC new_setting
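To try the example yourself, the following snippet recreates both sample files in the current directory using here-documents. Only the file names and contents from the example above are used; the quoted 'EOF' delimiter keeps the shell from expanding anything inside the files.

```bash
#!/bin/bash
# Recreate the two sample files from this example in the current directory.
cat > config_v1.txt <<'EOF'
User1 US
User2 US
User3 US
SettingA on
SettingB off
EOF

cat > config_v2.txt <<'EOF'
User1 US
User2 US
User3 NG
SettingA off
SettingC new_setting
EOF
```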
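Running the script:
$ ./colcmp.sh config_v1.txt config_v2.txt
SettingB was removed
User3 changed from 'US' to 'NG'
SettingA changed from 'on' to 'off'
SettingC was added as 'new\_setting'
no change: User1,User2
Removed names are always reported first and the “no change” line always comes last. Within each group, lines may appear in a different order on your system, because bash does not guarantee an iteration order for associative array keys; the same applies to the order of names in the “no change” list. The backslash in new\_setting is a side effect of the escaping step described in the walkthrough below.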
Contents of Output_File:
After running the script, a file named Output_File is created (or overwritten) in the current working directory. It lists the names of the entries that were changed, added, or removed:
$ cat Output_File
SettingB has changed
User3 has changed
SettingA has changed
SettingC has changed
This output shows that User3 and SettingA have new values, SettingC was added, and SettingB was removed. Unchanged entries (User1 and User2) are reported only on the terminal, not in Output_File.
Understanding the colcmp.sh Script
The colcmp.sh script uses bash associative arrays (available in bash version 4 and later) to compare the name/value pairs. Here’s a breakdown of the script’s logic:
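Because declare -A fails on older shells (for example, the bash 3.2 that macOS still ships as /bin/bash), a guard like the following could be placed near the top of the script so that it stops with a clear message instead of failing later. This is a suggested addition, not part of colcmp.sh as presented here.

```bash
#!/bin/bash
# Hypothetical guard: stop early when bash is too old for associative arrays.
# BASH_VERSINFO[0] holds the major version number of the running bash.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "colcmp.sh: bash 4 or later is required (found $BASH_VERSION)" >&2
  exit 1
fi
echo "bash $BASH_VERSION supports associative arrays"
```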
1. Basic File Comparison and Initial Setup
#!/bin/bash
cmp -s "$1" "$2"
case "$?" in
  0)
    : > Output_File    # truncate Output_File to zero length
    echo "files are identical" ;;
  1)
    : > Output_File    # start the comparison with an empty Output_File
    # ... rest of the script ...
    ;;
  *)
    echo "error: file not found, access denied, etc..."
    echo "usage: ./colcmp.sh File_1.txt File_2.txt" ;;
esac
- cmp -s "$1" "$2": Silently compares the two input files ($1 and $2). The -s option prevents cmp from writing any output to standard output; only its exit status is used.
- case "$?" in ... esac: Evaluates the exit status ($?) of the cmp command.
- 0): If cmp returns 0 (files are identical), the script clears Output_File and prints “files are identical”.
- 1): If cmp returns 1 (files differ), the script clears Output_File and proceeds with the detailed comparison logic.
- *): For any other exit status (cmp returns 2 when it runs into trouble, such as a missing or unreadable file), the script prints an error message and usage instructions.
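The exit statuses that the case statement relies on can be checked directly. The sketch below creates throwaway files in a temporary directory (the file names are illustrative) and records the status of cmp -s for identical files, different files, and a missing file. GNU cmp is assumed for the status of 2; POSIX only requires a value greater than 1 on error.

```bash
#!/bin/bash
# Record the exit status of cmp -s in the three situations colcmp.sh's case handles.
dir=$(mktemp -d)
printf 'User1 US\n' > "$dir/a.txt"
printf 'User1 US\n' > "$dir/b.txt"   # same content as a.txt
printf 'User1 NG\n' > "$dir/c.txt"   # differs from a.txt

cmp -s "$dir/a.txt" "$dir/b.txt"; same=$?
cmp -s "$dir/a.txt" "$dir/c.txt"; differ=$?
cmp -s "$dir/a.txt" "$dir/missing.txt" 2>/dev/null; error=$?   # file does not exist

echo "identical: $same, different: $differ, missing file: $error"
rm -r "$dir"
```

With GNU cmp this prints identical: 0, different: 1, missing file: 2, which map to the 0), 1), and *) branches respectively.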
2. Processing Files into Associative Arrays
The script processes each input file into a bash associative array (A1 for the first file, A2 for the second). This involves several steps for each file:
cp "$1" ~/.colcmp.array1.tmp.sh
# Single quotes pass the backslashes and the inner double quotes to sed unchanged.
sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' ~/.colcmp.array1.tmp.sh
sed -i -E 's/^(.*)$/#\1/' ~/.colcmp.array1.tmp.sh
sed -i -E 's/^#\s*(\S+)\s+(\S.*)$/A1[\1]="\2"/' ~/.colcmp.array1.tmp.sh
chmod 755 ~/.colcmp.array1.tmp.sh
declare -A A1
source ~/.colcmp.array1.tmp.sh
These lines are repeated for the second file, creating A2 from $2. Here is what happens for each file, using File_1.txt and A1 as the example:
- cp "$1" ~/.colcmp.array1.tmp.sh: Copies the input file (File_1.txt) to a temporary script file in the user’s home directory (~/.colcmp.array1.tmp.sh).
- First sed command (escaping): Escapes special characters. It finds every character that is not a letter, digit, or space and prefixes it with a backslash. This keeps characters such as the dollar sign, backtick, and double quote from being interpreted when the file is later sourced as a script. Note that inside double quotes bash removes the backslash only before a dollar sign, backtick, double quote, backslash, or newline, so other escaped characters (for example an underscore or a period) keep their backslash in the stored value.
- Second sed command (commenting out): Adds a # at the beginning of every line in the temporary script file. This is a safety measure: any line that the next step does not turn into an array assignment, such as a line containing only a name, stays a comment and is never executed.
- Third sed command (building assignments): This is the core transformation step. It matches each commented line (starting with #), skips optional whitespace, and captures the first non-whitespace word as the name (\1) and everything from the next non-whitespace character to the end of the line as the value (\2). It then replaces the entire line with a bash associative array assignment: A1[name]="value".
- chmod 755 ~/.colcmp.array1.tmp.sh: Makes the temporary script file executable. This is not necessary for source, but it is a common practice when dealing with script files.
- declare -A A1: Declares A1 as an associative array. This is essential: without it, bash would treat A1 as an indexed array and evaluate each name as an arithmetic expression instead of using it as a string key.
- source ~/.colcmp.array1.tmp.sh: Executes the temporary script in the current shell. This runs all the array assignments, populating A1 with the name/value pairs from File_1.txt.
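To see what the three sed transformations produce, the sketch below runs them on a small sample file and then sources the result. The sample lines (including the Price $5 entry and the value-less LonelyName line) are made up for illustration, a mktemp file replaces the ~/.colcmp.array1.tmp.sh path, and GNU sed is assumed for in-place editing with -i and for \s and \S.

```bash
#!/bin/bash
# Run the three sed transformations on a sample file, then source the result.
# Assumes GNU sed (for -i without a suffix and for \s / \S in patterns).
tmp=$(mktemp)
printf 'SettingA on\nSettingC new_setting\nPrice $5\nLonelyName\n' > "$tmp"

sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' "$tmp"              # escape non-alphanumerics
sed -i -E 's/^(.*)$/#\1/' "$tmp"                          # comment out every line
sed -i -E 's/^#\s*(\S+)\s+(\S.*)$/A1[\1]="\2"/' "$tmp"   # name value -> A1[name]="value"

cat "$tmp"
# A1[SettingA]="on"
# A1[SettingC]="new\_setting"
# A1[Price]="\$5"
# #LonelyName

declare -A A1
source "$tmp"
rm -f "$tmp"

for k in SettingA SettingC Price; do
  printf '%s=%s\n' "$k" "${A1[$k]}"
done
# SettingA=on
# SettingC=new\_setting
# Price=$5
```

The output shows both sides of the escaping approach: $5 is stored literally instead of being expanded, but the underscore in new_setting keeps its backslash, and the line without a value remains a harmless comment.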
3. Detecting Changes and Generating Output
After creating the associative arrays A1 and A2 from both input files, the script compares them and identifies differences:
USERSWHODIDNOTCHANGE=
# Names present in File_1 but missing from File_2 were removed
for i in "${!A1[@]}"; do
  if [ "${A2[$i]+x}" = "" ]; then
    echo "$i was removed"
    echo "$i has changed" >> Output_File   # append, so earlier entries are kept
  fi
done
# Names in File_2 were added, changed, or left unchanged
for i in "${!A2[@]}"; do
  if [ "${A1[$i]+x}" = "" ]; then
    echo "$i was added as '${A2[$i]}'"
    echo "$i has changed" >> Output_File
  elif [ "${A1[$i]}" != "${A2[$i]}" ]; then
    echo "$i changed from '${A1[$i]}' to '${A2[$i]}'"
    echo "$i has changed" >> Output_File
  else
    # Prepend the name to a comma-separated list of unchanged names
    if [ x$USERSWHODIDNOTCHANGE != x ]; then
      USERSWHODIDNOTCHANGE=",$USERSWHODIDNOTCHANGE"
    fi
    USERSWHODIDNOTCHANGE="$i$USERSWHODIDNOTCHANGE"
  fi
done
if [ x$USERSWHODIDNOTCHANGE != x ]; then
  echo "no change: $USERSWHODIDNOTCHANGE"
fi
- USERSWHODIDNOTCHANGE=: Initializes an empty variable to store names whose values have not changed.
- First for loop (iterating over the keys of A1): for i in "${!A1[@]}"; do ... done iterates over all names (keys) in the associative array A1, which was built from File_1.txt.
  - if [ "${A2[$i]+x}" = "" ]; then ... fi checks whether a name from A1 exists as a key in A2. The parameter expansion ${A2[$i]+x} expands to x if the key $i is set in A2 and to an empty string if it is not.
  - If the name from A1 is not in A2, it was removed in the second file. The script prints “$i was removed” and appends “$i has changed” to Output_File.
- Second for loop (iterating over the keys of A2): for i in "${!A2[@]}"; do ... done iterates over all names in A2, built from File_2.txt, and checks each one against A1:
  - if [ "${A1[$i]+x}" = "" ]: If the name is not in A1, it was added in the second file. The script prints “$i was added as '${A2[$i]}'” and appends “$i has changed” to Output_File.
  - elif [ "${A1[$i]}" != "${A2[$i]}" ]: If the name exists in both arrays but the values differ, the value has changed. The script prints “$i changed from '${A1[$i]}' to '${A2[$i]}'” and appends “$i has changed” to Output_File.
  - else: If the name exists in both arrays with the same value, the pair has not changed. The script adds the name to the front of USERSWHODIDNOTCHANGE, inserting a comma when the list already has entries.
- if [ x$USERSWHODIDNOTCHANGE != x ]; then ... fi: Finally, if USERSWHODIDNOTCHANGE is not empty (meaning some entries were unchanged), the script prints “no change: $USERSWHODIDNOTCHANGE”. The x prefix keeps the test valid when the variable is empty.
Alternatives to colcmp.sh
While colcmp.sh is effective for comparing name/value pairs, Linux offers other powerful command-line tools for file comparison:
- diff: The classic difference utility. It excels at showing line-by-line changes between files and supports several output formats (for example, unified and context diffs). However, it is not designed specifically for name/value pairs.
- comm: Compares two sorted files line by line and outputs lines unique to file 1, lines unique to file 2, and lines common to both. It is useful for finding common and unique entries, but it requires sorted input and does not pair up the old and new values of the same name.
- cmp: A simpler comparison tool that reports the first byte and line number at which two files differ. It is useful for a quick check of whether two binary or text files are identical.
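To see why comm is awkward for value comparisons, the sketch below runs it on the two example files (recreated in temporary files). comm requires sorted input, so each file goes through sort first, and LC_ALL=C is set so that sort and comm agree on the collation order.

```bash
#!/bin/bash
# comm needs sorted input; LC_ALL=C gives sort and comm the same byte-order collation.
export LC_ALL=C
f1=$(mktemp)
f2=$(mktemp)
printf 'User1 US\nUser2 US\nUser3 US\nSettingA on\nSettingB off\n' > "$f1"
printf 'User1 US\nUser2 US\nUser3 NG\nSettingA off\nSettingC new_setting\n' > "$f2"

# -3 suppresses lines common to both files, leaving column 1 (only in f1)
# and column 2 (only in f2, prefixed with a tab).
result=$(comm -3 <(sort "$f1") <(sort "$f2"))
printf '%s\n' "$result"

rm -f "$f1" "$f2"
```

Lines found only in the first file start in the first column, and lines found only in the second file are indented by one tab. The changed value of User3 therefore appears as two unrelated lines, User3 US and User3 NG, rather than as a single changed entry.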
Choosing the right tool depends on your specific comparison needs. For structured name/value data, colcmp.sh offers a targeted solution. For general text file comparisons or patch generation, diff remains the go-to tool.
Conclusion
The colcmp.sh script provides a practical way to compare two files in Linux when they contain name/value pairs. By combining bash associative arrays with sed for text processing, it identifies changed, added, and removed entries. It is a useful addition to a Linux user’s toolkit for managing configuration files, tracking data modifications, and automating comparison tasks in scripts and workflows. Understanding its inner workings not only helps you use it effectively but also illustrates bash scripting techniques for file manipulation and data comparison.