In the Linux environment, comparing files is a common and essential task. Whether you are a developer tracking changes in code, a system administrator managing configurations, or simply a user organizing data, knowing how to compare files effectively is crucial. Linux offers a rich set of command-line tools for this purpose, ranging from simple difference checkers to custom scripts that analyze specific file formats. This article is a guide to comparing two files in Linux, covering several tools and techniques for different needs.
Basic File Comparison with diff
The most fundamental tool for comparing two files in Linux is the diff command. diff (short for "difference") is a command-line utility that compares files line by line and prints the lines that differ between them. It is versatile and underlies many other file comparison workflows, such as creating and applying patches.
Basic Usage of diff:
To compare two text files, file1.txt and file2.txt, use:
diff file1.txt file2.txt
The output of diff shows which lines differ between the two files and how to turn the first file into the second. Consider this example:
file1.txt:
User1 US
User2 CA
User3 UK
User4 AU
file2.txt:
User1 US
User2 CA
User3 NG
User5 JP
Running diff file1.txt file2.txt produces:
3,4c3,4
< User3 UK
< User4 AU
---
> User3 NG
> User5 JP
Understanding diff Output:
- 3,4c3,4: Lines 3 through 4 of the first file must be changed into lines 3 through 4 of the second file; ‘c’ means “change”. Because the two changed lines are adjacent, diff reports them as a single hunk.
- < User3 UK and < User4 AU: The lines from file1.txt that are being replaced. The < symbol marks content from the first file.
- ---: Separator between the first file's lines and their replacements.
- > User3 NG and > User5 JP: The replacement lines from file2.txt. The > symbol marks content from the second file.
Two other hunk types appear when lines exist in only one file. L and R below can be single line numbers or ranges such as 3,4:
- LdR (‘d’ means “delete”), for example 4d3: Delete line(s) L of the first file; R is the line of the second file after which they would have appeared. Only < lines follow.
- LaR (‘a’ means “add”), for example 4a5: After line L of the first file, add line(s) R of the second file. Only > lines follow.
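The example can be reproduced with a short script. This is a minimal sketch that assumes bash and GNU diffutils; it recreates the two sample files in a throwaway directory made with mktemp -d (the scratch directory is an assumption of the demo) and also prints diff's exit status, which is 0 when the files match, 1 when they differ, and 2 on trouble. GNU diff reports the two adjacent changed lines as one 3,4c3,4 hunk.

```bash
#!/usr/bin/env bash
# Scratch directory so the demo never touches real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two sample files from the article.
printf 'User1 US\nUser2 CA\nUser3 UK\nUser4 AU\n' > file1.txt
printf 'User1 US\nUser2 CA\nUser3 NG\nUser5 JP\n' > file2.txt

# Normal-format diff; the exit status is 1 because the files differ.
diff file1.txt file2.txt
echo "diff exit status: $?"
```

Scripts usually rely on that exit status rather than parsing the hunk text.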
Useful diff Options:
- -s or --report-identical-files: Report when two files are identical.
- -y or --side-by-side: Display differences in a side-by-side format, improving readability for some users.
- -u, -U NUM, or --unified[=NUM]: Output in unified diff format. This format is commonly used for patches and is easier to read than the default format. NUM specifies the number of context lines to show (default is 3).
- -q or --brief: Report only whether files differ, not the details of the differences. This is useful for quick checks.
- -i or --ignore-case: Ignore case differences in lines.
- -b or --ignore-space-change: Ignore changes in the amount of whitespace.
- -w or --ignore-all-space: Ignore all whitespace when comparing lines.
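A sketch of several of these options on the same sample data, assuming bash and GNU diff. The extra file file1_variant.txt is a hypothetical copy of file1.txt that differs only in letter case and spacing; it exists just to show -i and -w. The unified output is piped through tail to drop the two header lines, which contain timestamps.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'User1 US\nUser2 CA\nUser3 UK\nUser4 AU\n' > file1.txt
printf 'User1 US\nUser2 CA\nUser3 NG\nUser5 JP\n' > file2.txt
# Hypothetical variant: same data, different case and spacing on line 1.
printf 'user1   us\nUser2 CA\nUser3 UK\nUser4 AU\n' > file1_variant.txt

# -q: one-line verdict only.
diff -q file1.txt file2.txt      # prints: Files file1.txt and file2.txt differ

# -s: also speak up when the files match.
diff -s file1.txt file1.txt      # prints: Files file1.txt and file1.txt are identical

# -i plus -w: case and whitespace differences are ignored, so diff
# prints nothing and exits with status 0.
diff -i -w file1.txt file1_variant.txt && echo "equivalent ignoring case and spacing"

# -u: unified format; tail skips the ---/+++ header lines.
diff -u file1.txt file2.txt | tail -n +3
```

In the unified hunk, lines starting with - come from the first file, lines starting with + from the second, and lines starting with a space are unchanged context.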
Comparing File Contents and Reporting Changes with colcmp.sh
The colcmp.sh script offers a more specialized approach to comparing two files. It is designed for files containing name/value pairs, one name value pair per line (for example, User3 UK). This makes it particularly useful when you need to track changes to specific entries in configuration files or data lists.
Understanding the colcmp.sh Script:
The script works as follows:
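- Initial Comparison: It first uses cmp -s to check quickly whether the two input files are identical. cmp -s prints nothing and reports the result only through its exit status (0 for identical, 1 for different, 2 for trouble such as a missing file), which the case statement then tests. If the files are identical, the script empties Output_File and reports “files are identical”; none of the other branches run.
  cmp -s "$1" "$2"
  case "$?" in
    0)
      : > Output_File   # start with an empty report file
      echo "files are identical"
      ;;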
- Processing Files into Associative Arrays: If the files differ (exit status 1), the script loads each file into a bash associative array through a series of sed commands (\s and \S in the expressions are GNU sed extensions):
  - Copying to Temporary Files: The input files are copied to temporary files in the user’s home directory (~/.colcmp.array1.tmp.sh and ~/.colcmp.array2.tmp.sh).
  - Escaping Special Characters: sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' puts a backslash in front of every character that is not a letter, digit, or space. Because the values are later wrapped in double quotes and sourced, this keeps characters such as $, `, and " from being expanded or executed. A side effect is that other punctuation, such as . or -, keeps its backslash in the stored value and therefore in the report.
  - Commenting Out Lines: sed -i -E 's/^(.*)$/#\1/' adds a # to the start of every line, so any line that the next step does not turn into an assignment stays a comment and is ignored when the file is sourced.
  - Converting to Array Assignments: sed -i -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/A1[\1]="\2"/' is the core transformation. It rewrites lines of the form #name value into bash associative array assignments such as A1[name]="value". All three expressions are single-quoted so that bash passes the backslashes and the inner double quotes to sed unchanged.
  - Making Executable (Unnecessary): chmod 755 ~/.colcmp.array1.tmp.sh makes the temporary files executable. source only needs read permission, so this step is harmless but not required.
  - Declaring and Sourcing Arrays: declare -A A1 declares A1 as an associative array, and source ~/.colcmp.array1.tmp.sh runs the temporary file in the current shell, populating A1 with the name/value pairs from the first input file. The same process is repeated for the second file and array A2.
    1)
      : > Output_File   # start with an empty report file
      # Build A1 from the first file.
      cp "$1" ~/.colcmp.array1.tmp.sh
      sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' ~/.colcmp.array1.tmp.sh
      sed -i -E 's/^(.*)$/#\1/' ~/.colcmp.array1.tmp.sh
      sed -i -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/A1[\1]="\2"/' ~/.colcmp.array1.tmp.sh
      chmod 755 ~/.colcmp.array1.tmp.sh
      declare -A A1
      source ~/.colcmp.array1.tmp.sh
      # Build A2 from the second file the same way.
      cp "$2" ~/.colcmp.array2.tmp.sh
      sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' ~/.colcmp.array2.tmp.sh
      sed -i -E 's/^(.*)$/#\1/' ~/.colcmp.array2.tmp.sh
      sed -i -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/A2[\1]="\2"/' ~/.colcmp.array2.tmp.sh
      chmod 755 ~/.colcmp.array2.tmp.sh
      declare -A A2
      source ~/.colcmp.array2.tmp.sh
      ...
- Comparing Arrays and Reporting Changes: The script then iterates through the keys of each array to identify changes, printing a message for each one and writing a line to Output_File:
  - Detecting Removed Entries: It loops through the keys of A1 (first file). If a key is not found in A2 (second file), the entry was removed. The test "${A2[$i]+x}" expands to x only when the key exists, so an empty result means the key is missing.
  - Detecting Added or Changed Entries: It loops through the keys of A2. If a key is not in A1, it is a new entry. If the key exists in both but the values differ, it reports a change. If both key and value are identical, it adds the name to a comma-separated list of unchanged entries, which is printed at the end.
      USERSWHODIDNOTCHANGE=
      # Keys present in the first file but missing from the second.
      for i in "${!A1[@]}"; do
        if [ "${A2[$i]+x}" = "" ]; then
          echo "$i was removed"
          echo "$i has changed" >> Output_File   # append, so earlier lines survive
        fi
      done
      # Keys that are new, changed, or unchanged in the second file.
      for i in "${!A2[@]}"; do
        if [ "${A1[$i]+x}" = "" ]; then
          echo "$i was added as '${A2[$i]}'"
          echo "$i has changed" >> Output_File
        elif [ "${A1[$i]}" != "${A2[$i]}" ]; then
          echo "$i changed from '${A1[$i]}' to '${A2[$i]}'"
          echo "$i has changed" >> Output_File
        else
          # Prepend the name to the comma-separated list of unchanged entries.
          if [ -n "$USERSWHODIDNOTCHANGE" ]; then
            USERSWHODIDNOTCHANGE=",$USERSWHODIDNOTCHANGE"
          fi
          USERSWHODIDNOTCHANGE="$i$USERSWHODIDNOTCHANGE"
        fi
      done
      if [ -n "$USERSWHODIDNOTCHANGE" ]; then
        echo "no change: $USERSWHODIDNOTCHANGE"
      fi
      ;;
- Error Handling: Any other cmp exit status (2 means a file was missing or could not be read) falls through to the catch-all branch, which prints an error and a usage line.
    *)
      echo "error: file not found, access denied, etc..."
      echo "usage: ./colcmp.sh File_1.txt File_2.txt"
      ;;
  esac
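To see what the sed passes produce, this sketch runs them on a copy of file1.txt and then sources the result. It assumes bash 4 or later and GNU sed (for -i, \s, and \S), and writes to array1.tmp.sh in a scratch directory instead of the script's ~/.colcmp.array1.tmp.sh. The expressions are single-quoted so the backslashes and inner double quotes reach sed unchanged.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'User1 US\nUser2 CA\nUser3 UK\nUser4 AU\n' > file1.txt

# Same three passes as colcmp.sh, applied to a scratch copy.
cp file1.txt array1.tmp.sh
sed -i -E 's/([^A-Za-z0-9 ])/\\\1/g' array1.tmp.sh                   # escape punctuation
sed -i -E 's/^(.*)$/#\1/' array1.tmp.sh                              # comment out every line
sed -i -E 's/^#\s*(\S+)\s+(\S.*?)\s*$/A1[\1]="\2"/' array1.tmp.sh    # "#name value" -> A1[name]="value"

# Show the generated assignments, one per input line.
cat array1.tmp.sh

# Load them into an associative array and read one value back.
declare -A A1
source ./array1.tmp.sh
echo "User3 -> ${A1[User3]}"
```

The sample data contains no punctuation, so the escaping pass leaves it untouched and each line becomes a plain assignment such as A1[User3]="UK".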
Usage Example of colcmp.sh:
Using the same file1.txt and file2.txt examples:
./colcmp.sh file1.txt file2.txt
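Output (the removal is reported first because the script checks the first file’s keys before the second file’s; bash does not guarantee any iteration order for associative arrays, so the remaining lines and the names in the no-change list may appear in a different order):
User4 was removed
User3 changed from 'UK' to 'NG'
User5 was added as 'JP'
no change: User1,User2
Contents of Output_File:
User4 has changed
User3 has changed
User5 has changed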
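Generating and sourcing a temporary script is one way to load name/value pairs. A different technique, sketched below as a hypothetical alternative rather than the colcmp.sh method, reads each line with read -r name value straight into associative arrays, so file content is never executed. It assumes bash 4.3 or later (for local -n namerefs); the helper names load_pairs and compare_pairs are illustrative, and the sketch omits colcmp.sh's Output_File and no-change list.

```bash
#!/usr/bin/env bash
# load_pairs ARRAY_NAME FILE: hypothetical helper that fills the named
# associative array with "name value" pairs read from FILE.
load_pairs() {
  local -n pairs=$1
  local name value
  # "|| [[ -n $name ]]" keeps a final line that has no trailing newline.
  while read -r name value || [[ -n $name ]]; do
    [[ -z $name || $name == '#'* ]] && continue   # skip blank and comment lines
    pairs[$name]=$value
  done < "$2"
}

# compare_pairs OLD NEW: hypothetical helper printing removed, added,
# and changed names, in the same wording colcmp.sh uses.
compare_pairs() {
  local -A old=() new=()
  load_pairs old "$1"
  load_pairs new "$2"
  local key
  for key in "${!old[@]}"; do
    [[ -n ${new[$key]+x} ]] || echo "$key was removed"
  done
  for key in "${!new[@]}"; do
    if [[ -z ${old[$key]+x} ]]; then
      echo "$key was added as '${new[$key]}'"
    elif [[ ${old[$key]} != "${new[$key]}" ]]; then
      echo "$key changed from '${old[$key]}' to '${new[$key]}'"
    fi
  done
}

# Usage with the article's sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'User1 US\nUser2 CA\nUser3 UK\nUser4 AU\n' > file1.txt
printf 'User1 US\nUser2 CA\nUser3 NG\nUser5 JP\n' > file2.txt
compare_pairs file1.txt file2.txt
```

Because nothing is sourced, values are stored exactly as read (read -r keeps backslashes). As with colcmp.sh, the order of lines within each loop follows bash's unspecified associative-array order.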
Alternatives to colcmp.sh and diff
While diff and colcmp.sh are useful, Linux provides other tools that can be more appropriate depending on the specific file comparison task:
- comm: The comm command compares two sorted files line by line. By default it prints three tab-separated columns: lines unique to the first file, lines unique to the second file, and lines common to both. The options -1, -2, and -3 suppress the corresponding column.
  comm file1.txt file2.txt
  comm assumes the input files are sorted. If they are not, sort them first with sort.
- vimdiff or gvimdiff: These open the files side by side in the Vim text editor (vimdiff in the terminal, gvimdiff in the graphical gVim) and highlight the differences while keeping syntax highlighting. They are convenient for visually inspecting differences, especially in code files.
  vimdiff file1.txt file2.txt
- meld: meld is a graphical diff and merge tool. It supports two- and three-way comparisons and is well suited to merging changes between files, for example when resolving merge conflicts in version control systems.
  meld file1.txt file2.txt
- awk or perl for custom comparisons: For more complex comparisons, or when you need to compare files based on specific fields or criteria, scripting languages like awk or perl offer great flexibility. You can write scripts to parse files, extract relevant data, and apply comparison logic tailored to your needs. colcmp.sh itself is an example of a custom comparison script using bash and sed.
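For the comm and awk approaches, here is a minimal sketch on the article's sample files. It assumes bash, whose <(...) process substitution feeds sort's output to comm without temporary files, and any POSIX awk. The awk program is an illustrative field-based comparison: while reading the first file (NR == FNR) it records each name's value, then prints names in the second file whose value differs.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'User1 US\nUser2 CA\nUser3 UK\nUser4 AU\n' > file1.txt
printf 'User1 US\nUser2 CA\nUser3 NG\nUser5 JP\n' > file2.txt

# comm needs sorted input; sort both files on the fly.
echo "common to both:"
comm -12 <(sort file1.txt) <(sort file2.txt)     # suppress columns 1 and 2
echo "only in file1.txt:"
comm -23 <(sort file1.txt) <(sort file2.txt)     # suppress columns 2 and 3

# awk: remember file1's value per name, then report names whose
# second field changed in file2.
echo "changed values:"
awk 'NR == FNR { v[$1] = $2; next }
     ($1 in v) && v[$1] != $2 { print $1 ": " v[$1] " -> " $2 }' file1.txt file2.txt
```

Unlike comm, the awk version matches lines by the name field rather than by the whole line, so it can say that User3 changed instead of listing two unrelated lines.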
Choosing the Right Tool
The best tool for comparing two files in Linux depends on your specific requirements:
- Simple Line-by-Line Text Differences: diff is the standard and most versatile choice.
- Quickly Check if Files are Identical: cmp -s or diff -q.
- Comparing Name/Value Pairs and Reporting Changes: colcmp.sh is specifically designed for this task.
- Comparing Sorted Files and Finding Common/Unique Lines: comm.
- Visual, Side-by-Side Comparison: vimdiff or meld.
- Complex or Field-Based Comparisons: awk, perl, or custom bash scripts.
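For the quick identity check, scripts usually test the exit status rather than parse output. This sketch wraps cmp -s in a hypothetical helper, same_file_content, using the same 0 / 1 / anything-else convention as the case statement in colcmp.sh; it assumes bash, and the file paths are throwaway examples.

```bash
#!/usr/bin/env bash
# same_file_content A B: hypothetical helper that classifies two files by
# cmp's exit status (0 = identical, 1 = different, anything else = trouble).
same_file_content() {
  cmp -s "$1" "$2" 2>/dev/null
  case $? in
    0) echo "identical" ;;
    1) echo "different" ;;
    *) echo "error" ;;
  esac
}

# Throwaway files for the demo.
workdir=$(mktemp -d)
printf 'a\n' > "$workdir/x"
printf 'a\n' > "$workdir/y"
printf 'b\n' > "$workdir/z"

same_file_content "$workdir/x" "$workdir/y"        # identical
same_file_content "$workdir/x" "$workdir/z"        # different
same_file_content "$workdir/x" "$workdir/missing"  # error (no such file)
```

cmp stops at the first differing byte, which makes it a cheap first check before running a full diff.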
By understanding the strengths of each of these tools, you can efficiently and effectively compare files in Linux for any task at hand. Whether you are debugging code, managing configurations, or analyzing data, Linux provides a robust toolkit for all your file comparison needs.