
String Compare in C: A Detailed Guide to strcmp()

String comparison is a fundamental operation in programming, especially in C, where strings are null-terminated arrays of characters. The C standard library provides strcmp() to perform lexicographical comparison between two strings. This article offers a guide to strcmp() in C, detailing its syntax, working mechanism, and practical examples.

Understanding strcmp() in C

The strcmp() function is a cornerstone of string manipulation in C. It’s declared in the <string.h> header file and compares two strings, character by character, to determine their lexicographical order, which is essentially dictionary order applied to character codes. strcmp() is useful for tasks like sorting strings, looking up keys, and validating user input.

Syntax of strcmp()

The syntax for the strcmp() function in C is straightforward:

#include <string.h>

int strcmp(const char *str1, const char *str2);

Parameters:

  • str1: A pointer to the first string to be compared. It’s treated as a C-style string, which is a null-terminated array of characters.
  • str2: A pointer to the second string to be compared, also a null-terminated array of characters.

Return Value:

strcmp() returns an integer value after comparing str1 and str2. This return value indicates the relationship between the two strings:

  • 0 (Zero): Indicates that str1 and str2 are identical. This means they have the same sequence of characters.
  • Greater than 0 (Positive): Indicates that str1 is lexicographically greater than str2. This occurs when the first differing character in str1 has a greater value than the corresponding character in str2 (characters are compared as unsigned char, which matches the ASCII code on ASCII-based systems), or when str2 is a prefix of the longer str1.
  • Less than 0 (Negative): Indicates that str1 is lexicographically less than str2. This happens when the first differing character in str1 has a smaller value than the corresponding character in str2, or when str1 is a prefix of the longer str2.

How strcmp() Function Works

The strcmp() function operates by comparing the two input strings lexicographically. Let’s break down the step-by-step process:

  1. Character-by-Character Comparison: strcmp() starts by comparing the first character of str1 with the first character of str2. It then proceeds to compare the second characters, and so on.

  2. Character Value Comparison: The comparison is based on the numeric values of the characters, interpreted as unsigned char. On ASCII-based systems, for instance, ‘A’ (65) is considered less than ‘a’ (97), and ‘9’ (57) is greater than ‘0’ (48).

  3. Iteration until Difference or Null Terminator: strcmp() continues the comparison until one of the following conditions is met:

    • Characters Differ: If a pair of characters at the same position in str1 and str2 are different, strcmp() stops. It then determines which string is lexicographically greater based on the ASCII values of these differing characters.
    • Null Terminator Encountered: If strcmp() reaches the null terminator (‘\0’) in both strings at the same position and all preceding characters have been equal, the strings are identical, and it returns 0.
    • Null Terminator in One String First: If one string ends while the other still has characters, and all characters up to that point are equal, the shorter string is considered lexicographically smaller, because the null terminator (value 0) compares less than any other character.
  4. Determining the Return Value: Based on the comparison, strcmp() returns:

    • 0 if strings are identical.
    • A positive value if str1 is lexicographically greater. The exact positive value is implementation-dependent and should not be relied upon; simply check whether it’s > 0.
    • A negative value if str1 is lexicographically smaller. Similarly, the exact negative value is implementation-dependent; just check whether it’s < 0.

Practical Examples of strcmp() in C

Let’s explore various examples to illustrate how strcmp() works in practice.

Example 1: Comparing Identical Strings

This example demonstrates the case where strcmp() returns 0, indicating that the two strings are the same.

#include <stdio.h>
#include <string.h>

int main() {
    char str1[] = "Hello";
    char str2[] = "Hello";
    int result = strcmp(str1, str2);

    if (result == 0) {
        printf("Strings are identical\n");
    } else {
        printf("Strings are not identical\n");
    }
    return 0;
}

Output:

Strings are identical

Explanation: str1 and str2 both contain the string “Hello”. strcmp() compares them character by character and finds no differences. Therefore, it returns 0, and the program prints “Strings are identical”.

Example 2: Finding Lexicographically Greater String

Here, we compare two strings where the first string is lexicographically greater than the second.

#include <stdio.h>
#include <string.h>

int main() {
    char string1[] = "zebra";
    char string2[] = "apple";
    int result = strcmp(string1, string2);

    if (result > 0) {
        printf("\"%s\" is lexicographically greater than \"%s\"\n", string1, string2);
    } else if (result < 0) {
        printf("\"%s\" is lexicographically less than \"%s\"\n", string1, string2);
    } else {
        printf("Strings are equal\n");
    }
    return 0;
}

Output:

"zebra" is lexicographically greater than "apple"

Explanation: When strcmp() compares “zebra” and “apple”, it starts comparing from the first character. ‘z’ has a higher ASCII value than ‘a’. Therefore, strcmp() determines that “zebra” is lexicographically greater and returns a positive value.

Example 3: Finding Lexicographically Smaller String

In this example, the first string is lexicographically smaller than the second.

#include <stdio.h>
#include <string.h>

int main() {
    char s1[] = "Bug";
    char s2[] = "Cat";
    int res = strcmp(s1, s2);

    if (res > 0) {
        printf("\"%s\" is lexicographically greater than \"%s\"\n", s1, s2);
    } else if (res < 0) {
        printf("\"%s\" is lexicographically less than \"%s\"\n", s1, s2);
    } else {
        printf("Strings are equal\n");
    }
    return 0;
}

Output:

"Bug" is lexicographically less than "Cat"

Explanation: strcmp() compares “Bug” and “Cat”. ‘B’ has a lower ASCII value than ‘C’. Hence, strcmp() concludes that “Bug” is lexicographically smaller and returns a negative value.

Example 4: Using strcmp() to Sort an Array of Strings

strcmp() is often used as a comparison function in sorting algorithms, particularly when sorting arrays of strings. Here’s how you can use it with qsort() to sort an array of strings:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int compareStrings(const void *a, const void *b) {
    return strcmp(*(const char **)a, *(const char **)b);
}

int main() {
    const char *names[] = {"Charlie", "Alice", "Bob", "David"};
    int n = sizeof(names) / sizeof(names[0]);

    qsort((void *)names, n, sizeof(names[0]), compareStrings);

    printf("Sorted array of strings:\n");
    for (int i = 0; i < n; i++) {
        printf("%s\n", names[i]);
    }
    return 0;
}

Output:

Sorted array of strings:
Alice
Bob
Charlie
David

Explanation:

  • We define a comparison function compareStrings that uses strcmp() to compare two strings. This function is compatible with the qsort() function, which requires a comparator in this specific format.
  • qsort() then uses compareStrings to sort the names array lexicographically.

Key Considerations for strcmp()

  • Case Sensitivity: strcmp() is case-sensitive. “Apple” and “apple” are considered different. If you need case-insensitive comparison, consider using strcasecmp() (specified by POSIX and declared in <strings.h>, but not part of standard C) or converting strings to the same case before comparison.
  • Locale-Specific Comparisons: strcmp() compares raw character values (ASCII codes on most systems), which may not align with locale-specific sorting rules. For locale-aware string comparisons, use strcoll(), which follows the collation order of the current locale.
  • Binary vs. Text Comparison: strcmp() is designed for null-terminated text strings and stops at the first null terminator. For comparing raw memory blocks, which may lack a null terminator or contain embedded zero bytes, memcmp() with an explicit length is more appropriate.

Conclusion

The strcmp() function is an essential tool for string manipulation in C. Understanding its functionality, return values, and how it performs lexicographical comparisons is crucial for any C programmer. From basic string equality checks to more complex tasks like sorting, strcmp() provides a fundamental building block for text processing in C applications. By mastering strcmp(), you can write more efficient and robust C programs that effectively handle string data.
