Comparing strings is a fundamental operation in programming, and C# offers a rich set of tools to perform these comparisons effectively. Whether you’re checking for string equality or determining sort order, understanding the nuances of string comparison in C# is crucial for writing robust and reliable applications. This guide delves into the intricacies of comparing strings in C#, providing you with the knowledge to choose the right approach for any situation.
When you compare strings in C#, you’re essentially addressing two primary questions:
- Equality Check: Are these two strings the same?
- Sorting Order: How should these strings be ordered when sorting?
However, these seemingly simple questions become complex due to several factors that influence string comparisons:
- Ordinal vs. Linguistic Comparison: Should the comparison be based on the binary values of characters (ordinal) or language-specific rules (linguistic)?
- Case Sensitivity: Should the comparison differentiate between uppercase and lowercase letters?
- Culture-Specific Rules: Should cultural conventions affect the comparison, especially for linguistic comparisons?
- Platform Dependency: How might different operating systems or platforms affect linguistic comparisons?
To manage these complexities, C# provides the System.StringComparison enumeration. This enumeration offers a range of options to specify the type of string comparison you want to perform:
- CurrentCulture: Uses culture-sensitive sort rules based on the current culture settings of the system.
- CurrentCultureIgnoreCase: Similar to CurrentCulture, but ignores case differences during comparison.
- InvariantCulture: Employs culture-sensitive sort rules based on the invariant culture, which is culture-agnostic.
- InvariantCultureIgnoreCase: Combines InvariantCulture with case-insensitivity.
- Ordinal: Performs a fast, ordinal (binary) comparison, focusing on the numeric value of each character.
- OrdinalIgnoreCase: Conducts an ordinal comparison while ignoring case.
Choosing the correct StringComparison type is essential for achieving the desired comparison behavior and ensuring your application works predictably across different environments.
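As a quick illustration of how the options differ, the following sketch runs one equality check under every member of the enumeration. The two sample strings are arbitrary choices for this example.

```csharp
using System;

string left = "hello";
string right = "HELLO";

// Run the same equality check under every StringComparison option.
foreach (StringComparison mode in (StringComparison[])Enum.GetValues(typeof(StringComparison)))
{
    bool equal = string.Equals(left, right, mode);
    Console.WriteLine($"{mode}: {(equal ? "equal" : "not equal")}");
}
```

For this pair, only the three IgnoreCase options report the strings as equal.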
Understanding Default Ordinal Comparisons in C#
In C#, many common string operations, such as string.Equals(string) and the == operator, default to ordinal comparisons. Let’s examine what this means in practice:
string root = @"C:\users";
string root2 = @"C:\Users";
bool result = root.Equals(root2);
Console.WriteLine($"Ordinal comparison (default Equals): <{root}> and <{root2}> are {(result ? "equal." : "not equal.")}");
result = root.Equals(root2, StringComparison.Ordinal);
Console.WriteLine($"Ordinal comparison (explicit Ordinal): <{root}> and <{root2}> are {(result ? "equal." : "not equal.")}");
Console.WriteLine($"Using == operator: <{root}> and <{root2}> are {(root == root2 ? "equal" : "not equal")}");
This code snippet demonstrates that default ordinal comparisons are case-sensitive. They directly compare the binary representation of each char in the strings. Therefore, the two paths are considered different because the lowercase ‘u’ and the uppercase ‘U’ have different character codes.
It’s important to note a subtle distinction: while equality checks (Equals, ==, !=) perform ordinal comparisons by default, methods like String.CompareTo and String.Compare(string, string) use culture-aware linguistic comparisons based on the current culture by default. To avoid ambiguity and ensure clarity in your code, it’s best practice to explicitly specify the StringComparison type in your string comparison operations.
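The difference between the two defaults can be made visible by passing the comparison type explicitly. This is a minimal sketch with arbitrary sample strings. The culture-aware result depends on the current culture, so only the ordinal result is predictable.

```csharp
using System;

string lower = "apple";
string upper = "Banana";

// Culture-aware: the result depends on the current culture's sort rules.
int linguistic = string.Compare(lower, upper, StringComparison.CurrentCulture);

// Ordinal: compares UTF-16 code units, so 'a' (0x61) sorts after 'B' (0x42).
int ordinal = string.Compare(lower, upper, StringComparison.Ordinal);

Console.WriteLine($"CurrentCulture sign: {Math.Sign(linguistic)}");
Console.WriteLine($"Ordinal sign:        {Math.Sign(ordinal)}");
```

Most cultures place "apple" before "Banana", while the ordinal comparison places it after, which is why stating the intended comparison type avoids surprises.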
Case-Insensitive Ordinal String Comparisons
For scenarios where case should be ignored during string comparison in C#, StringComparison.OrdinalIgnoreCase is the ideal choice. C# provides methods like String.Equals(string, StringComparison) and String.Compare(string, string, StringComparison) to facilitate case-insensitive ordinal comparisons:
string root = @"C:\users";
string root2 = @"C:\Users";
// Instance Equals, static Equals, and Compare all accept a StringComparison.
bool result = root.Equals(root2, StringComparison.OrdinalIgnoreCase);
bool areEqual = String.Equals(root, root2, StringComparison.OrdinalIgnoreCase);
int comparison = String.Compare(root, root2, comparisonType: StringComparison.OrdinalIgnoreCase);
Console.WriteLine($"Ordinal ignore case (Equals): <{root}> and <{root2}> are {(result ? "equal." : "not equal.")}");
Console.WriteLine($"Ordinal ignore case (static Equals): <{root}> and <{root2}> are {(areEqual ? "equal." : "not equal.")}");
if (comparison < 0)
    Console.WriteLine($"<{root}> is less than <{root2}>");
else if (comparison > 0)
    Console.WriteLine($"<{root}> is greater than <{root2}>");
else
    Console.WriteLine($"<{root}> and <{root2}> are equivalent in order");
These methods utilize the casing rules of the invariant culture to perform the case-insensitive comparison. The invariant culture provides a consistent, culture-neutral basis for case conversion, ensuring predictable behavior regardless of the user’s locale.
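The same comparison type is accepted by search-style methods such as EndsWith and IndexOf. The sketch below assumes a hypothetical helper, HasExtension, and made-up file names and header text, purely for illustration.

```csharp
using System;

// Hypothetical helper: checks a file extension without regard to case.
bool HasExtension(string fileName, string extension) =>
    fileName.EndsWith(extension, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(HasExtension("REPORT.TXT", ".txt")); // True
Console.WriteLine(HasExtension("report.txt", ".csv")); // False

// IndexOf also accepts a StringComparison for case-insensitive searching.
int position = "Content-Type: text/plain".IndexOf("content-type", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(position); // 0
```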
Linguistic String Comparisons: Considering Language and Culture
Linguistic comparisons, in contrast to ordinal comparisons, take into account language-specific sorting rules and cultural conventions. Many string comparison methods in C#, such as String.StartsWith, default to linguistic comparisons based on the current culture. This approach, often referred to as “word sort order,” can lead to more human-friendly sorting and comparison results.
In linguistic comparisons, certain Unicode characters might be assigned special weights. For example, a hyphen (“-”) might have a small weight, causing “co-op” and “coop” to be placed close together in a sorted list. Some non-printing control characters might be ignored entirely. Additionally, some Unicode characters can be equivalent to sequences of char instances.
Consider the German word “Straße.” In German, “ß” (Eszett) is linguistically equivalent to “ss.” The following example illustrates this:
string first = "Sie tanzen auf der Straße.";
string second = "Sie tanzen auf der Strasse.";
Console.WriteLine($"First sentence is <{first}>");
Console.WriteLine($"Second sentence is <{second}>");
bool equal = String.Equals(first, second, StringComparison.InvariantCulture);
Console.WriteLine($"The two strings {(equal ? "are" : "are not")} equal (InvariantCulture).");
showComparison(first, second);
string word = "coop";
string words = "co-op";
string other = "cop";
showComparison(word, words);
showComparison(word, other);
showComparison(words, other);
// Prints how two strings order under invariant-culture and ordinal rules.
void showComparison(string one, string two)
{
    int compareLinguistic = String.Compare(one, two, StringComparison.InvariantCulture);
    int compareOrdinal = String.Compare(one, two, StringComparison.Ordinal);
    if (compareLinguistic < 0)
        Console.WriteLine($"<{one}> is less than <{two}> using invariant culture");
    else if (compareLinguistic > 0)
        Console.WriteLine($"<{one}> is greater than <{two}> using invariant culture");
    else
        Console.WriteLine($"<{one}> and <{two}> are equivalent in order using invariant culture");
    if (compareOrdinal < 0)
        Console.WriteLine($"<{one}> is less than <{two}> using ordinal comparison");
    else if (compareOrdinal > 0)
        Console.WriteLine($"<{one}> is greater than <{two}> using ordinal comparison");
    else
        Console.WriteLine($"<{one}> and <{two}> are equivalent in order using ordinal comparison");
}
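As demonstrated, ordinal comparisons always consider “Straße” and “Strasse” different. Whether the InvariantCulture comparison treats them as equivalent depends on the underlying globalization library: on Windows before .NET 5, which relied on NLS, the two sentences compare as equal, while .NET 5 and later use ICU by default and may report them as different. The behavior of linguistic comparisons can therefore vary across platforms and .NET versions.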
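To see which behavior a given environment produces, a small diagnostic like the following can help. It is only a sketch. It prints the runtime and operating system next to the two comparison results, and only the ordinal result is the same everywhere.

```csharp
using System;
using System.Runtime.InteropServices;

string eszett = "Straße";
string doubleS = "Strasse";

// Report the environment, since linguistic results depend on it.
Console.WriteLine($"Runtime: {RuntimeInformation.FrameworkDescription}");
Console.WriteLine($"OS:      {RuntimeInformation.OSDescription}");

// Linguistic result: varies with the globalization library in use.
int linguistic = string.Compare(eszett, doubleS, StringComparison.InvariantCulture);

// Ordinal result: 'ß' (0xDF) is greater than 's' (0x73) on every platform.
int ordinal = string.Compare(eszett, doubleS, StringComparison.Ordinal);

Console.WriteLine($"InvariantCulture sign: {Math.Sign(linguistic)}");
Console.WriteLine($"Ordinal sign:          {Math.Sign(ordinal)}");
```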
Culture-Specific String Comparisons in C#
For applications that need to handle text in specific languages or regions, C# allows you to perform string comparisons using specific cultures. The CultureInfo class in System.Globalization represents cultural information. By providing a CultureInfo object to string comparison methods, you can tailor the comparison rules to a particular culture.
string first = "Sie tanzen auf der Straße.";
string second = "Sie tanzen auf der Strasse.";
Console.WriteLine($"First sentence is <{first}>");
Console.WriteLine($"Second sentence is <{second}>");
var en = new System.Globalization.CultureInfo("en-US");
int i = String.Compare(first, second, en, System.Globalization.CompareOptions.None);
Console.WriteLine($"Comparing in {en.Name} returns {i}.");
var de = new System.Globalization.CultureInfo("de-DE");
i = String.Compare(first, second, de, System.Globalization.CompareOptions.None);
Console.WriteLine($"Comparing in {de.Name} returns {i}.");
bool b = String.Equals(first, second, StringComparison.CurrentCulture);
Console.WriteLine($"The two strings {(b ? "are" : "are not")} equal (CurrentCulture).");
string word = "coop";
string words = "co-op";
string other = "cop";
showComparison(word, words, en);
showComparison(word, other, en);
showComparison(words, other, en);
void showComparison(string one, string two, System.Globalization.CultureInfo culture)
{
    // Compare with the culture passed in, not the captured en variable.
    int compareLinguistic = String.Compare(one, two, culture, System.Globalization.CompareOptions.None);
    int compareOrdinal = String.Compare(one, two, StringComparison.Ordinal);
    if (compareLinguistic < 0)
        Console.WriteLine($"<{one}> is less than <{two}> using {culture.Name} culture");
    else if (compareLinguistic > 0)
        Console.WriteLine($"<{one}> is greater than <{two}> using {culture.Name} culture");
    else
        Console.WriteLine($"<{one}> and <{two}> are equivalent in order using {culture.Name} culture");
    if (compareOrdinal < 0)
        Console.WriteLine($"<{one}> is less than <{two}> using ordinal comparison");
    else if (compareOrdinal > 0)
        Console.WriteLine($"<{one}> is greater than <{two}> using ordinal comparison");
    else
        Console.WriteLine($"<{one}> and <{two}> are equivalent in order using ordinal comparison");
}
Culture-sensitive comparisons are particularly relevant when dealing with user-generated content, as users from different locales may have varying expectations for sorting and comparison.
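One way to apply a chosen culture to sorting is StringComparer.Create, which builds a comparer from a CultureInfo and a case-sensitivity flag. The helper name SortForCulture and the sample words below are assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: returns a copy of the items sorted by the given culture's rules.
List<string> SortForCulture(IEnumerable<string> items, CultureInfo culture, bool ignoreCase)
{
    var sorted = new List<string>(items);
    sorted.Sort(StringComparer.Create(culture, ignoreCase));
    return sorted;
}

string[] words = { "cherry", "apple", "banana" };

// Sort using whatever culture the current thread is running under.
foreach (string word in SortForCulture(words, CultureInfo.CurrentCulture, ignoreCase: true))
{
    Console.WriteLine(word);
}
```

Passing the culture as a parameter keeps the choice visible at the call site, so the same helper can serve users in different locales.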
Linguistic Sorting and Searching of String Arrays
C# provides convenient methods for linguistically sorting and searching arrays of strings, taking into account the current culture. The Array class offers static methods that accept a StringComparer object, enabling culture-aware operations.
Here’s an example of sorting a string array using the current culture’s linguistic rules:
string[] lines = new string[] { @"c:\public\textfile.txt", @"c:\public\textFile.TXT", @"c:\public\Text.txt", @"c:\public\testfile2.txt" };
Console.WriteLine("Non-sorted order:");
foreach (string s in lines)
{
    Console.WriteLine($" {s}");
}
Console.WriteLine("\nSorted order (CurrentCulture):");
// Sort using the linguistic rules of the current culture.
Array.Sort(lines, StringComparer.CurrentCulture);
foreach (string s in lines)
{
    Console.WriteLine($" {s}");
}
Once sorted linguistically, you can efficiently search the array using binary search with Array.BinarySearch. Remember to use the same StringComparer for both sorting and searching to ensure consistent results.
string[] lines = new string[] { @"c:\public\textfile.txt", @"c:\public\textFile.TXT", @"c:\public\Text.txt", @"c:\public\testfile2.txt" };
Array.Sort(lines, StringComparer.CurrentCulture);
string searchString = @"c:\public\TEXTFILE.TXT";
Console.WriteLine($"\nBinary search for <{searchString}> (CurrentCulture)");
int result = Array.BinarySearch(lines, searchString, StringComparer.CurrentCulture);
ShowWhere<string>(lines, result);
// A non-negative result is the index of the match; index 0 counts as found.
Console.WriteLine($"{(result >= 0 ? "Found" : "Did not find")} {searchString}");
void ShowWhere<T>(T[] array, int index)
{
    if (index < 0)
    {
        // A negative result is the bitwise complement of the insertion point.
        index = ~index;
        Console.Write("Not found. Sorts between: ");
        if (index == 0)
            Console.Write("beginning of sequence and ");
        else
            Console.Write($"{array[index - 1]} and ");
        if (index == array.Length)
            Console.WriteLine("end of sequence.");
        else
            Console.WriteLine($"{array[index]}.");
    }
    else
    {
        Console.WriteLine($"Found at index {index}.");
    }
}
Ordinal Sorting and Searching in String Collections
For collections like List<string>, C# allows ordinal sorting and searching. The List<string>.Sort method can use a delegate to define the comparison logic. Note that String.CompareTo is culture-sensitive (it uses the current culture), so an ordinal, case-sensitive sort calls String.CompareOrdinal instead.
List<string> lines = new List<string> { @"c:\public\textfile.txt", @"c:\public\textFile.TXT", @"c:\public\Text.txt", @"c:\public\testfile2.txt" };
Console.WriteLine("Non-sorted order:");
foreach (string s in lines)
{
    Console.WriteLine($" {s}");
}
Console.WriteLine("\nSorted order (Ordinal):");
// CompareOrdinal compares character codes; CompareTo would use the current culture.
lines.Sort((left, right) => string.CompareOrdinal(left, right));
foreach (string s in lines)
{
    Console.WriteLine($" {s}");
}
Similarly, List<string>.BinarySearch can be used for efficient searching in ordinally sorted lists, provided it is given StringComparer.Ordinal. Without a comparer argument it falls back to the default string comparer, which is culture-sensitive.
List<string> lines = new List<string> { @"c:\public\textfile.txt", @"c:\public\textFile.TXT", @"c:\public\Text.txt", @"c:\public\testfile2.txt" };
lines.Sort((left, right) => string.CompareOrdinal(left, right));
string searchString = @"c:\public\TEXTFILE.TXT";
Console.WriteLine($"\nBinary search for <{searchString}> (Ordinal)");
// Search with the same ordinal rules that were used for sorting.
int result = lines.BinarySearch(searchString, StringComparer.Ordinal);
ShowWhere<string>(lines, result);
// A non-negative result is the index of the match; index 0 counts as found.
Console.WriteLine($"{(result >= 0 ? "Found" : "Did not find")} {searchString}");
void ShowWhere<T>(IList<T> collection, int index)
{
    if (index < 0)
    {
        // A negative result is the bitwise complement of the insertion point.
        index = ~index;
        Console.Write("Not found. Sorts between: ");
        if (index == 0)
            Console.Write("beginning of sequence and ");
        else
            Console.Write($"{collection[index - 1]} and ");
        if (index == collection.Count)
            Console.WriteLine("end of sequence.");
        else
            Console.WriteLine($"{collection[index]}.");
    }
    else
    {
        Console.WriteLine($"Found at index {index}.");
    }
}
Consistency is key: always use the same comparison type for both sorting and searching to avoid unexpected outcomes.
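The following sketch, using made-up sample names, shows what happens when the comparer used for searching differs from the one used for sorting. Holding the comparer in a single variable is one simple way to keep the two in step.

```csharp
using System;
using System.Collections.Generic;

var names = new List<string> { "delta", "Alpha", "charlie", "Bravo" };

// Keep one comparer and use it for both sorting and searching.
StringComparer comparer = StringComparer.OrdinalIgnoreCase;
names.Sort(comparer);

// Same comparer as the sort: "BRAVO" matches "Bravo".
int matched = names.BinarySearch("BRAVO", comparer);

// Different comparer: no element equals "BRAVO" under ordinal rules.
int mismatched = names.BinarySearch("BRAVO", StringComparer.Ordinal);

Console.WriteLine($"Same comparer:      {matched}");
Console.WriteLine($"Different comparer: {(mismatched < 0 ? "not found" : "found")}");
```

With the matching comparer, the search returns index 1. With the mismatched one, it reports the item as missing even though a case-insensitive match exists.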
Collection classes such as Hashtable and Dictionary<TKey, TValue> have constructors that accept a StringComparer when the keys are strings. List<string> has no such constructor and instead takes the comparer as an argument to Sort and BinarySearch. Whenever possible, pass a comparer and explicitly specify either StringComparer.Ordinal or StringComparer.OrdinalIgnoreCase for optimal performance and predictable behavior, especially when string keys are involved.
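A minimal sketch of this advice, assuming a made-up dictionary of header names, might look like the following. The comparer passed to the constructor decides how keys are matched.

```csharp
using System;
using System.Collections.Generic;

// Keys are matched without regard to case because of the comparer.
var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Content-Type"] = "text/plain"
};

bool found = headers.TryGetValue("content-type", out var contentType);
Console.WriteLine(found ? $"Found: {contentType}" : "Not found");

// Without a comparer, string keys are matched ordinally and case-sensitively.
var caseSensitive = new Dictionary<string, string> { ["Content-Type"] = "text/plain" };
Console.WriteLine(caseSensitive.ContainsKey("content-type")); // False
```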
Conclusion: Choosing the Right String Comparison in C#
Mastering string comparison in C# involves understanding the different types of comparisons available and selecting the most appropriate one for your specific needs. Whether you opt for ordinal or linguistic comparisons, case-sensitive or case-insensitive approaches, and culture-specific or invariant culture rules, C# provides the tools to handle string comparisons effectively. By explicitly specifying the StringComparison type, you ensure clarity, predictability, and robustness in your C# applications when working with strings.