case insensitive string comparison - relative speed
-
Suppose two strings,
String S1, S2;
Which is faster:
if( S1.ToLower() == S2.ToLower() )
Or:
if( String.Compare(S1, S2, true) == 0 )
Under what conditions might the relative speed vary?
I need a 32 bit unsigned value just to hold the number of coding WTF I see in a day …
-
Blake Miller wrote:
Under what conditions might the relative speed vary?
- Strings are reference types, and immutable in memory.
- Converting them both before the comparison is slower than comparing them directly (there can be only one!)
- Microsoft "advises" normalizing to uppercase rather than lowercase. The FxCop rule for it (CA1308) gives correctness as the reason: a small group of characters cannot make a round trip when converted to lowercase.
- Does it matter?
The last point is the most important one; readability matters, as it influences maintainability. If you're doing a lot of string operations, consider a Regex for the job.
Edit:
Many string operations, most important the Compare and Equals methods, now provide an overload that accepts a StringComparison enumeration value as a parameter. When you specify either StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, the string comparison will be non-linguistic. That is, the features that are specific to the natural language are ignored when making comparison decisions. This means the decisions are based on simple byte comparisons and ignore casing or equivalence tables that are parameterized by culture. As a result, by explicitly setting the parameter to either the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, your code often gains speed, increases correctness, and becomes more reliable.
It's one of FxCop's warnings :)
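For what it's worth, here is a minimal sketch of what the ordinal, case-insensitive overload looks like in practice. The class and method names (CaseInsensitiveDemo, SameIgnoringCase) are made up for illustration; only string.Equals and StringComparison.OrdinalIgnoreCase come from the framework.

```csharp
using System;

public static class CaseInsensitiveDemo
{
    // Hypothetical helper: wraps the framework's ordinal, case-insensitive comparison.
    public static bool SameIgnoringCase(string a, string b)
    {
        // The static string.Equals overload accepts null arguments without throwing
        // and does not create lower-cased copies of either string.
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(SameIgnoringCase("Hello World", "hello world")); // True
        Console.WriteLine(SameIgnoringCase("Hello", "World"));             // False
    }
}
```

Whether ordinal rules are appropriate depends on the data: they suit identifiers, file names and protocol tokens, while text shown to users may need a culture-aware comparison instead.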
Bastard Programmer from Hell :suss: if you can't read my code, try converting it here[^]
-
The first one creates two new strings, converting each character to lower-case, and then compares the results. The second one performs a case-insensitive comparison of each character, without allocating any new strings. Instinct would say that the second will always out-perform the first. Here's some code to test that:
using System;
using System.Diagnostics;

// Note: Debug.Assert calls are only compiled into DEBUG builds,
// so run this in a Debug configuration or the loops will be empty.
int ITERATIONS = 1000000;
string s1 = "Hello World";
string s2 = "hello world";

// Sanity check (and warm-up) before timing anything.
Debug.Assert(s1.ToLower() == s2.ToLower());
Debug.Assert(string.Compare(s1, s2, true) == 0);

// Time the ToLower approach.
var sw1 = new Stopwatch();
sw1.Start();
for (int i = 0; i < ITERATIONS; i++)
{
    Debug.Assert(s1.ToLower() == s2.ToLower());
}
sw1.Stop();

// Time the Compare approach.
var sw2 = new Stopwatch();
sw2.Start();
for (int i = 0; i < ITERATIONS; i++)
{
    Debug.Assert(string.Compare(s1, s2, true) == 0);
}
sw2.Stop();

Console.WriteLine("ToLower: {0}", sw1.Elapsed);
Console.WriteLine("Compare: {0}", sw2.Elapsed);
On my computer, the output is:
ToLower: 00:00:00.4507542
Compare: 00:00:00.1856049
The ToLower approach takes more than twice as long as the Compare approach.
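If you want to run the same kind of test in a Release build, a variant along these lines might help. It is only a sketch: CompareBenchmark, Time and Report are names invented here, the match counter exists purely so the comparison result is used, and the delegate call adds the same small overhead to every variant. It also times the OrdinalIgnoreCase overload mentioned earlier in the thread. No timings are given because they depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;

public static class CompareBenchmark
{
    // Runs `compare` on (a, b) `iterations` times and returns the elapsed time.
    // The number of matches is handed back so the comparison result is actually used.
    public static TimeSpan Time(Func<string, string, bool> compare,
                                string a, string b, int iterations, out int matches)
    {
        matches = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (compare(a, b)) matches++;
        }
        sw.Stop();
        return sw.Elapsed;
    }

    private static void Report(string label, Func<string, string, bool> compare)
    {
        int matches;
        TimeSpan elapsed = Time(compare, "Hello World", "hello world", 1000000, out matches);
        Console.WriteLine("{0,-18} {1} ({2} matches)", label, elapsed, matches);
    }

    public static void Main()
    {
        Report("ToLower:", (x, y) => x.ToLower() == y.ToLower());
        Report("Compare:", (x, y) => string.Compare(x, y, true) == 0);
        Report("OrdinalIgnoreCase:", (x, y) => string.Equals(x, y, StringComparison.OrdinalIgnoreCase));
    }
}
```

Short strings that differ in the first character, or long strings that differ only at the end, may well shift the ratios, so it is worth feeding in data that resembles the real workload.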
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
Thank you for your answers. Would you have a link to a tech note, language guide or MSDN article about this part: "Microsoft 'advises' to use uppercase constants"? It does matter. I am looking into this because customers are complaining about CPU load.
I need a 32 bit unsigned value just to hold the number of coding WTF I see in a day …
-
Awesome! Thanks for the link. :thumbsup:
I need a 32 bit unsigned value just to hold the number of coding WTF I see in a day …
-
Confirms my 'gut feeling' and we all know how much we like to depend upon those :)
I need a 32 bit unsigned value just to hold the number of coding WTF I see in a day …
-
Blake Miller wrote:
It does matter. I am looking into this because customers are complaining about CPU load.
I think you are barking up the wrong tree if that's the case. As another poster showed, even with 1 *MILLION* string comparisons the difference in performance is a fraction of a second. Your performance issues are likely elsewhere.
-
Think in terms of application performance rather than just statement performance. If you have not measured the application using appropriate data, then your first step should be to do that. If you have measured and found that the specific method containing this statement is the problem, then finding a different algorithmic approach would have much more impact. The goal in that case is not to find a faster way to do the comparison, but to find a way so that no comparison at all is needed.
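As one illustration of the "different algorithm" idea, and only under the assumption that the hot spot is checking a string against a list of known values, a hash-based lookup with a case-insensitive comparer replaces the per-item comparisons with a single probe. KnownNames and Build are hypothetical names; HashSet<string> and StringComparer.OrdinalIgnoreCase are the real framework types.

```csharp
using System;
using System.Collections.Generic;

public static class KnownNames
{
    // Build the set once up front; the comparer makes every lookup case-insensitive.
    public static HashSet<string> Build(IEnumerable<string> names)
    {
        return new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        HashSet<string> known = Build(new[] { "Alpha", "Beta", "Gamma" });

        // One hash lookup instead of comparing against each entry in turn.
        Console.WriteLine(known.Contains("BETA"));  // True
        Console.WriteLine(known.Contains("delta")); // False
    }
}
```

Whether this applies at all depends on what the profiler shows; if the comparisons are not inside a search loop, a different restructuring would be needed.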