String compare oddity [modified]
-
I was looking at sorting some strings today and noticed that the default comparer orders them wrong -- it sorts "a" before "A", rather than after it, for instance. So I looked into it, read up a bit, and tried the InvariantCulture and Ordinal comparers -- Ordinal works correctly, and I'll likely use it, but I'm not happy about it.
According to http://msdn.microsoft.com/en-us/library/dd465121.aspx: "Comparisons that use StringComparison.InvariantCulture and StringComparison.Ordinal work identically on ASCII strings." Which is not strictly true -- they order these differently.
Does anyone here know how to make a Culture (based on en-US) that does a case-sensitive sort the right way?
Edit: This is interesting... I was experimenting with the System.StringComparer.CurrentCultureIgnoreCase and discovered that it seems to do what I want -- at least in the tests I've made. Which is good. :thumbsup: But that means that it doesn't actually ignore the case! :sigh: Scratch that -- it's just an artifact of how I was testing it.
modified on Friday, August 19, 2011 2:13 PM
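A minimal sketch of the difference described above, assuming a .NET runtime with normal culture support (not invariant-globalization mode). The class name, helper name, and sample strings are made up for illustration. It sorts the same ASCII strings with StringComparer.Ordinal and StringComparer.InvariantCulture so the two orders can be compared side by side.

```csharp
using System;

public static class CompareDemo
{
    // Returns a sorted copy of the input using the supplied comparer.
    public static string[] SortedCopy(string[] items, StringComparer comparer)
    {
        string[] copy = (string[])items.Clone();
        Array.Sort(copy, comparer);
        return copy;
    }

    public static void Main()
    {
        string[] items = { "b", "a", "B", "A" };

        // Ordinal compares UTF-16 code units, so 'A' (65) and 'B' (66)
        // come before 'a' (97) and 'b' (98).
        Console.WriteLine(string.Join(" ", SortedCopy(items, StringComparer.Ordinal)));

        // InvariantCulture applies linguistic rules; the exact result depends
        // on the collation data the runtime uses.
        Console.WriteLine(string.Join(" ", SortedCopy(items, StringComparer.InvariantCulture)));
    }
}
```

If the two printed lines differ, that is the behaviour the quoted documentation sentence glosses over.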
-
One way is to put each character into an array and then sort the array. A second option is to convert each character of the string to its ASCII value, sort the values, and then convert them back to characters. Finally, merge the characters, and you will get the sorted string.
-
That's equivalent to what the Ordinal comparer does. And I'm not sorting a string, I'm sorting a collection of strings.
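To illustrate the equivalence, here is a rough sketch, assuming non-null strings. CompareByCharValue is a hypothetical helper written for this example, not a framework method. It compares two strings one character value at a time, which for a collection of strings should give the same order as StringComparer.Ordinal.

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalSketch
{
    // Compares two non-null strings code unit by code unit.
    public static int CompareByCharValue(string x, string y)
    {
        int shared = Math.Min(x.Length, y.Length);
        for (int i = 0; i < shared; i++)
        {
            if (x[i] != y[i])
            {
                return x[i] < y[i] ? -1 : 1;
            }
        }
        // All shared positions match, so the shorter string sorts first.
        return x.Length.CompareTo(y.Length);
    }

    public static void Main()
    {
        var items = new List<string> { "b", "a", "B", "A", "ab" };
        // Sort the collection of strings with the hand-written comparison.
        items.Sort(CompareByCharValue);
        Console.WriteLine(string.Join(" ", items));
    }
}
```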
-
Ordinal does an ASCII compare.
PIEBALDconsult wrote:
I'll likely use it, but I'm not happy about it
Not happy about what?
"Don't confuse experts with facts" - Eric_V
About having to use the Ordinal comparer to get the correct (desired) case-sensitive sort order. I should be able to use a "linguistic" (I think that's the term the documentation used) Culture that produces the same sort order of ASCII data. And if I can create a Culture that does that and set it as my CurrentCulture, so much the better.
-
PIEBALDconsult wrote:
I should be able to use a "linguistic" (I think that's the term the documentation used) Culture that produces the same sort order of ASCII data.
Different languages treat upper and lower case differently. While English considers a lowercase 'a' to be semantically "less than" the uppercase 'A', it may not be the same with other languages. I think that's why the designers of .NET chose to give us the language-neutral Ordinal option.
-
Shameel wrote:
English considers a lowercase 'a' to be semantically "less than" the uppercase 'A',
Got a reference for that? I think the opposite is true.
-
PIEBALDconsult wrote:
Got a reference for that?
I don't have a reference, but working with customers, they like to see 'A' on top of lists and 'a' below that.
PIEBALDconsult wrote:
I think the opposite is true.
The opposite is true in the case of ASCII.
-
Shameel wrote:
they like to see 'A' on top of lists and 'a' below that.
Exactly. That's how I want it, and the Ordinal comparer does it, but the InvariantCulture (and en-US) does it the other way.
Edit: Well, not exactly, come to think of it, because the Ordinal comparer also says "Z" < "a", which I don't want.
modified on Friday, August 19, 2011 12:15 PM
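A small sketch of that last point, with made-up sample words and a hypothetical helper name. Because Ordinal compares character codes, every uppercase ASCII letter ('A' to 'Z', codes 65 to 90) sorts ahead of every lowercase one ('a' to 'z', codes 97 to 122).

```csharp
using System;

public static class OrdinalCaseDemo
{
    // Returns a copy of the input sorted with the Ordinal comparer.
    public static string[] SortOrdinal(string[] words)
    {
        string[] copy = (string[])words.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    public static void Main()
    {
        // 'Z' is code 90 and 'a' is code 97, so Ordinal places "Z" first.
        Console.WriteLine(string.CompareOrdinal("Z", "a") < 0);   // True

        string[] words = { "apple", "Zebra", "Apple", "zebra" };
        Console.WriteLine(string.Join(" ", SortOrdinal(words)));  // Apple Zebra apple zebra
    }
}
```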
-
PIEBALDconsult wrote:
Exactly. That's how I want it, and the Ordinal comparer does it
The Ordinal comparer uses the ASCII order of the characters.
Char Dec Oct Hex | Char Dec Oct Hex | Char Dec Oct Hex | Char Dec Oct Hex
(nul) 0 0000 0x00 | (sp) 32 0040 0x20 | @ 64 0100 0x40 | ` 96 0140 0x60
(soh) 1 0001 0x01 | ! 33 0041 0x21 | A 65 0101 0x41 | a 97 0141 0x61
(stx) 2 0002 0x02 | " 34 0042 0x22 | B 66 0102 0x42 | b 98 0142 0x62
(etx) 3 0003 0x03 | # 35 0043 0x23 | C 67 0103 0x43 | c 99 0143 0x63
(eot) 4 0004 0x04 | $ 36 0044 0x24 | D 68 0104 0x44 | d 100 0144 0x64
(enq) 5 0005 0x05 | % 37 0045 0x25 | E 69 0105 0x45 | e 101 0145 0x65
(ack) 6 0006 0x06 | & 38 0046 0x26 | F 70 0106 0x46 | f 102 0146 0x66
(bel) 7 0007 0x07 | ' 39 0047 0x27 | G 71 0107 0x47 | g 103 0147 0x67
(bs) 8 0010 0x08 | ( 40 0050 0x28 | H 72 0110 0x48 | h 104 0150 0x68
(ht) 9 0011 0x09 | ) 41 0051 0x29 | I 73 0111 0x49 | i 105 0151 0x69
(nl) 10 0012 0x0a | * 42 0052 0x2a | J 74 0112 0x4a | j 106 0152 0x6a
(vt) 11 0013 0x0b | + 43 0053 0x2b | K 75 0113 0x4b | k 107 0153 0x6b
(np) 12 0014 0x0c | , 44 0054 0x2c | L 76 0114 0x4c | l 108 0154 0x6c
(cr) 13 0015 0x0d | - 45 0055 0x2d | M 77 0115 0x4d | m 109 0155 0x6d
(so) 14 0016 0x0e | . 46 0056 0x2e | N 78 0116 0x4e | n 110 0156 0x6e
(si) 15 0017 0x0f | / 47 0057 0x2f | O 79 0117 0x4f | o 111 0157 0x6f
(dle) 16 0020 0x10 | 0 48 0060 0x30 | P 80 0120 0x50 | p 112 0160 0x70
(dc1) 17 0021 0x11 | 1 49 0061 0x31 | Q 81 0121 0x51 | q 113 0161 0x71
(dc2) 18 0022 0x12 | 2 50 0062 0x32 | R 82 0122 0x52 | r 114 0162 0x72
(dc3) 19 0023 0x13 | 3 51 0063 0x33 | S 83 0123 0x53 | s 115 0163 0x73
(dc4) 20 0024 0x14 | 4 52 0064 0x34 | T 84 0124 0x54 | t 116 0164 0x74
(nak) 21 0025 0x15 | 5 53 0065 0x35 | U 85 0125 0x55 | u 117 0165 0x75
(syn) 22 0026 0x16 | 6 54 0066 0x36 | V 86 0126 0x56 | v 118 0166 0x76
(etb) 23 0027 0x17 | 7 55 0067 0x37 | W 87 0127 0x57 | w 119 0167 0x77
(can) 24 0030 0x18 | 8 56 0070 0x38 | X 88 0130 0x58 | x
-
Yes, I know that, but I don't know who gave you the 1, take a 5 for your efforts.
-
What I came up with as a simple interim solution is this:
private sealed class MyComparer : System.Collections.Generic.IComparer<string>
{
    public int Compare ( string Op0 , string Op1 )
    {
        // First compare ignoring case, so "a" and "A" both sort ahead of "b" and "B".
        int result = System.StringComparer.InvariantCultureIgnoreCase.Compare ( Op0 , Op1 ) ;

        // On a tie, reverse the case-sensitive InvariantCulture result.
        if ( result == 0 )
        {
            result = System.StringComparer.InvariantCulture.Compare ( Op0 , Op1 ) * -1 ;
        }

        return ( result ) ;
    }
}
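For anyone who wants to try the idea, here is a self-contained sketch. UpperFirstComparer and UpperFirstDemo are hypothetical names used only for this example, and the sample strings are made up. It applies the same two-stage comparison to a small list: a case-insensitive compare first, then the reversed case-sensitive InvariantCulture compare to break ties.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-alone version of the two-stage comparer, for illustration only.
public sealed class UpperFirstComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        // Stage 1: order by letters, ignoring case.
        int result = StringComparer.InvariantCultureIgnoreCase.Compare(x, y);
        if (result == 0)
        {
            // Stage 2: on a tie, reverse the case-sensitive culture order.
            result = -StringComparer.InvariantCulture.Compare(x, y);
        }
        return result;
    }
}

public static class UpperFirstDemo
{
    public static void Main()
    {
        var items = new List<string> { "b", "a", "Z", "B", "A", "z" };
        items.Sort(new UpperFirstComparer());
        Console.WriteLine(string.Join(" ", items));
    }
}
```

On a runtime where InvariantCulture places "a" before "A", as described at the top of this thread, this should print A a B b Z z. The tie-break only reverses whatever the culture reports, so the result still depends on the culture data in use.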
-
PIEBALDconsult wrote:
I don't know who gave you the 1
I get downvoted all the time and the people who do it do not have the courage to own up and explain it.
PIEBALDconsult wrote:
take a 5 for your efforts
Thanks :-)