Dictionary<K,V> performance based on Key Type
-
I tried something like this:
class _Key
{
    public string s;
    public Type t;

    public override int GetHashCode()
    {
        return s.GetHashCode() ^ t.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        _Key o = obj as _Key;
        if (o == null) return false;
        return s == o.s && t == o.t;
    }
}
and it yielded 60ms. Seems kinda pricey still compared to the Type key alone (20ms).

EDIT: Tried something else: I made the key an int and called GetHashCode() on the object to get the key. That knocked it down to 40ms.

EDIT #2: Tried just returning the GetHashCode() of one of the members: Type = 50ms, string = 50ms. I'm not really sure of a better way to implement GetHashCode(). Any suggestions?
-
If you want your GetHashCode method to do better than the built-in ones, you'll need to:

a) make sure that collisions are as rare as possible for your specific data, which can be done by implementing a custom hashing algorithm suited to your data, and
b) ensure that GetHashCode is very fast in terms of cycles.

Because you're just calling GetHashCode on the string and Type above, it's going to be no faster or better than the built-in ones. You'll need to come up with a hashing scheme that's both faster and stronger (less frequent collisions) than the built-in ones.
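One cheap improvement over the XOR combine in the code above: XOR is symmetric (a ^ b == b ^ a) and cancels to zero when both members hash equally, so a multiply-and-add combine is usually stronger for the same cost. A sketch (the class name Key2 is illustrative, not from the post):

```csharp
using System;

sealed class Key2
{
    public readonly string S;
    public readonly Type T;

    public Key2(string s, Type t) { S = s; T = t; }

    // Multiply-and-add avoids XOR's symmetry and zero-cancellation
    // while still being only a couple of cycles per member.
    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap instead of throwing
        {
            int hash = 17;
            hash = hash * 31 + S.GetHashCode();
            hash = hash * 31 + T.GetHashCode();
            return hash;
        }
    }

    public override bool Equals(object obj)
    {
        Key2 o = obj as Key2;
        return o != null && S == o.S && T == o.T;
    }
}
```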
-
So I did some performance testing on Dictionary lookups with different key types (1M iterations):

1) int, string = 10ms
2) Type, string = 20ms
3) string, string = 40ms
4) Tuple, string = 120ms!!!
5) struct, string = 590ms!!!

Right now I'm using #2 in a performance-critical loop, but I really need something like #4, and that performance overhead is just not acceptable. Anybody got any ideas for a multi-key dictionary that isn't going to kill my performance? My only idea so far is to get kinda hacky, go with #3, and append the string and Type.ToString() together, i.e. "Bob;System.Int32". I'd really like to avoid hackiness though, as this is actually code I care about lol :).
-
You might also want to post the code for your benchmark so we can check whether it's valid. It's very easy to mess up tests like this :)
I just did:

Dictionary<int, string> dict = new Dictionary<int, string>();
dict[5] = "Test";
DateTime dt = DateTime.Now;
for (int i = 0; i < 1000000; i++)
{
    // 10ms - int x string
    string s;
    dict.TryGetValue(5, out s);
}
System.Diagnostics.Debug.WriteLine((DateTime.Now - dt).TotalMilliseconds);

It's really that simple. I know that's only one key, so my mileage may vary with a bunch of keys. I ran that same test with the various types to get the benchmarks I originally posted.

HOWEVER, I had a brilliant breakthrough haha. Would something like this work?

Dictionary<int, string> dict;

struct _Key
{
    Type type;
    string str;
}

Now, instead of overriding GetHashCode and Equals, I just have a method:

public int GetKey(string s, Type t)
{
    return s.GetHashCode() ^ t.GetHashCode();
}

Whenever I want to insert a new object, I call GetKey() on it and use that as the key. As I'm typing this, I'm beginning to poke holes in this idea... there would be no way to retrieve the string and type, since I'd be keying off the hash code alone. I'm also wondering whether two s and t combinations could produce the same key. Theoretically, I'm assuming the .NET hash functions are strong. But since I wouldn't be inserting the _Key struct into the dictionary (and thus not overriding Equals), the dictionary wouldn't know for sure it was grabbing the right one... If I do it "right" and insert the _Key, that's 60ms... better than 120ms or 590ms, I guess.
-
It will probably be difficult to beat the built-in hash code functions for the general case; however, there might be something about your specific data that you can exploit to make a stronger hash function.

Example: if you know the first 4 chars of your strings are nearly always unique, you could just turn those into the 32-bit hash. That would save the .NET framework from hashing the entire string, so it will be faster. It will also be stronger, because you used knowledge specific to your data (e.g. that the first 4 chars are enough).

If there are any features of your keys like the above you can exploit, the chances of doing better than the built-in stuff will be much better.

Cheers
Matt
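A minimal sketch of that idea, assuming mostly-ASCII strings whose first four characters are nearly always distinct (the class name and fallback behavior are illustrative, not from the post):

```csharp
using System;

static class PrefixHash
{
    // Pack the low byte of each of the first four chars into 32 bits.
    // Only a win when those four chars are (nearly) unique per key;
    // shorter strings fall back to the built-in string hash.
    public static int OfString(string s)
    {
        if (s.Length < 4)
            return s.GetHashCode();
        return (s[0] & 0xFF)
             | ((s[1] & 0xFF) << 8)
             | ((s[2] & 0xFF) << 16)
             | ((s[3] & 0xFF) << 24);
    }
}
```

Note the `& 0xFF` truncation discards the high byte of each char, so for non-ASCII data this loses information and weakens the hash; it only pays off when the data justifies the assumption.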
-
Your test is invalid
for (int i = 0; i < 1000000; i++)
{
// 10ms - int x string
string s;
dict.TryGetValue(5, out s);
}

You never use the value of string 's' inside the loop, which means the compiler can optimize it out, unless you're running in debug mode with the debugger attached. The performance you get running in release mode without a debugger attached can be massively different. This test is too trivial to be a good measure of real performance. Also, you're looking up the same key every time ('5'); performance can change significantly depending on which key you're looking up.
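A sketch of a benchmark that avoids those pitfalls: it uses Stopwatch instead of DateTime, varies the key, and consumes the lookup result so the loop can't be eliminated (run in release mode without the debugger; the key count and iteration count here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Benchmark
{
    static void Main()
    {
        var dict = new Dictionary<int, string>();
        for (int i = 0; i < 1000; i++)
            dict[i] = "Value" + i;

        int found = 0; // consumed below, so the loop isn't dead code
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++)
        {
            string s;
            if (dict.TryGetValue(i % 1000, out s)) // vary the key
                found++;
        }
        sw.Stop();
        Console.WriteLine("{0} hits in {1}ms", found, sw.ElapsedMilliseconds);
    }
}
```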
-
Yes, I was running in debug with the debugger attached. I understand it's a trivial, not-full-scope test. However, I ran the same test on all the various types I mentioned to get a *rough* idea of the differences before implementing the real thing. I think seeing that an int was 10ms and a struct was 590ms is pretty indicative that the struct is not a good solution (without overriding the Equals and GetHashCode methods).

I did try hacking the _Key struct, with the overridden Equals and GetHashCode methods, into my real test application and the class that does a lot with the values. I did not use the hacky int-key thing I was questioning, as that wouldn't work. My original dictionary took 110ms to run 1M iterations with the debugger attached and 3 items in the dictionary in the real application. Switching it to _Key, properly initializing the struct, and using the real hash code method bumped it up to 140ms-150ms. So it actually added 30ms-40ms of overhead, which is roughly what my trivial benchmarks showed it would over 1M iterations.

FYI: just for fun, I tried commenting out the Equals and GetHashCode overrides, and it slowed to a crawl at 1390ms! Wow. That is a lot more than I thought it would be.
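For reference, a struct key along those lines might look like the sketch below; implementing IEquatable<_Key> as well avoids the boxing that makes the default ValueType.Equals path (the 1390ms case) so slow:

```csharp
using System;

struct _Key : IEquatable<_Key>
{
    public readonly Type Type;
    public readonly string Str;

    public _Key(string str, Type type) { Str = str; Type = type; }

    public override int GetHashCode()
    {
        return Str.GetHashCode() ^ Type.GetHashCode();
    }

    // IEquatable<_Key> lets Dictionary compare keys without boxing;
    // without these overrides, ValueType.Equals boxes both operands
    // and can fall back to reflection over the fields.
    public bool Equals(_Key other)
    {
        return Str == other.Str && Type == other.Type;
    }

    public override bool Equals(object obj)
    {
        return obj is _Key && Equals((_Key)obj);
    }
}
```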
-
No probs. I was just making sure you're aware that release mode with no debugger can make a substantial impact on simple synthetic tests like this, but it sounds like you are :) I can't think of any other obvious way to improve the lookup performance further. The built-in dictionary is pretty fast and often difficult to beat. I've made some dictionary-like structures that are faster (GenericHashTrie, for example), but they have significant downsides compared to the generic dictionary.
-
If you're really after performance - would it be possible not to use a dictionary at all?
Regards, Rob Philpott.
-
Don't use DateTime for measuring performance; use the Stopwatch class instead.

As you suspect, your "breakthrough" won't work. If you use the result of GetHashCode as a key, you can incorrectly consider two different keys to be the same due to hash code collisions. You could try making an immutable key class and caching the hash code:

sealed class _Key : IEquatable<_Key>
{
    private readonly string _s;
    private readonly Type _t;
    private readonly int _hashCode;

    public _Key(string s, Type t)
    {
        if (s == null) throw new ArgumentNullException("s");
        if (t == null) throw new ArgumentNullException("t");
        _s = s;
        _t = t;
        unchecked
        {
            _hashCode = (s.GetHashCode() * 397) ^ t.GetHashCode();
        }
    }

    public string S { get { return _s; } }
    public Type T { get { return _t; } }

    public override int GetHashCode() { return _hashCode; }

    public override bool Equals(object obj) { return Equals(obj as _Key); }

    public bool Equals(_Key other)
    {
        if (ReferenceEquals(other, null)) return false;
        return _s == other._s && _t == other._t;
    }

    public static bool operator ==(_Key left, _Key right) { return Equals(left, right); }
    public static bool operator !=(_Key left, _Key right) { return !Equals(left, right); }
}
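Usage would then be along these lines, assuming the _Key class above (the dictionary contents here are illustrative):

```csharp
var cache = new Dictionary<_Key, string>();
cache[new _Key("Bob", typeof(int))] = "some cached value";

string value;
if (cache.TryGetValue(new _Key("Bob", typeof(int)), out value))
{
    // The lookup reuses the cached _hashCode and compares via the
    // strongly typed Equals(_Key), so no boxing or re-hashing occurs.
}
```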
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer