Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. C#
  4. Dictionary<K,V> performance based on Key Type

Dictionary<K,V> performance based on Key Type

Scheduled Pinned Locked Moved C#
testingbeta-testingperformancequestion
12 Posts 4 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • S Offline
    S Offline
    SledgeHammer01
    wrote on last edited by
    #1

    So I did some performance testing on Dictionary based on different key types (1M iterations): 1) int, string = 10ms 2) Type, string = 20ms 3) string, string = 40ms 4) Tuple, string = 120ms!!! 5) struct, string = 590ms!!! Right now I'm using #2 in a performance critical loop... but I really need something like #4, but the performance overhead is just not acceptable. Anybody got any ideas for a multi-key dictionary that isn't going to kill my performance? My only idea so far is to get kinda hacky and go with #3 and append to string & Type.ToString() together. I.e. "Bob;System.Int32". I'd really like to avoid hackyness though as this is actually code I care about lol :).

    M R 3 Replies Last reply
    0
    • S SledgeHammer01

      So I did some performance testing on Dictionary based on different key types (1M iterations): 1) int, string = 10ms 2) Type, string = 20ms 3) string, string = 40ms 4) Tuple, string = 120ms!!! 5) struct, string = 590ms!!! Right now I'm using #2 in a performance critical loop... but I really need something like #4, but the performance overhead is just not acceptable. Anybody got any ideas for a multi-key dictionary that isn't going to kill my performance? My only idea so far is to get kinda hacky and go with #3 and append to string & Type.ToString() together. I.e. "Bob;System.Int32". I'd really like to avoid hackyness though as this is actually code I care about lol :).

      M Offline
      M Offline
      Matty22
      wrote on last edited by
      #2

      Try implementing your own fast ToHash and Equals methods perhaps

      S 1 Reply Last reply
      0
      • M Matty22

        Try implementing your own fast ToHash and Equals methods perhaps

        S Offline
        S Offline
        SledgeHammer01
        wrote on last edited by
        #3

        I tried something like this:

        class _Key
        {
        public string s;
        public Type t;

        	public override int GetHashCode()
        	{
        		return s.GetHashCode() ^ t.GetHashCode();
        	}
        
        	public override bool Equals(object obj)
        	{
        		\_Key o = obj as \_Key;
        
        		if (o == null)
        			return false;
        
        		if (s == o.s && t == o.t)
        			return true;
        
        		return false;
        	}
        }
        

        and it yielded 60ms. Seems kinda pricey still. Compared to the Type key alone (20ms). EDIT: tried something else... made the key an int and called GetHashCode() on the object to get the key. That knocked it down to 40ms. EDIT #2: tried just returning the GetHashCode() on one of the members. Type = 50ms, string = 50ms. Not really sure of a better way to implement GetHashCode(). Any suggestions?

        M 1 Reply Last reply
        0
        • S SledgeHammer01

          I tried something like this:

          class _Key
          {
          public string s;
          public Type t;

          	public override int GetHashCode()
          	{
          		return s.GetHashCode() ^ t.GetHashCode();
          	}
          
          	public override bool Equals(object obj)
          	{
          		\_Key o = obj as \_Key;
          
          		if (o == null)
          			return false;
          
          		if (s == o.s && t == o.t)
          			return true;
          
          		return false;
          	}
          }
          

          and it yielded 60ms. Seems kinda pricey still. Compared to the Type key alone (20ms). EDIT: tried something else... made the key an int and called GetHashCode() on the object to get the key. That knocked it down to 40ms. EDIT #2: tried just returning the GetHashCode() on one of the members. Type = 50ms, string = 50ms. Not really sure of a better way to implement GetHashCode(). Any suggestions?

          M Offline
          M Offline
          Matty22
          wrote on last edited by
          #4

          If you want your gethashcode method to be faster than the inbuilt one you'll need to a.) Make sure that collisions are as rare as possible for your specific data, this can be done via implementing a custom hashing algorithm to suit your data b.) Ensure that GetHashCode is very fast in terms of cycles Because you're just calling the GetHashCode on string and type above. It's going to be no faster or better than the inbuilt ones. You'll need to come up with a hashing scheme that's faster and stronger (Less frequent collisions) than the inbuilt ones.

          1 Reply Last reply
          0
          • S SledgeHammer01

            So I did some performance testing on Dictionary based on different key types (1M iterations): 1) int, string = 10ms 2) Type, string = 20ms 3) string, string = 40ms 4) Tuple, string = 120ms!!! 5) struct, string = 590ms!!! Right now I'm using #2 in a performance critical loop... but I really need something like #4, but the performance overhead is just not acceptable. Anybody got any ideas for a multi-key dictionary that isn't going to kill my performance? My only idea so far is to get kinda hacky and go with #3 and append to string & Type.ToString() together. I.e. "Bob;System.Int32". I'd really like to avoid hackyness though as this is actually code I care about lol :).

            M Offline
            M Offline
            Matty22
            wrote on last edited by
            #5

            You might also want to post the code for your benchmark so we can check if it's valid It's very easy to mess up tests like this :)

            S 1 Reply Last reply
            0
            • M Matty22

              You might also want to post the code for your benchmark so we can check if it's valid It's very easy to mess up tests like this :)

              S Offline
              S Offline
              SledgeHammer01
              wrote on last edited by
              #6

              I just did: Dictionary dict = new Dictionary(); dict[5] = "Test"; DateTime dt = DateTime.Now; for (int i = 0; i < 1000000; i++) { // 10ms - int x string string s; dict.TryGetValue(5, out s); } System.Diagnostics.Debug.WriteLine((DateTime.Now - dt).TotalMilliseconds); It's really that simple. I know that's only 1 key, so my mileage may vary with a bunch of keys. I tried that same thing with various types to get the benchmarks I originally posted. HOWEVER, I had a brilliant break through haha. Would something like this work? Dictionary dict; struct _Key { Type type; string str; } Now, instead of overriding the GetHashCode and IsEquals, I just have a method: public int GetKey(string s, Type t) { return s.GetHashCode() ^ t.GetHashCode(); } whenever I want to insert a new object, I call GetKey() on it and use that as the key? As I'm typing this, I'm beginning to poke holes in this idea... there would be no way to retrieve the string and type as I would just be keying off the hash code. I'm also wondering if it would be possible to get two s and t combinations that produce the same key? Theoretically, I'm assuming the .net hash functions are strong. Although since I wouldn't be inserting the _Key struct into the list (and thus not override the IsEqual(), the dictionary wouldn't know for sure it was grabbing the right one... If I do it "right" and insert the _Key, that's 60ms... better then 120ms or 590ms I guess.

              M Richard DeemingR 3 Replies Last reply
              0
              • S SledgeHammer01

                I just did: Dictionary dict = new Dictionary(); dict[5] = "Test"; DateTime dt = DateTime.Now; for (int i = 0; i < 1000000; i++) { // 10ms - int x string string s; dict.TryGetValue(5, out s); } System.Diagnostics.Debug.WriteLine((DateTime.Now - dt).TotalMilliseconds); It's really that simple. I know that's only 1 key, so my mileage may vary with a bunch of keys. I tried that same thing with various types to get the benchmarks I originally posted. HOWEVER, I had a brilliant break through haha. Would something like this work? Dictionary dict; struct _Key { Type type; string str; } Now, instead of overriding the GetHashCode and IsEquals, I just have a method: public int GetKey(string s, Type t) { return s.GetHashCode() ^ t.GetHashCode(); } whenever I want to insert a new object, I call GetKey() on it and use that as the key? As I'm typing this, I'm beginning to poke holes in this idea... there would be no way to retrieve the string and type as I would just be keying off the hash code. I'm also wondering if it would be possible to get two s and t combinations that produce the same key? Theoretically, I'm assuming the .net hash functions are strong. Although since I wouldn't be inserting the _Key struct into the list (and thus not override the IsEqual(), the dictionary wouldn't know for sure it was grabbing the right one... If I do it "right" and insert the _Key, that's 60ms... better then 120ms or 590ms I guess.

                M Offline
                M Offline
                Matty22
                wrote on last edited by
                #7

                It will probably be difficult to beat the inbuilt hash code functions for the general case however there might be something about your specific data that you can exploit to make a stronger hash function Example: If you know the first 4 chars of your strings are nearly always unique, you could just turn those into the 32 bit hash. This would save .NET framework from hashing the entire string so it will be faster. It will also be stronger because you used knowledge specific to your data (Eg, the first 4 chars are enough) If there are any features of your keys like above you can exploit. The chances of doing better than the inbuilt stuff will be much better Cheers Matt

                1 Reply Last reply
                0
                • S SledgeHammer01

                  I just did: Dictionary dict = new Dictionary(); dict[5] = "Test"; DateTime dt = DateTime.Now; for (int i = 0; i < 1000000; i++) { // 10ms - int x string string s; dict.TryGetValue(5, out s); } System.Diagnostics.Debug.WriteLine((DateTime.Now - dt).TotalMilliseconds); It's really that simple. I know that's only 1 key, so my mileage may vary with a bunch of keys. I tried that same thing with various types to get the benchmarks I originally posted. HOWEVER, I had a brilliant break through haha. Would something like this work? Dictionary dict; struct _Key { Type type; string str; } Now, instead of overriding the GetHashCode and IsEquals, I just have a method: public int GetKey(string s, Type t) { return s.GetHashCode() ^ t.GetHashCode(); } whenever I want to insert a new object, I call GetKey() on it and use that as the key? As I'm typing this, I'm beginning to poke holes in this idea... there would be no way to retrieve the string and type as I would just be keying off the hash code. I'm also wondering if it would be possible to get two s and t combinations that produce the same key? Theoretically, I'm assuming the .net hash functions are strong. Although since I wouldn't be inserting the _Key struct into the list (and thus not override the IsEqual(), the dictionary wouldn't know for sure it was grabbing the right one... If I do it "right" and insert the _Key, that's 60ms... better then 120ms or 590ms I guess.

                  M Offline
                  M Offline
                  Matty22
                  wrote on last edited by
                  #8

                  Your test is invalid

                  for (int i = 0; i < 1000000; i++)
                  {
                  // 10ms - int x string
                  string s;
                  dict.TryGetValue(5, out s);
                  }

                  You never use the value of string 's' inside the loop. This means the compiler can optimize it out. Unless your running it in debug mode with debugger attached The performance you get running in release mode without debugger attached can be massively different This test is to trivial to be good measure of real performance. Also your looking up the same key every time '5'. The performance can change significantly depending on what key you're looking up

                  S 1 Reply Last reply
                  0
                  • M Matty22

                    Your test is invalid

                    for (int i = 0; i < 1000000; i++)
                    {
                    // 10ms - int x string
                    string s;
                    dict.TryGetValue(5, out s);
                    }

                    You never use the value of string 's' inside the loop. This means the compiler can optimize it out. Unless your running it in debug mode with debugger attached The performance you get running in release mode without debugger attached can be massively different This test is to trivial to be good measure of real performance. Also your looking up the same key every time '5'. The performance can change significantly depending on what key you're looking up

                    S Offline
                    S Offline
                    SledgeHammer01
                    wrote on last edited by
                    #9

                    Yes, I was running in debug with the debugger attached. I understand it is a trivial / not full scope test. However, I did the same test on all the various types I mentioned to get a *rough* idea of the differences before I implemented the real thing. I think seeing that an int was 10ms and a struct was 590ms is pretty indicative the struct is not a good solution (without overriding the IsEquals and GetHashCode methods). I did try hack in the _Key struct with the overridden IsEquals and GetHashCode methods into my real test application & class that does a lot with the values. I did not use the int key hacky thing I was questioning as that wouldn't work. My original dictionary took 110ms to run 1M iterations with the debugger attached and 3 items in the dictionary in the real application. Switching it to _Key and properly initializing the struct and the real hash code method bumped it up to 140ms to 150ms. So it actually added 30ms to 40ms of overhead which is what my trivial benchmarks kinda showed it would with 1M iterations. FYI: Just for fun, I tried commenting out the IsEquals and GetHashCode overrides and it slowed to a crawl @ 1390ms! Wow. That is a lot more then I thought it would.

                    M 1 Reply Last reply
                    0
                    • S SledgeHammer01

                      Yes, I was running in debug with the debugger attached. I understand it is a trivial / not full scope test. However, I did the same test on all the various types I mentioned to get a *rough* idea of the differences before I implemented the real thing. I think seeing that an int was 10ms and a struct was 590ms is pretty indicative the struct is not a good solution (without overriding the IsEquals and GetHashCode methods). I did try hack in the _Key struct with the overridden IsEquals and GetHashCode methods into my real test application & class that does a lot with the values. I did not use the int key hacky thing I was questioning as that wouldn't work. My original dictionary took 110ms to run 1M iterations with the debugger attached and 3 items in the dictionary in the real application. Switching it to _Key and properly initializing the struct and the real hash code method bumped it up to 140ms to 150ms. So it actually added 30ms to 40ms of overhead which is what my trivial benchmarks kinda showed it would with 1M iterations. FYI: Just for fun, I tried commenting out the IsEquals and GetHashCode overrides and it slowed to a crawl @ 1390ms! Wow. That is a lot more then I thought it would.

                      M Offline
                      M Offline
                      Matty22
                      wrote on last edited by
                      #10

                      No probs. Was just making sure you're aware that release mode + no debugger can make substantial impact on simple synthetic tests like this. But sounds like you are :) I can't think of any other obvious way to improve the lookup performance further. The inbuilt dictionary is pretty fast. It's often difficult to beat. I've made some dictionary like structures that are faster GenericHashTrie for example but they have significant downsides compared to the generic dictionary.

                      1 Reply Last reply
                      0
                      • S SledgeHammer01

                        So I did some performance testing on Dictionary based on different key types (1M iterations): 1) int, string = 10ms 2) Type, string = 20ms 3) string, string = 40ms 4) Tuple, string = 120ms!!! 5) struct, string = 590ms!!! Right now I'm using #2 in a performance critical loop... but I really need something like #4, but the performance overhead is just not acceptable. Anybody got any ideas for a multi-key dictionary that isn't going to kill my performance? My only idea so far is to get kinda hacky and go with #3 and append to string & Type.ToString() together. I.e. "Bob;System.Int32". I'd really like to avoid hackyness though as this is actually code I care about lol :).

                        R Offline
                        R Offline
                        Rob Philpott
                        wrote on last edited by
                        #11

                        If you're really after performance - would it be possible not to use a dictionary at all?

                        Regards, Rob Philpott.

                        1 Reply Last reply
                        0
                        • S SledgeHammer01

                          I just did: Dictionary dict = new Dictionary(); dict[5] = "Test"; DateTime dt = DateTime.Now; for (int i = 0; i < 1000000; i++) { // 10ms - int x string string s; dict.TryGetValue(5, out s); } System.Diagnostics.Debug.WriteLine((DateTime.Now - dt).TotalMilliseconds); It's really that simple. I know that's only 1 key, so my mileage may vary with a bunch of keys. I tried that same thing with various types to get the benchmarks I originally posted. HOWEVER, I had a brilliant break through haha. Would something like this work? Dictionary dict; struct _Key { Type type; string str; } Now, instead of overriding the GetHashCode and IsEquals, I just have a method: public int GetKey(string s, Type t) { return s.GetHashCode() ^ t.GetHashCode(); } whenever I want to insert a new object, I call GetKey() on it and use that as the key? As I'm typing this, I'm beginning to poke holes in this idea... there would be no way to retrieve the string and type as I would just be keying off the hash code. I'm also wondering if it would be possible to get two s and t combinations that produce the same key? Theoretically, I'm assuming the .net hash functions are strong. Although since I wouldn't be inserting the _Key struct into the list (and thus not override the IsEqual(), the dictionary wouldn't know for sure it was grabbing the right one... If I do it "right" and insert the _Key, that's 60ms... better then 120ms or 590ms I guess.

                          Richard DeemingR Offline
                          Richard DeemingR Offline
                          Richard Deeming
                          wrote on last edited by
                          #12

                          Don't use DateTime for measuring performance; use the Stopwatch class[^] instead. As you suspect, your "breakthrough" won't work. If you use the result of GetHashCode as a key, you can incorrectly consider two different keys to be the same due to hash code collisions. You could try making an immutable key class and caching the hash code:

                          sealed class _Key : IEquatable<Key>
                          {
                          private readonly string _s;
                          private readonly Type _t;
                          private readonly int _hashCode;

                          public \_Key(string s, Type t)
                          {
                              if (s == null) throw new ArgumentNullException("s");
                              if (t == null) throw new ArgumentNullException("t");
                          
                              \_s = s;
                              \_t = t;
                          
                              unchecked
                              {
                                  \_hashCode = (s.GetHashCode() \* 397) ^ t.GetHashCode();
                              }
                          }
                          
                          public string S
                          {
                              get { return \_s; }
                          }
                          
                          public Type T
                          {
                              get { return \_t; }
                          }
                          
                          public override int GetHashCode()
                          {
                              return \_hashCode;
                          }
                          
                          public override bool Equals(object obj)
                          {
                              return Equals(obj as \_Key);
                          }
                          
                          public bool Equals(\_Key other)
                          {
                              if (ReferenceEquals(other, null)) return false;
                              return \_s == other.\_s && \_t == other.\_t;
                          }
                          
                          public static bool operator ==(\_Key left, \_Key right)
                          {
                              return Equals(left, right);
                          }
                          
                          public static bool operator !=(\_Key left, \_Key right)
                          {
                              return !Equals(left, right);
                          }
                          

                          }


                          "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

                          "These people looked deep within my soul and assigned me a number based on the order in which I joined" - Homer

                          1 Reply Last reply
                          0
                          Reply
                          • Reply as topic
                          Log in to reply
                          • Oldest to Newest
                          • Newest to Oldest
                          • Most Votes


                          • Login

                          • Don't have an account? Register

                          • Login or register to search.
                          • First post
                            Last post
                          0
                          • Categories
                          • Recent
                          • Tags
                          • Popular
                          • World
                          • Users
                          • Groups