Hashtable vs Dictionary.
-
http://www.microsoft.com/belux/msdn/fr/community/columns/jtielens/collections1.mspx
Thanks and regards, Sandeep. "If you look at what you do not have in life, you don't have anything; if you look at what you have in life, you have everything..."
-
If you want to know which is quicker, why not write some tests. This is the best way for you to find out because you won't end up relying on what somebody else tells you, which could be wrong.
Deja View - the feeling that you've seen this post before.
-
Maybe 50 lines (or less!) would provide you with insertion/retrieval times for both. Just for fun, I used the System.Diagnostics.Stopwatch class to get the elapsed time on adding 10,000,000 ints to both a Hashtable and a Dictionary, and found that the Dictionary was faster for both insertion and retrieval. I just for-looped for the desired number of iterations and added the counter each time to the container. On the retrieval, I just did a [key] lookup and an assignment, something like: myint = container[key];

On the Hashtable part, I did a Convert.ToInt32() call. On the Dictionary, it *knew* it was an int already, so I just did the assignment. So, that Convert call took some time. I went back and took out the Convert call to see what impact it was having - not much, actually. Maybe 100-150 milliseconds on average for the entire run.

Interestingly (at least to me), the Hashtable was slower on retrieval when I specified the initial capacity at construction than when I didn't. For 10,000,000 items, it took about 700 milliseconds or so longer for the entire run. Of course, your mileage may vary, and my test program was not an exhaustive one.

That was a fun diversion - back to work I go! Have fun, Matt
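For reference, a rough sketch of the kind of timing loop described above (the container names and the smaller iteration count are my own choices, not from the original test program):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

class HashtableVsDictionary
{
    const int N = 1000000; // smaller than the 10,000,000 in the post, for a quick run

    static void Main()
    {
        var ht = new Hashtable();
        var dict = new Dictionary<int, int>();

        // Time insertion: add the loop counter as both key and value.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) ht.Add(i, i);
        sw.Stop();
        Console.WriteLine("Hashtable insert:  {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) dict.Add(i, i);
        sw.Stop();
        Console.WriteLine("Dictionary insert: {0} ms", sw.ElapsedMilliseconds);

        // Time retrieval: [key] lookup plus an assignment.
        int myint = 0;
        sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) myint = Convert.ToInt32(ht[i]); // value comes back as Object
        sw.Stop();
        Console.WriteLine("Hashtable lookup:  {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) myint = dict[i]; // already an int, plain assignment
        sw.Stop();
        Console.WriteLine("Dictionary lookup: {0} ms", sw.ElapsedMilliseconds);
    }
}
```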
It isn't enough to do well in life. One must do good when and where one can. Otherwise, what's the point?
-
Matthew Cuba wrote:
Interestingly (at least to me), the Hashtable was slower on retrieval when I specified the initial capacity at construction than when I didn't. For 10,000,000 items, it took about 700 milliseconds or so longer for the entire run.
This is probably something to do with the fact that Hashtables have to box/unbox their data (both key AND value), whereas a Dictionary is strongly-typed.
-
Jason Hanford-Smith wrote:
This is probably something to do with the fact that Hashtables have to box/unbox their data (both key AND value), whereas a Dictionary is strongly-typed.
Not sure I follow you on this. It was my understanding that retrieval of an element from a Hashtable is O(1). In both instances (whether I set the initial capacity or not), I've added 10,000,000 items to the Hashtable. I expected the Add to take longer on the one where I didn't specify an initial capacity, since Add is O(n) if Count > Capacity (it is O(1) otherwise), but on retrieval... well, I'd expect similar box/unbox activity on the Hashtable in both instances.
It isn't enough to do well in life. One must do good when and where one can. Otherwise, what's the point?
-
If you take a look at the definition for Add and Item in Hashtable you'll see:
public virtual Object this[Object key]  // <-- Item, for you VB'ers
public virtual void Add(Object key, Object value)

Both getting from and adding to the Hashtable involve boxing or unboxing. On the other hand, with Dictionary you have:

public TValue this[TKey key]  // <-- Item, for you VB'ers
public void Add(TKey key, TValue value)

As you can see, this uses Generics and so does not involve a type conversion or boxing/unboxing penalty.
-
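To make the difference concrete, here is a minimal sketch (names of my own choosing) contrasting the cast required by the Object-based Hashtable with the strongly-typed Dictionary:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        // Hashtable stores keys and values as Object, so the int 42 is boxed on Add...
        var ht = new Hashtable();
        ht.Add("answer", 42);
        // ...and must be unboxed (cast back) on retrieval.
        int fromHt = (int)ht["answer"];

        // Dictionary<TKey,TValue> knows the types at compile time, so no boxing occurs.
        var dict = new Dictionary<string, int>();
        dict.Add("answer", 42);
        int fromDict = dict["answer"]; // no cast needed

        Console.WriteLine(fromHt == fromDict); // prints True
    }
}
```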
Yes, I believe I understand the idea that the Dictionary knows the type already and thus doesn't have the boxing/unboxing overhead. But my original question was: why would the cumulative time to retrieve N items from a Hashtable declared as Hashtable ht = new Hashtable(N); be longer than for one declared as Hashtable ht = new Hashtable();? That is what I was wondering. It would seem to me that since they are both Hashtables, and both have N items in them added in the same manner, the retrieval time would be very similar, but it wasn't. It actually took longer for the one where I specified the capacity up front. Incidentally, the retrieval time for the Dictionary was essentially the same whether or not I specified an initial capacity.
It isn't enough to do well in life. One must do good when and where one can. Otherwise, what's the point?
-
Apologies, my mistake. Looking at it further (using Reflector), the Hashtable does some interesting things with the load factor provided, which will force an automatic resize if the number of "buckets" used comes close to the number of slots available. To test this, how about having two Hashtables with two different initial sizes and timing the add/retrieve speed of each up to the smaller count? The speeds should be the same. I think you'll only see the degradation when you go beyond it... it'll keep expanding to the next prime above two times the current count. Which sounds like a big jump, but you'd probably hit it again pretty quickly on a 10,000,000 loop. Hope some (any) of that made sense.
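A sketch of the experiment suggested above (the specific counts and helper name are my own, chosen for a quick run): time Adds into two Hashtables with different initial capacities, stopping at the smaller count so neither should need to resize.

```csharp
using System;
using System.Collections;
using System.Diagnostics;

class CapacityTest
{
    // Time adding `count` int key/value pairs into the given Hashtable.
    static long TimeAdds(Hashtable ht, int count)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++) ht.Add(i, i);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int small = 100000, large = 1000000;

        // Two different initial capacities; add only up to the smaller count,
        // so neither table should hit an automatic resize.
        Console.WriteLine("capacity {0}: {1} ms", small, TimeAdds(new Hashtable(small), small));
        Console.WriteLine("capacity {0}: {1} ms", large, TimeAdds(new Hashtable(large), small));

        // Default capacity for comparison: this one rehashes repeatedly as it
        // grows past each load-factor threshold.
        Console.WriteLine("default: {0} ms", TimeAdds(new Hashtable(), small));
    }
}
```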
-
No problem whatsoever. You made perfect sense. I'm going to play around with the suggestions you've made here and I appreciate the responses and info. It is all quite interesting. Take care, Matt
It isn't enough to do well in life. One must do good when and where one can. Otherwise, what's the point?