Using HASH Tables. Looking for general discussion on this topic.
-
I say: use libraries. Unless, maybe, the domain is embedded with required extremely small footprint.
"If we don't change direction, we'll end up where we're going"
megaadam wrote:
Unless, maybe, the domain is embedded with required extremely small footprint.
Welcome to my world.
GCS/GE d--(d) s-/+ a C+++ U+++ P-- L+@ E-- W+++ N+ o+ K- w+++ O? M-- V? PS+ PE Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X The shortest horror story: On Error Resume Next
-
I have been using HASH Tables for many applications.
1. Keyword lookup for command line processing
2. Generic name lookup tables of names, etc.
3. Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time

I like HASH tables because they are easy to implement, but the key question is what HASH function does one use. Here is one I use:

unsigned int HASH_Value( char *name )
{
    unsigned long int hashval;
    int i;

    hashval = 0;
    for( i = 0; i < HASH_MAX_NAME_SIZE; i++ )
    {
        if( name[i] == '\0' ) break;
        hashval += name[i] * i + 1;
    }
    return( (unsigned int)(hashval)%HASH_MAX_TABLE_SIZE );

    /* traditional hash function
    for( hashval = 0; *name != '\0'; name++ )
        hashval = *name + 31 * hashval;
    return ( hashval % HASH_MAX_TABLE_SIZE );
    */
}

It works for me, what works for you? Please ignore any typos. Just looking for discussion on the topic.
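One detail in the posted function may be worth checking. Because `*` binds tighter than `+`, the expression `name[i] * i + 1` evaluates as `(name[i] * i) + 1`, so the character at index 0 only ever contributes the constant 1. Whether that was intended is not stated in the post. The following is a minimal C++ sketch, not the poster's code: `kMaxNameSize` and `kTableSize` are hypothetical stand-ins for the undefined `HASH_MAX_NAME_SIZE` and `HASH_MAX_TABLE_SIZE`, and `hash_position_weighted` is an illustrative variant that assumes the intent was `name[i] * (i + 1)`.

```cpp
#include <cstddef>

// Hypothetical constants; the post does not give the real values.
constexpr std::size_t kMaxNameSize = 32;
constexpr unsigned long kTableSize = 101;

// Same arithmetic as the posted function for ASCII names:
// each step adds (name[i] * i) + 1.
unsigned hash_as_posted(const char* name) {
    unsigned long h = 0;
    for (std::size_t i = 0; i < kMaxNameSize && name[i] != '\0'; ++i)
        h += static_cast<unsigned char>(name[i]) * i + 1;
    return static_cast<unsigned>(h % kTableSize);
}

// Illustrative variant (an assumption about intent): weight each
// character by its 1-based position so index 0 also counts.
unsigned hash_position_weighted(const char* name) {
    unsigned long h = 0;
    for (std::size_t i = 0; i < kMaxNameSize && name[i] != '\0'; ++i)
        h += static_cast<unsigned char>(name[i]) * (i + 1);
    return static_cast<unsigned>(h % kTableSize);
}
```

With the posted arithmetic, names that differ only in their first character, such as "cat" and "bat", always land in the same slot; the parenthesized variant separates them.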
"A little time, a little trouble, your better day" Badfinger
Use code tags.
jmaida wrote:
what works for you?
Two of your examples use strings as the key. However, the first would appear to be a fixed set, so you could attempt to optimize the hash for that set. I have done so in the past to achieve zero collisions.

However, micro-optimizations based on guessing are a waste of time. Optimize based on profiling the application using realistic data. (My example above for zero collisions was in fact a waste of time.)

If I were using C or C++ I would use an existing library. Your code example mixes the hash value with the hash table size, which works for very limited cases, but in general the two should be distinct (thus the library).

Recalculating the hash every single time might not be ideal, but avoiding that means using a more complex structure.
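To make the last two points concrete, here is a minimal sketch of keeping the full hash value separate from the table index and storing it in each entry. All names (`full_hash`, `Entry`, `NameTable`) are hypothetical and chosen for illustration; the hash uses the 31-multiplier recurrence from the commented-out "traditional" variant, and the table assumes a non-zero bucket count.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Full-width hash: no table size is baked in.
std::uint32_t full_hash(const std::string& s) {
    std::uint32_t h = 0;
    for (unsigned char c : s) h = c + 31u * h;
    return h;
}

// Each entry remembers its full hash next to the key.
struct Entry {
    std::string key;
    int value;
    std::uint32_t hash;
};

class NameTable {
public:
    // Assumes buckets > 0.
    explicit NameTable(std::size_t buckets) : buckets_(buckets) {}

    void put(const std::string& key, int value) {
        const std::uint32_t h = full_hash(key);
        auto& chain = buckets_[h % buckets_.size()];  // index derived here only
        for (auto& e : chain)
            if (e.hash == h && e.key == key) { e.value = value; return; }
        chain.push_back({key, value, h});
    }

    // Returns nullptr when the key is absent.
    const int* find(const std::string& key) const {
        const std::uint32_t h = full_hash(key);
        const auto& chain = buckets_[h % buckets_.size()];
        for (const auto& e : chain)
            if (e.hash == h && e.key == key) return &e.value;
        return nullptr;
    }

private:
    std::vector<std::vector<Entry>> buckets_;
};
```

Storing the hash costs four bytes per entry. In exchange, most mismatches within a chain are rejected without a string comparison, and a resize could redistribute entries without hashing the keys again.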
jmaida wrote:
Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time
I do not understand that statement. Hash table and binary tree are distinct data structures. You can replace one with the other but there are considerations for both which your statement does not make clear to me. I do know that I replaced a complex tree (not a normal binary tree) with a hash table and gained about a 30% speed improvement so perhaps you are referring to something like that.
-
Greetings and kind regards. May I please inquire: is there a reason you do not utilize any of these? std::hash - cppreference.com
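For readers who have not used it, this is roughly what the library route looks like in C++. It is a sketch only: the `Command` enum, `lookup_command`, and `bucket_for` are hypothetical names invented for the example, while `std::hash` and `std::unordered_map` are the standard facilities.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical command set for a command-line keyword lookup.
enum class Command { Help, Version, Unknown };

// std::unordered_map hashes its keys with std::hash<std::string> by default.
Command lookup_command(const std::string& word) {
    static const std::unordered_map<std::string, Command> table = {
        {"help", Command::Help},
        {"version", Command::Version},
    };
    const auto it = table.find(word);
    return it == table.end() ? Command::Unknown : it->second;
}

// std::hash can also feed a hand-rolled table: reduce the value to an index.
std::size_t bucket_for(const std::string& name, std::size_t table_size) {
    return std::hash<std::string>{}(name) % table_size;
}
```

The value returned by `std::hash` is implementation-defined, so it should not be written to disk or compared across different builds.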
-
What I meant to say is: if one does not require a minimum guaranteed lookup time (it's my understanding, though I may be wrong, that a balanced binary tree can provide one), then hashing is an inexpensive alternative.
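In C++ terms this trade-off maps onto two standard containers: `std::map` guarantees logarithmic lookup, while `std::unordered_map` is constant time on average but linear in the worst case. A small sketch, with `lookup_or` as a hypothetical helper name, shows that the calling code can stay the same for either choice:

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Works with std::map (ordered tree, logarithmic lookup guaranteed)
// and std::unordered_map (hash table, constant time on average).
template <typename Table>
int lookup_or(const Table& table, const std::string& name, int fallback) {
    const auto it = table.find(name);
    return it == table.end() ? fallback : it->second;
}
```

Because both containers expose `find` and `end`, swapping one for the other is mostly a change of type, which makes it easy to profile both against realistic data.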
"A little time, a little trouble, your better day" Badfinger
-
After searching for a good hash for strings, I settled on the following:

#include <cstdint>
#include <cstring>

uint32_t string_hash(const char* s)
{
    uint64_t hash = 0;
    auto size = strlen(s);
    for(size_t i = 0; i < size; ++i)
    {
        hash = s[i] + (hash << 16) + (hash << 6) - hash;
    }
    return static_cast<uint32_t>(hash);  // keep the low 32 bits
}

And then you truncate the result to be a valid index into your hash table.
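The truncation step is not shown in the post. A minimal sketch of two common ways to do it follows; `string_hash64`, `index_mod`, and `index_mask` are hypothetical names. The hash applies the same recurrence as above but reads each character as `unsigned char` and returns the full 64-bit value, both of which are choices made for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Same recurrence as the posted function, kept at 64 bits.
std::uint64_t string_hash64(const char* s) {
    std::uint64_t hash = 0;
    for (; *s != '\0'; ++s)
        hash = static_cast<unsigned char>(*s) + (hash << 16) + (hash << 6) - hash;
    return hash;
}

// Works for any non-zero table size.
std::size_t index_mod(std::uint64_t h, std::size_t table_size) {
    return static_cast<std::size_t>(h % table_size);
}

// Only valid when the table size is a power of two.
std::size_t index_mask(std::uint64_t h, std::size_t pow2_size) {
    return static_cast<std::size_t>(h & (pow2_size - 1));
}
```

The mask form avoids a division but uses only the low bits of the hash, so it leans harder on the hash mixing those bits well.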
Robust Services Core | Software Techniques for Lemmings | Articles
The fox knows many things, but the hedgehog knows one big thing.
It worked the best using randomly generated names: 100 hash values with only 2 collisions. Not bad. I'll call it the GREG UTAS HASH.
"A little time, a little trouble, your better day" Badfinger
It's not mine! I found it somewhere on the net but don't recall where. EDIT: Sorry for just saying "After searching...". Now I see how it can be misinterpreted.
Robust Services Core | Software Techniques for Lemmings | Articles
The fox knows many things, but the hedgehog knows one big thing.
I am giving you credit for finding it. I have 3 variations of hash functions and it's the best so far. One day I will post them, but there is too much going on here. Our weather has gone frigid (for us), dropping into the teens. Trying to protect plants and such.
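For anyone who wants to repeat the comparison of hash variants, a collision count along these lines is one way to do it. This is a sketch under assumptions: the thread does not say what table size was used or how collisions were counted, so `count_collisions` (a hypothetical name) simply counts every name that lands in a slot already taken, and `string_hash` is a `std::string` adaptation of the function posted earlier.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// The recurrence from the thread, truncated to 32 bits.
std::uint32_t string_hash(const std::string& s) {
    std::uint64_t hash = 0;
    for (unsigned char c : s)
        hash = c + (hash << 16) + (hash << 6) - hash;
    return static_cast<std::uint32_t>(hash);
}

// Counts names that map to a slot some earlier name already occupies.
// Note that a repeated name is counted as a collision too.
std::size_t count_collisions(const std::vector<std::string>& names,
                             std::size_t table_size) {
    std::vector<bool> used(table_size, false);
    std::size_t collisions = 0;
    for (const auto& name : names) {
        const std::size_t slot = string_hash(name) % table_size;
        if (used[slot]) ++collisions;
        else used[slot] = true;
    }
    return collisions;
}
```

Running the same name list through each candidate hash with the same table size gives directly comparable numbers. The result depends heavily on the table size, so it is worth fixing that first.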
"A little time, a little trouble, your better day" Badfinger