Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. The Lounge
  3. Using HASH Tables. Looking for general discussion on this topic.

Using HASH Tables. Looking for general discussion on this topic.

Scheduled Pinned Locked Moved The Lounge
questiondata-structurescryptographydiscussionlounge
15 Posts 7 Posters 2 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • J Offline
    J Offline
    jmaida
    wrote on last edited by
    #1

    I have been using HASH Tables for many applications. 1. Keyword lookup for command line processing 2. Generic name lookup tables of names, etc. 3. Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time I like HASH tables because they are easy to implement, but the key question is what HASH function does one use. Here is one I use: unsigned int HASH_Value( char *name ) { unsigned long int hashval; int i; hashval = 0; for( i = 0; i < HASH_MAX_NAME_SIZE; i++ ) { if( name[i] == '\0' ) break; hashval += name[i] * i + 1; } return( (unsigned int)(hashval)%HASH_MAX_TABLE_SIZE ); /* traditional hash function for( hashval = 0; *name != '\0'; name++ ) hashval = *name + 31 * hashval; return ( hashval % HASH_MAX_TABLE_SIZE ); */ } It works for me, what works for you? Please ignore any typos. Just looking for discussion on the topic.

    "A little time, a little trouble, your better day" Badfinger

    P Greg UtasG M J B 5 Replies Last reply
    0
    • J jmaida

      I have been using HASH Tables for many applications. 1. Keyword lookup for command line processing 2. Generic name lookup tables of names, etc. 3. Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time I like HASH tables because they are easy to implement, but the key question is what HASH function does one use. Here is one I use: unsigned int HASH_Value( char *name ) { unsigned long int hashval; int i; hashval = 0; for( i = 0; i < HASH_MAX_NAME_SIZE; i++ ) { if( name[i] == '\0' ) break; hashval += name[i] * i + 1; } return( (unsigned int)(hashval)%HASH_MAX_TABLE_SIZE ); /* traditional hash function for( hashval = 0; *name != '\0'; name++ ) hashval = *name + 31 * hashval; return ( hashval % HASH_MAX_TABLE_SIZE ); */ } It works for me, what works for you? Please ignore any typos. Just looking for discussion on the topic.

      "A little time, a little trouble, your better day" Badfinger

      P Offline
      P Offline
      Peter_in_2780
      wrote on last edited by
      #2

      The important point is that there is no universal "best" hash function. What makes a "good" hash (i.e. minimal collisions, not too costly to compute) depends very much on the nature of the data you are hashing. A good hash for peoples' full names may not do too well on their SSNs or phone numbers for example. And of course, the size of your hash table has a huge influence on performance. There are lots of good discussions (and some not so good) to be found if you search things like "best hash function".

      Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012

      J 1 Reply Last reply
      0
      • P Peter_in_2780

        The important point is that there is no universal "best" hash function. What makes a "good" hash (i.e. minimal collisions, not too costly to compute) depends very much on the nature of the data you are hashing. A good hash for peoples' full names may not do too well on their SSNs or phone numbers for example. And of course, the size of your hash table has a huge influence on performance. There are lots of good discussions (and some not so good) to be found if you search things like "best hash function".

        Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012

        J Offline
        J Offline
        jmaida
        wrote on last edited by
        #3

        Thanx for quick response. This is a good start. Agreed: "No universal "best" hash function" "Minimize collisions" "lots of discussions on "best hash function", etc "What is nature of the data to be hashed?", etc However, I suspect that google uses proprietary hashing techniques caching searches for future searches, "who knows", collision mitigation, daisy chaining (nested hash tables, etc. my terminology) I suspect this subject maybe a lot deeper than the search for the standard "best hash function". hence my proposal for a discussion. It may come to nothing, but it's out there.

        "A little time, a little trouble, your better day" Badfinger

        1 Reply Last reply
        0
        • J jmaida

          I have been using HASH Tables for many applications. 1. Keyword lookup for command line processing 2. Generic name lookup tables of names, etc. 3. Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time I like HASH tables because they are easy to implement, but the key question is what HASH function does one use. Here is one I use: unsigned int HASH_Value( char *name ) { unsigned long int hashval; int i; hashval = 0; for( i = 0; i < HASH_MAX_NAME_SIZE; i++ ) { if( name[i] == '\0' ) break; hashval += name[i] * i + 1; } return( (unsigned int)(hashval)%HASH_MAX_TABLE_SIZE ); /* traditional hash function for( hashval = 0; *name != '\0'; name++ ) hashval = *name + 31 * hashval; return ( hashval % HASH_MAX_TABLE_SIZE ); */ } It works for me, what works for you? Please ignore any typos. Just looking for discussion on the topic.

          "A little time, a little trouble, your better day" Badfinger

          Greg UtasG Offline
          Greg UtasG Offline
          Greg Utas
          wrote on last edited by
          #4

          After searching for a good hash for strings, I settled on the following:

          uint32_t string_hash(const char* s)
          {
          uint64_t hash = 0;
          auto size = strlen(s);

          for(size_t i = 0; i < size; ++i)
          {
          hash = s[i] + (hash << 16) + (hash << 6) - hash;
          }

          return hash;
          }

          And then you truncate the result to be a valid index into your hash table.

          Robust Services Core | Software Techniques for Lemmings | Articles
          The fox knows many things, but the hedgehog knows one big thing.

          <p><a href="https://github.com/GregUtas/robust-services-core/blob/master/README.md">Robust Services Core</a>
          <em>The fox knows many things, but the hedgehog knows one big thing.</em></p>

          J 1 Reply Last reply
          0
          • J jmaida

            I have been using HASH Tables for many applications. 1. Keyword lookup for command line processing 2. Generic name lookup tables of names, etc. 3. Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time I like HASH tables because they are easy to implement, but the key question is what HASH function does one use. Here is one I use: unsigned int HASH_Value( char *name ) { unsigned long int hashval; int i; hashval = 0; for( i = 0; i < HASH_MAX_NAME_SIZE; i++ ) { if( name[i] == '\0' ) break; hashval += name[i] * i + 1; } return( (unsigned int)(hashval)%HASH_MAX_TABLE_SIZE ); /* traditional hash function for( hashval = 0; *name != '\0'; name++ ) hashval = *name + 31 * hashval; return ( hashval % HASH_MAX_TABLE_SIZE ); */ } It works for me, what works for you? Please ignore any typos. Just looking for discussion on the topic.

            "A little time, a little trouble, your better day" Badfinger

            M Offline
            M Offline
            megaadam
            wrote on last edited by
            #5

            I say: use libraries. Unless, maybe, the domain is embedded with required extremely small footprint.

            "If we don't change direction, we'll end up where we're going"

            D 1 Reply Last reply
            0
            • M megaadam

              I say: use libraries. Unless, maybe, the domain is embedded with required extremely small footprint.

              "If we don't change direction, we'll end up where we're going"

              D Offline
              D Offline
              den2k88
              wrote on last edited by
              #6

              megaadam wrote:

              Unless, maybe, the domain is embedded with required extremely small footprint.

              Welcome to my world.

              GCS/GE d--(d) s-/+ a C+++ U+++ P-- L+@ E-- W+++ N+ o+ K- w+++ O? M-- V? PS+ PE Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++*      Weapons extension: ma- k++ F+2 X The shortest horror story: On Error Resume Next

              1 Reply Last reply
              0
              • J jmaida

                I have been using HASH Tables for many applications. 1. Keyword lookup for command line processing 2. Generic name lookup tables of names, etc. 3. Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time I like HASH tables because they are easy to implement, but the key question is what HASH function does one use. Here is one I use: unsigned int HASH_Value( char *name ) { unsigned long int hashval; int i; hashval = 0; for( i = 0; i < HASH_MAX_NAME_SIZE; i++ ) { if( name[i] == '\0' ) break; hashval += name[i] * i + 1; } return( (unsigned int)(hashval)%HASH_MAX_TABLE_SIZE ); /* traditional hash function for( hashval = 0; *name != '\0'; name++ ) hashval = *name + 31 * hashval; return ( hashval % HASH_MAX_TABLE_SIZE ); */ } It works for me, what works for you? Please ignore any typos. Just looking for discussion on the topic.

                "A little time, a little trouble, your better day" Badfinger

                J Offline
                J Offline
                jschell
                wrote on last edited by
                #7

                Use code tags.

                jmaida wrote:

                what works for you?

                Two of your examples use strings as the key. However the first would appear to be a fixed set. You could attempt to optimize based on that set. I have done so in the past to achieve zero collisions. However micro optimizations based on guessing is a waste of time. Optimize based on profiling the application using realistic data. (My example above for zero collisions was in fact a waste of time.) If I was using C or C++ I would use an existing library. Your code example is mixing the hash value with the hash table which works for very limited cases but in general the two should be distinct (thus the library.) Recalculating the hash every single time might not be ideal. But avoiding that means using a more complex structure.

                jmaida wrote:

                Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time

                I do not understand that statement. Hash table and binary tree are distinct data structures. You can replace one with the other but there are considerations for both which your statement does not make clear to me. I do know that I replaced a complex tree (not a normal binary tree) with a hash table and gained about a 30% speed improvement so perhaps you are referring to something like that.

                J 1 Reply Last reply
                0
                • J jmaida

                  I have been using HASH Tables for many applications. 1. Keyword lookup for command line processing 2. Generic name lookup tables of names, etc. 3. Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time I like HASH tables because they are easy to implement, but the key question is what HASH function does one use. Here is one I use: unsigned int HASH_Value( char *name ) { unsigned long int hashval; int i; hashval = 0; for( i = 0; i < HASH_MAX_NAME_SIZE; i++ ) { if( name[i] == '\0' ) break; hashval += name[i] * i + 1; } return( (unsigned int)(hashval)%HASH_MAX_TABLE_SIZE ); /* traditional hash function for( hashval = 0; *name != '\0'; name++ ) hashval = *name + 31 * hashval; return ( hashval % HASH_MAX_TABLE_SIZE ); */ } It works for me, what works for you? Please ignore any typos. Just looking for discussion on the topic.

                  "A little time, a little trouble, your better day" Badfinger

                  B Offline
                  B Offline
                  BernardIE5317
                  wrote on last edited by
                  #8

                  Greetings and Kind Regards May I please inquire is there a reason you do not utilize any of these? std::hash - cppreference.com[^]

                  J J 3 Replies Last reply
                  0
                  • J jschell

                    Use code tags.

                    jmaida wrote:

                    what works for you?

                    Two of your examples use strings as the key. However the first would appear to be a fixed set. You could attempt to optimize based on that set. I have done so in the past to achieve zero collisions. However micro optimizations based on guessing is a waste of time. Optimize based on profiling the application using realistic data. (My example above for zero collisions was in fact a waste of time.) If I was using C or C++ I would use an existing library. Your code example is mixing the hash value with the hash table which works for very limited cases but in general the two should be distinct (thus the library.) Recalculating the hash every single time might not be ideal. But avoiding that means using a more complex structure.

                    jmaida wrote:

                    Substitution for binary tree name lookup that do not require a minimum guaranteed lookup time

                    I do not understand that statement. Hash table and binary tree are distinct data structures. You can replace one with the other but there are considerations for both which your statement does not make clear to me. I do know that I replaced a complex tree (not a normal binary tree) with a hash table and gained about a 30% speed improvement so perhaps you are referring to something like that.

                    J Offline
                    J Offline
                    jmaida
                    wrote on last edited by
                    #9

                    What I meant to say is If one does not required minimum lookup time (it's my understanding though may be wrong, that a balanced binary tree can provide a minimum lookup time), then hashing is an inexpensive alternative.

                    "A little time, a little trouble, your better day" Badfinger

                    1 Reply Last reply
                    0
                    • B BernardIE5317

                      Greetings and Kind Regards May I please inquire is there a reason you do not utilize any of these? std::hash - cppreference.com[^]

                      J Offline
                      J Offline
                      jmaida
                      wrote on last edited by
                      #10

                      Had not seen these link, so thanx, Quite interesting.

                      "A little time, a little trouble, your better day" Badfinger

                      1 Reply Last reply
                      0
                      • B BernardIE5317

                        Greetings and Kind Regards May I please inquire is there a reason you do not utilize any of these? std::hash - cppreference.com[^]

                        J Offline
                        J Offline
                        jmaida
                        wrote on last edited by
                        #11

                        do not use C++

                        "A little time, a little trouble, your better day" Badfinger

                        1 Reply Last reply
                        0
                        • Greg UtasG Greg Utas

                          After searching for a good hash for strings, I settled on the following:

                          uint32_t string_hash(const char* s)
                          {
                          uint64_t hash = 0;
                          auto size = strlen(s);

                          for(size_t i = 0; i < size; ++i)
                          {
                          hash = s[i] + (hash << 16) + (hash << 6) - hash;
                          }

                          return hash;
                          }

                          And then you truncate the result to be a valid index into your hash table.

                          Robust Services Core | Software Techniques for Lemmings | Articles
                          The fox knows many things, but the hedgehog knows one big thing.

                          J Offline
                          J Offline
                          jmaida
                          wrote on last edited by
                          #12

                          Worked the best using random generated names. Generated 100 hash values with only 2 collisions. not bad. I'll call it the GREG UTAS HASH

                          "A little time, a little trouble, your better day" Badfinger

                          Greg UtasG 1 Reply Last reply
                          0
                          • J jmaida

                            Worked the best using random generated names. Generated 100 hash values with only 2 collisions. not bad. I'll call it the GREG UTAS HASH

                            "A little time, a little trouble, your better day" Badfinger

                            Greg UtasG Offline
                            Greg UtasG Offline
                            Greg Utas
                            wrote on last edited by
                            #13

                            It's not mine! I found it somewhere on the net but don't recall where. EDIT: Sorry for just saying "After searching...". Now I see how it can be misinterpreted.

                            Robust Services Core | Software Techniques for Lemmings | Articles
                            The fox knows many things, but the hedgehog knows one big thing.

                            <p><a href="https://github.com/GregUtas/robust-services-core/blob/master/README.md">Robust Services Core</a>
                            <em>The fox knows many things, but the hedgehog knows one big thing.</em></p>

                            J 1 Reply Last reply
                            0
                            • B BernardIE5317

                              Greetings and Kind Regards May I please inquire is there a reason you do not utilize any of these? std::hash - cppreference.com[^]

                              J Offline
                              J Offline
                              jschell
                              wrote on last edited by
                              #14

                              BernardIE5317 wrote:

                              do not utilize any of these?

                              Just noticed the following in the list of what it supports. (There are some others.) I can only hope no one is using those.

                              hash support for std::chrono::weekday
                              hash support for std::chrono::leap_second

                              1 Reply Last reply
                              0
                              • Greg UtasG Greg Utas

                                It's not mine! I found it somewhere on the net but don't recall where. EDIT: Sorry for just saying "After searching...". Now I see how it can be misinterpreted.

                                Robust Services Core | Software Techniques for Lemmings | Articles
                                The fox knows many things, but the hedgehog knows one big thing.

                                J Offline
                                J Offline
                                jmaida
                                wrote on last edited by
                                #15

                                I am giving you credit for funding it. I have 3 variations of hash functions and it's the best so far. One day I will post them, but too much going on here. Our weather here has gone frigid (for us) going into the teens. Trying to protect plants and such.

                                "A little time, a little trouble, your better day" Badfinger

                                1 Reply Last reply
                                0
                                Reply
                                • Reply as topic
                                Log in to reply
                                • Oldest to Newest
                                • Newest to Oldest
                                • Most Votes


                                • Login

                                • Don't have an account? Register

                                • Login or register to search.
                                • First post
                                  Last post
                                0
                                • Categories
                                • Recent
                                • Tags
                                • Popular
                                • World
                                • Users
                                • Groups