hashing algorithms

peterchen wrote:

One use that has not been mentioned is speeding up lookups, e.g. in an associative map. Imagine you have a dictionary [key, value] with a great many long words as keys. Instead of using string comparisons, you just compare the hashes. The hashes of the strings in the dictionary can be precalculated, and the dictionary indexed by the hash instead of the string. Of course there's a slight chance of two keys having the same hash - this case must be handled separately, so you end up with a dictionary [key-hash, vector([key, value])]. However, with a well-chosen hash this can be much faster.

"Quarter him who calls her a whore with a Turkish sabre."
sighist | Agile Programming | doxygen

Terry Denham
#21

This is exactly what I did with an ADO Recordset class that was generated using the #import directive. One of my peers needed to pull back a fairly large set of data, but due to some requirements we weren't able to build the data with a set of joins, so we had to have this data represented in the recordset. Some of the records would be linked to other records in the same set. The process was taking about 14 hours to process about 140,000 records, due to the large number of loops it contained. I had them remove one of the loops, and I wrote a CAdoRecordsetIndex class that acted as an associative map over the records in the CAdoRecordset class, based on which columns you told it to build the index on. When you needed to find values, you would pass in the array of values you wanted to search for; the index class would turn this into a key and find the bookmark of the record with that key. I used a vector as the value in case multiple records shared the same non-unique key. This change alone took the processing from 14 hours to 15 minutes, just by using the associative hash. It could have been improved further by changing the hash bucket size: I used the default of 17 (bucket counts are usually prime). Had I used a larger prime, I would have reduced the time spent searching within each vector.
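A minimal sketch of the idea in modern C++ (the Bookmark type, the separator, and the RecordsetIndex name are illustrative stand-ins, not the original CAdoRecordsetIndex API): each distinct combination of indexed column values maps to a vector of bookmarks, so one hashed lookup replaces a scan over the whole recordset.

    #include <string>
    #include <unordered_map>
    #include <vector>

    using Bookmark = long;  // stand-in for an ADO record bookmark

    struct RecordsetIndex {
        // Non-unique keys are allowed, so each key maps to a vector of bookmarks.
        std::unordered_map<std::string, std::vector<Bookmark>> buckets;

        // Build a composite key by joining the indexed column values with a
        // separator (assumes the values don't contain '\x1f' themselves).
        static std::string MakeKey(const std::vector<std::string>& values) {
            std::string key;
            for (const auto& v : values) { key += v; key += '\x1f'; }
            return key;
        }

        void Add(const std::vector<std::string>& values, Bookmark bm) {
            buckets[MakeKey(values)].push_back(bm);
        }

        // One hashed lookup instead of a linear scan over every record.
        const std::vector<Bookmark>* Find(const std::vector<std::string>& values) const {
            auto it = buckets.find(MakeKey(values));
            return it == buckets.end() ? nullptr : &it->second;
        }
    };

std::unordered_map resizes its bucket table automatically, which is exactly the knob Terry mentions tuning by hand with the fixed bucket count of 17.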


ColinDavies
#22

peterchen wrote: of course there's a slight chance of two keys having the same hash

I remember once writing a routine to see if this would happen with some data. Probability theory says it is possible, but with my data it never occurred. Just increasing the hash by a few bytes makes the probability drop off astronomically.

Regardz
Colin J Davies

*** WARNING ***
This could be addictive
The minion's version of "Catch :bob:"

"It's a real shame that people as stupid as you can work out how to use a computer." - said by Christian Graus in the Soapbox
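Colin's "few bytes" intuition is the birthday bound: for n random keys in a space of 2^b hash values, the chance of any collision is roughly 1 - exp(-n(n-1)/2^(b+1)). A quick back-of-the-envelope sketch (the key counts and hash widths below are made-up illustrations):

    #include <cmath>
    #include <cstdio>

    // Birthday-bound estimate: probability that at least two of n random
    // keys collide in a hash space of 2^bits values.
    double collision_probability(double n, double bits) {
        return 1.0 - std::exp(-n * (n - 1.0) / (2.0 * std::pow(2.0, bits)));
    }

    int main() {
        // A few extra bytes of hash make collisions astronomically unlikely:
        std::printf("1M keys, 32-bit hash: %.6f\n", collision_probability(1e6, 32));
        std::printf("1M keys, 64-bit hash: %.2e\n", collision_probability(1e6, 64));
    }

For a million keys, a 32-bit hash collides almost surely, while a 64-bit hash collides with probability on the order of 10^-8 - hence "astronomically".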


leppie
#23

peterchen wrote: However, with a well-chosen hash this can be much faster.

Don't you mean longer? :laugh:

leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free... coz no-one wants to pay for it.

James Simpson wrote:

Can someone please enlighten me as to how these are possible? Is a 'hash' a piece of data that can be used to validate data, e.g.: Data -> Hash Algorithm -> Hash value. The same data creates the same hash value, but you cannot recreate the data from the algorithm? In my ignorance I must ask: what is the point of the hash value? Surely the data represents itself - and the hash value could not uniquely identify the item of data? If this is not suitable for the Lounge, I apologise - I will remain confused!

James Simpson
Web Developer
imebgo@hotmail.com

P S - This is what part of the alphabet would look like if Q and R were eliminated
Mitch Hedberg

Jorgen Sigvardsson
#24

          You need this book[^] :) Buy it and read it. You'll love it :) -- Here I am now, I'm your saviour. You can see I'm the one!

James Simpson wrote:

Questions, questions! Message digest algorithm (RFC 1320): "The message digest algorithm takes as input a message of arbitrary length and produces as output a 'fingerprint' or 'message digest' of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given pre-specified target message digest." The above paragraph indicates to me that the message digest is a fixed-length value computed from a variable-length piece of data. How can a fixed-length piece of data represent ANY input piece of data? It makes no sense - or am I being totally stupid? Maybe if you knew the length of the input piece of data along with the message digest, then it would work! I'm confused.

James Simpson
Web Developer
imebgo@hotmail.com

P S - This is what part of the alphabet would look like if Q and R were eliminated
Mitch Hedberg

Gary R Wheeler
#25

            The point of the hash is not to be able to reproduce the original data from the hash. The point is to produce a smaller value that can be used to validate a similar or equivalent message at a later time. At one point in time, suppose you have a message A1, which has hash value H1. At some later time, suppose you receive message A2. For message A2 you calculate hash H2. If H2 == H1, then you can conclude that messages A1 and A2 are at least similar (if not equivalent), depending upon the hash algorithm.


            Software Zen: delete this;
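Note that a fixed-length digest cannot be unique for every possible input (the pigeonhole principle answers James's question: many messages necessarily share each digest value); the design goal is only that collisions are vanishingly rare or computationally hard to find. A minimal sketch of the validate-later workflow Gary describes, using std::hash as a stand-in for a real digest such as SHA-256 (std::hash is not cryptographic, and the messages are invented examples):

    #include <functional>
    #include <iostream>
    #include <string>

    int main() {
        std::string a1 = "transfer 100 units to account 42"; // message A1
        std::size_t h1 = std::hash<std::string>{}(a1);       // stored fingerprint H1

        std::string a2 = "transfer 100 units to account 42"; // message A2, received later
        std::size_t h2 = std::hash<std::string>{}(a2);       // recomputed H2

        if (h2 == h1)
            std::cout << "digests match: messages are (very likely) equivalent\n";
        else
            std::cout << "digests differ: messages are definitely different\n";
    }

The asymmetry is the useful part: a mismatch is proof of difference, while a match is only strong evidence of equivalence, with strength depending on the hash algorithm.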

Paul Oss wrote:

Daniel Turini wrote: When you need to search for "test", you only compute its hash value (5) and look at a[5], without need to search through all the array.

Assuming that your hash algorithm is truly generating unique numbers for each unique string. I.e., if your algorithm takes two completely different strings and, due to weaknesses in your math, returns the same hash value, you're screwed. ;)

Paul

Daniel Turini
#26

I explained how to deal with collisions (a bit simplistically, for didactic reasons) in the next post.

Trying to make bits uncopyable is like trying to make water not wet. -- Bruce Schneier
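A minimal sketch of that collision handling, assuming a chained table like the a[hash] lookup described above (the table size, hash function, and TinyMap name are all illustrative): each slot holds a small vector of (key, value) pairs, and the full key comparison happens only within the one slot the hash selects.

    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    constexpr std::size_t kSlots = 16;  // deliberately tiny, to force collisions

    std::size_t tiny_hash(const std::string& s) {
        std::size_t h = 0;
        for (unsigned char c : s) h = h * 31 + c;  // simple polynomial hash
        return h % kSlots;
    }

    struct TinyMap {
        std::vector<std::pair<std::string, int>> slots[kSlots];

        void put(const std::string& key, int value) {
            slots[tiny_hash(key)].emplace_back(key, value);
        }

        // Only the few entries that collided into this slot are string-compared.
        const int* get(const std::string& key) const {
            for (const auto& kv : slots[tiny_hash(key)])
                if (kv.first == key) return &kv.second;
            return nullptr;
        }
    };

    // Usage:
    //   TinyMap m;
    //   m.put("test", 5);
    //   const int* v = m.get("test");  // one hash, then a short scan of one slot

With a well-chosen hash and a larger (preferably prime) slot count - echoing Terry's point about bucket sizes - each vector stays short and lookups stay near constant time even when two keys do share a hash.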

Jorgen Sigvardsson wrote: You need this book[^] :) Buy it and read it. You'll love it :)

James Simpson
#27

I have this book - bought it a couple of years ago. And yes - it does explain things :)

James Simpson
Web Developer
imebgo@hotmail.com

P S - This is what part of the alphabet would look like if Q and R were eliminated
Mitch Hedberg
