Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
CODE PROJECT For Those Who Code
  • Home
  • Articles
  • FAQ
Community
  1. Home
  2. Database & SysAdmin
  3. Database
  4. Similarity search in SQL

Similarity search in SQL

Scheduled Pinned Locked Moved Database
questiondatabasetutorial
7 Posts 7 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • J Offline
    J Offline
    Jassim Rahma
    wrote on last edited by
    #1

    I am sure I have to explain this with more details.. People names in Arabic have more than one way to spell it.. for example, Hassan can be spelled with double or single S & Mohammad can be spelled (Mohammad) or (Mohd) or (Moh’d) or (Mohammed), etc… & my name can be spelled the way I spell it (Jassim) or (Jasim) or (Jassem) or (Jasem).. can you believe that?!!! Now, what if I have the following name in my customers table in the full_name record JASSIM MOHD HASSAN And the user entered the following name: JASIM MOHD HASAN Or: JASSIM MOHAMMAD HASSAN Or any other different spelling way.. How can I search for similar names in the full_name records and alert the user for possible existing names?

    T I L N L 6 Replies Last reply
    0
    • J Jassim Rahma

      I am sure I have to explain this with more details.. People names in Arabic have more than one way to spell it.. for example, Hassan can be spelled with double or single S & Mohammad can be spelled (Mohammad) or (Mohd) or (Moh’d) or (Mohammed), etc… & my name can be spelled the way I spell it (Jassim) or (Jasim) or (Jassem) or (Jasem).. can you believe that?!!! Now, what if I have the following name in my customers table in the full_name record JASSIM MOHD HASSAN And the user entered the following name: JASIM MOHD HASAN Or: JASSIM MOHAMMAD HASSAN Or any other different spelling way.. How can I search for similar names in the full_name records and alert the user for possible existing names?

      T Offline
      T Offline
      T2102
      wrote on last edited by
      #2

      This is not a perfect solution, but hopefully it might help. If you separate out first, middle, and last name, then you might be able to look at sum of two instr(.) calls being greater than 0 when you search for the first few and last few characters. Given the names that match based on first and last name, then eliminate only based on middle as it might be left out. You also might call a user defined function to count the # of matching characters (properly ordered) in each name on your smaller result set.

      1 Reply Last reply
      0
      • J Jassim Rahma

        I am sure I have to explain this with more details.. People names in Arabic have more than one way to spell it.. for example, Hassan can be spelled with double or single S & Mohammad can be spelled (Mohammad) or (Mohd) or (Moh’d) or (Mohammed), etc… & my name can be spelled the way I spell it (Jassim) or (Jasim) or (Jassem) or (Jasem).. can you believe that?!!! Now, what if I have the following name in my customers table in the full_name record JASSIM MOHD HASSAN And the user entered the following name: JASIM MOHD HASAN Or: JASSIM MOHAMMAD HASSAN Or any other different spelling way.. How can I search for similar names in the full_name records and alert the user for possible existing names?

        I Offline
        I Offline
        i j russell
        wrote on last edited by
        #3

        SOUNDEX is the standard way of handling phonetic searches in Sql Server but it can return lots of false matches.

        1 Reply Last reply
        0
        • J Jassim Rahma

          I am sure I have to explain this with more details.. People names in Arabic have more than one way to spell it.. for example, Hassan can be spelled with double or single S & Mohammad can be spelled (Mohammad) or (Mohd) or (Moh’d) or (Mohammed), etc… & my name can be spelled the way I spell it (Jassim) or (Jasim) or (Jassem) or (Jasem).. can you believe that?!!! Now, what if I have the following name in my customers table in the full_name record JASSIM MOHD HASSAN And the user entered the following name: JASIM MOHD HASAN Or: JASSIM MOHAMMAD HASSAN Or any other different spelling way.. How can I search for similar names in the full_name records and alert the user for possible existing names?

          L Offline
          L Offline
          Luc Pattyn
          wrote on last edited by
          #4

          Hi, IMO you can solve such problems by using a "canonical form"; for each possible value, the canonical value should be defined somehow, in your example it could be the alphabetically first of all equivalent names, so in your Jassim example it would be Jasem. So you store the original value (Jassim) and the canonical value (Jasem); when a search is launched for say Jasim, you look up or compute its canonical value (again Jasem) and launch a search for that canonical value. Again in your example, the canonical form would probably be defined by a database table, which also means you can store its unique ID rather than its string value. :)

          Luc Pattyn [Forum Guidelines] [My Articles]


          I only read code that is properly indented, and rendered in a non-proportional font; hint: use PRE tags in forum messages


          1 Reply Last reply
          0
          • J Jassim Rahma

            I am sure I have to explain this with more details.. People names in Arabic have more than one way to spell it.. for example, Hassan can be spelled with double or single S & Mohammad can be spelled (Mohammad) or (Mohd) or (Moh’d) or (Mohammed), etc… & my name can be spelled the way I spell it (Jassim) or (Jasim) or (Jassem) or (Jasem).. can you believe that?!!! Now, what if I have the following name in my customers table in the full_name record JASSIM MOHD HASSAN And the user entered the following name: JASIM MOHD HASAN Or: JASSIM MOHAMMAD HASSAN Or any other different spelling way.. How can I search for similar names in the full_name records and alert the user for possible existing names?

            N Offline
            N Offline
            Niladri_Biswas
            wrote on last edited by
            #5

            Hi, I agree with Mr. Russel that though soundex will help in such situations but it will cause problems on the other hand by delivering extra results. I suggest better use Metaphone(which was first described by Lawrence Philips in the December 1990 edition of Computer Language magazine) for serving your purpose Look here for SQL Version Metaphone (Simple and Double) and here for Oracle Version(You will find the code in Listing A) Consider algorithms other than Soundex Hope this helps :)

            Niladri Biswas

            1 Reply Last reply
            0
            • J Jassim Rahma

              I am sure I have to explain this with more details.. People names in Arabic have more than one way to spell it.. for example, Hassan can be spelled with double or single S & Mohammad can be spelled (Mohammad) or (Mohd) or (Moh’d) or (Mohammed), etc… & my name can be spelled the way I spell it (Jassim) or (Jasim) or (Jassem) or (Jasem).. can you believe that?!!! Now, what if I have the following name in my customers table in the full_name record JASSIM MOHD HASSAN And the user entered the following name: JASIM MOHD HASAN Or: JASSIM MOHAMMAD HASSAN Or any other different spelling way.. How can I search for similar names in the full_name records and alert the user for possible existing names?

              L Offline
              L Offline
              Lost User
              wrote on last edited by
              #6

              SOUNDEX function is the simplest way to start. But it matches only based on phonetic similarities. For example, 'Mohammed' and 'Mohamed' will match but not 'Mohd', since it has a completely different pronounciation. You need to write your own function to take care of such cases, probably by storing canonical forms of common names and comparing it with the input.

              1 Reply Last reply
              0
              • J Jassim Rahma

                I am sure I have to explain this with more details.. People names in Arabic have more than one way to spell it.. for example, Hassan can be spelled with double or single S & Mohammad can be spelled (Mohammad) or (Mohd) or (Moh’d) or (Mohammed), etc… & my name can be spelled the way I spell it (Jassim) or (Jasim) or (Jassem) or (Jasem).. can you believe that?!!! Now, what if I have the following name in my customers table in the full_name record JASSIM MOHD HASSAN And the user entered the following name: JASIM MOHD HASAN Or: JASSIM MOHAMMAD HASSAN Or any other different spelling way.. How can I search for similar names in the full_name records and alert the user for possible existing names?

                D Offline
                D Offline
                David Skelly
                wrote on last edited by
                #7

                As noted by other posters, soundex is the easiest solution although it may not always work quite as expected. A better "fuzzy matching" algorithm is the Levenstein algorithm which is pretty good, but you would probably have to code up your own solution for that and it might be a challenge to get it to perform over a large database. (Plenty of articles available if you Google for it.) Or, you could use a full text search engine like Lucene or MS Search Server which might be overkill for you but which do have some nice built-in fuzzy search support. (They are intended for searching documents but you can get adaptors to run them over relational databases as well.)

                1 Reply Last reply
                0
                Reply
                • Reply as topic
                Log in to reply
                • Oldest to Newest
                • Newest to Oldest
                • Most Votes


                • Login

                • Don't have an account? Register

                • Login or register to search.
                • First post
                  Last post
                0
                • Categories
                • Recent
                • Tags
                • Popular
                • World
                • Users
                • Groups