Comparing "Similar" Strings

Todd Davis wrote (#1):

I know this is a toughie, but I figured I'd ask...

I am writing an internal application for my company. We have thousands of articles on our web server in HTML (actually ASP) format. These articles are technical in nature and support the various software programs we write. The tool is meant to scan the text of those articles and convert the information into a new format that we'll be storing in a database. Part of that conversion process is identifying the product and version that each article is associated with.

This is really easy when the product name is spelled correctly and formatted the way I expect it to be. Unfortunately, people make mistakes (lots and lots of mistakes), and what I expect is rarely what is there. For a silly example, let's say that I'm scanning the text for the word "Microsoft" to see if the article is associated with a Microsoft product. Easy enough, right? But as I start looking at articles, I see:

  • Micro Soft
  • Microssoft
  • Microsoff
  • MS
  • Microsucks
  • MSoft
  • ...

Well, you get the idea. Doing a simple InStr(Article, "Microsoft") doesn't always find what I want. What I need is a more "fuzzy" compare method. Microsoft SQL Server has a LIKE operator that is kind of close to this; better yet, I can use the FREETEXT or CONTAINS full-text predicates to do a fuzzy search. Is there a similar function or technique in VB that would allow me to do a compare like this?

-Todd Davis (toddhd@hotmail.com)
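
To make the "fuzzy compare" idea concrete, here is a minimal sketch based on edit (Levenshtein) distance. It is written in Python purely for illustration, not VB, and the function names `edit_distance` and `fuzzy_contains`, as well as the two-edit threshold, are assumptions rather than anything from the thread. The same logic should port to VB without much trouble.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute, or keep a match
        prev = curr
    return prev[-1]


def fuzzy_contains(text: str, target: str, max_edits: int = 2) -> bool:
    """True if a word, or two adjacent words joined, is within max_edits of target."""
    words = text.lower().split()
    target = target.lower()
    # Joining neighbours lets "Micro Soft" be compared as "microsoft".
    candidates = words + [a + b for a, b in zip(words, words[1:])]
    return any(edit_distance(c, target) <= max_edits for c in candidates)
```

With this threshold, "Micro Soft", "Microssoft", and "Microsoff" all match "Microsoft", while "MS", "MSoft", and "Microsucks" do not. Abbreviations and nicknames like those are probably better handled with an explicit alias list than with a larger threshold, which would start matching unrelated words.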

Dave Kreskowiak wrote (#2):

Not unless you write it. I found a couple of resources on the 'Net about the subject just by searching for 'fuzzy string compare'. You might want to try converting this Delphi source.

You also might want to try working something up using Regular Expressions. I don't have any code, but it's an idea I would look into.

RageInTheMachine9532
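
One way the regular-expression idea could look is sketched below, in Python for illustration. The helper name `tolerant_pattern` and the choice of what to tolerate (repeated letters and a single stray space between letters) are assumptions made for this example, not something the reply specifies.

```python
import re


def tolerant_pattern(word: str) -> re.Pattern:
    """Build a case-insensitive pattern that allows each letter to repeat
    and allows one optional whitespace character between letters."""
    parts = [re.escape(ch) + "+" for ch in word]
    return re.compile(r"\b" + r"\s?".join(parts) + r"\b", re.IGNORECASE)


pattern = tolerant_pattern("Microsoft")
for sample in ("Micro Soft", "Microssoft", "Microsoff"):
    print(sample, "->", bool(pattern.search(sample)))
```

This catches "Micro Soft" and "Microssoft" but not "Microsoff", because a pattern like this tolerates extra characters and spacing but not a substituted letter. Covering substitutions with regular expressions alone tends to get unwieldy, which is where distance-based or phonetic comparisons become more practical.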

PaleyX wrote (#3):

There is a good article about it on this website: http://www.codeproject.com/string/dmetaphone6.asp

Rugby League: The Greatest Game Of All.

RichardGrimmer wrote (#4):

Try doing a SOUNDEX search - when people spell incorrectly, the word usually still sounds the same. SOUNDEX creates a short code for a string (a letter followed by three digits) based on its phonetics, so Smithe and Smythe would result in the same SOUNDEX value. I believe SQL Server (and definitely Oracle - ner!) supports it straight out of the box...

"Now I guess I'll sit back and watch people misinterpret what I just said......" Christian Graus At The Soapbox
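
For anyone who wants the SOUNDEX comparison outside the database, the following is a minimal sketch of the classic American Soundex rules, written in Python for illustration. The function name `soundex` and the choice to return an empty string for input without letters are assumptions of this sketch; database implementations may differ in edge cases.

```python
# Letter groups that share a Soundex digit; vowels, y, h, and w have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code: first letter plus three digits."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last = CODES.get(letters[0], "")  # the first letter's digit also suppresses repeats
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != last:          # collapse runs of the same digit
                result += code
            last = code
        elif ch not in "hw":
            last = ""                 # a vowel separates repeats; h and w do not
        if len(result) == 4:
            break
    return result.ljust(4, "0")       # pad short codes with zeros


print(soundex("Smithe"), soundex("Smythe"))        # S530 S530
print(soundex("Microsoft"), soundex("Microssoft"))  # M262 M262
```

Because only the first letter and three digits are kept, long words that start alike collapse to the same code: "Microsucks" also yields M262, while "MSoft" gives M213 and "MS" gives M200. Soundex works best as a coarse first filter, with a stricter comparison applied to whatever it lets through.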
