Complex String Comparison

    James Spibey wrote (#1):

    I need to compare a series of strings (actually filenames) and find potential duplicates. The strings are unlikely to be literally identical, however, so what I really need is a way to find similar strings. Has anyone had experience with this before? Cheers, James

      Daniel Turini wrote (#2):

      Yes, this is a common need in credit applications (finding mistyped names). In your case, I suggest the Levenshtein distance (also called edit distance) algorithm. You can find plenty of C++ implementations on Google.

      lazy isn't my middle name.. its my first.. people just keep calling me Mel cause that's what they put on my drivers license. - Mel Feik
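
For reference, a minimal sketch of the Levenshtein (edit) distance mentioned above. It is not taken from any particular implementation; the function name `levenshtein` is made up for illustration, and it assumes plain byte-wise comparison of `std::string` with a cost of 1 for each insertion, deletion, or substitution.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Edit distance between a and b: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn a into b.
// Only two rows of the dynamic-programming table are kept, so memory use
// is proportional to b.size().
std::size_t levenshtein(const std::string& a, const std::string& b)
{
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);

    // Distance from the empty prefix of a to each prefix of b.
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i; // deleting the first i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

A small distance relative to the string lengths suggests a likely duplicate. What threshold to use is a judgment call that depends on the data.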

        James Spibey wrote (#3):

        Thanks for that, Daniel. I took a look at it and it solves part of my problem. The other part is that I need to find duplicates based on whether one string contains some elements of another. For example, I would want to match the following filenames:

        c:\Music\Albums\Vines\01 - Highly Evolved.mp3
        c:\Music\Singles\The Vines - Highly Evolved.mp3
        f:\Stuff\Vines - Highly Evolved.wma

        Do you know if there is a standard way to do this? I don't think there is, but I just wanted to make sure. Cheers, James
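
One possible way to approach "has some elements of another" is to compare the words in the filenames rather than the raw characters. The sketch below is an assumption, not a standard method. The helper names `stem`, `tokens`, and `similarity` are hypothetical, and so are the choices to ignore the directory and extension, to drop purely numeric words such as track numbers, and to drop the word "the". The score is the Jaccard similarity of the two word sets: shared words divided by total distinct words.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// File name without directory or extension, e.g.
// "f:\\Stuff\\Vines - Highly Evolved.wma" -> "Vines - Highly Evolved".
std::string stem(const std::string& path)
{
    const std::size_t slash = path.find_last_of("\\/");
    const std::string name =
        (slash == std::string::npos) ? path : path.substr(slash + 1);
    const std::size_t dot = name.find_last_of('.');
    return (dot == std::string::npos) ? name : name.substr(0, dot);
}

// Splits text on non-alphanumeric characters and lower-cases each word.
// Assumption: purely numeric words (track numbers) and "the" are noise.
std::set<std::string> tokens(const std::string& text)
{
    std::set<std::string> out;
    std::string word;
    bool all_digits = true;

    auto flush = [&]() {
        if (!word.empty() && !all_digits && word != "the")
            out.insert(word);
        word.clear();
        all_digits = true;
    };

    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            word += static_cast<char>(std::tolower(c));
            if (!std::isdigit(c))
                all_digits = false;
        } else {
            flush();
        }
    }
    flush();
    return out;
}

// Jaccard similarity of the two word sets, in the range [0, 1].
double similarity(const std::string& path_a, const std::string& path_b)
{
    const std::set<std::string> a = tokens(stem(path_a));
    const std::set<std::string> b = tokens(stem(path_b));
    if (a.empty() && b.empty())
        return 1.0;

    std::size_t common = 0;
    for (const std::string& t : a)
        common += b.count(t);

    return static_cast<double>(common) /
           static_cast<double>(a.size() + b.size() - common);
}
```

With these assumptions the second and third example filenames reduce to the same word set, while the first shares only the title words with them because the artist appears in its directory rather than its name. Including directory names in the word set would be one way to catch that case.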

          jhwurmbach wrote (#4):

          If you are really into informatics as a science, some algorithms from bioinformatics may help you. Finding substrings that sequences share and computing a similarity distance between them is quite common in DNA-sequence handling. You will find more material about that on the net than you could read in the rest of your life.
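
As one simple illustration of the substring-finding idea, here is a sketch of the longest common substring computed by dynamic programming. It is a basic textbook technique and not a specific bioinformatics algorithm; sequence-alignment methods used for DNA are more elaborate. The function name `longest_common_substring` is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns the longest run of characters that appears contiguously in both
// a and b. If several runs share the maximum length, the one that ends
// earliest in a is returned.
std::string longest_common_substring(const std::string& a, const std::string& b)
{
    // prev[j] / curr[j]: length of the common run ending at a[i-1] and b[j-1].
    std::vector<std::size_t> prev(b.size() + 1, 0), curr(b.size() + 1, 0);
    std::size_t best_len = 0;
    std::size_t best_end = 0; // one past the end of the best run in a

    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            curr[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            if (curr[j] > best_len) {
                best_len = curr[j];
                best_end = i;
            }
        }
        std::swap(prev, curr);
    }
    return a.substr(best_end - best_len, best_len);
}
```

The length of the shared run, compared with the length of the shorter string, gives a rough similarity measure for filenames that share a title but differ in prefix.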
