Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. Algorithms
  4. Super Fast edit-distance algorithm: Sift2 [modified]

Super Fast edit-distance algorithm: Sift2 [modified]

Scheduled Pinned Locked Moved Algorithms
algorithmshtmlcomperformancehelp
2 Posts 2 Posters 21 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • S Offline
    S Offline
    Siderite Zaqwedex
    wrote on last edited by
    #1

    http://siderite.blogspot.com/2007/01/super-fast-string-distance-algorithm.html[^] I hope I am not reinventing an old and rusty wheel, even if I am sure someone somewhere had the same idea as I did, however, I didn't find it on Google, so I am presenting this algorithm I've created to compare two strings, as a fast replacement for Levenstein. I think it's pretty accurate for its speed. I have tested it on 60000 company detail records and the average dispersion from levenshtein is 3%, with 15% max, and the greatest errors when strings were fairly different in length. (when strings are equal in length, the average error is 1%) So, I submit this is a good algorithm to see if strings are similar or fairly different, leaving it to levenshtein to compute exact distance values when sift2 values go above a certain threshold. The complexity is almost linear O(n*constant). What do you guys think? -- modified at 9:46 Wednesday 10th January, 2007 -- modified at 5:05 Tuesday 16th January, 2007 Made it slightly faster and a lot more precise.

    ---------- Siderite

    J 1 Reply Last reply
    0
    • S Siderite Zaqwedex

      http://siderite.blogspot.com/2007/01/super-fast-string-distance-algorithm.html[^] I hope I am not reinventing an old and rusty wheel, even if I am sure someone somewhere had the same idea as I did, however, I didn't find it on Google, so I am presenting this algorithm I've created to compare two strings, as a fast replacement for Levenstein. I think it's pretty accurate for its speed. I have tested it on 60000 company detail records and the average dispersion from levenshtein is 3%, with 15% max, and the greatest errors when strings were fairly different in length. (when strings are equal in length, the average error is 1%) So, I submit this is a good algorithm to see if strings are similar or fairly different, leaving it to levenshtein to compute exact distance values when sift2 values go above a certain threshold. The complexity is almost linear O(n*constant). What do you guys think? -- modified at 9:46 Wednesday 10th January, 2007 -- modified at 5:05 Tuesday 16th January, 2007 Made it slightly faster and a lot more precise.

      ---------- Siderite

      J Offline
      J Offline
      Jeremy Falcon
      wrote on last edited by
      #2

      Siderite Zaqwedex wrote:

      What do you guys think?

      It's an interesting read. Seems like you should write an article on it.

      Jeremy Falcon "It's a good thing to do and a tasty way to do it." - Wilford Brimley[^]

      1 Reply Last reply
      0
      Reply
      • Reply as topic
      Log in to reply
      • Oldest to Newest
      • Newest to Oldest
      • Most Votes


      • Login

      • Don't have an account? Register

      • Login or register to search.
      • First post
        Last post
      0
      • Categories
      • Recent
      • Tags
      • Popular
      • World
      • Users
      • Groups