Super Fast edit-distance algorithm: Sift2 [modified]
-
http://siderite.blogspot.com/2007/01/super-fast-string-distance-algorithm.html[^] I hope I am not reinventing an old and rusty wheel, even if I am sure someone somewhere had the same idea as I did, however, I didn't find it on Google, so I am presenting this algorithm I've created to compare two strings, as a fast replacement for Levenstein. I think it's pretty accurate for its speed. I have tested it on 60000 company detail records and the average dispersion from levenshtein is 3%, with 15% max, and the greatest errors when strings were fairly different in length. (when strings are equal in length, the average error is 1%) So, I submit this is a good algorithm to see if strings are similar or fairly different, leaving it to levenshtein to compute exact distance values when sift2 values go above a certain threshold. The complexity is almost linear O(n*constant). What do you guys think? -- modified at 9:46 Wednesday 10th January, 2007 -- modified at 5:05 Tuesday 16th January, 2007 Made it slightly faster and a lot more precise.
---------- Siderite
-
http://siderite.blogspot.com/2007/01/super-fast-string-distance-algorithm.html[^] I hope I am not reinventing an old and rusty wheel, even if I am sure someone somewhere had the same idea as I did, however, I didn't find it on Google, so I am presenting this algorithm I've created to compare two strings, as a fast replacement for Levenstein. I think it's pretty accurate for its speed. I have tested it on 60000 company detail records and the average dispersion from levenshtein is 3%, with 15% max, and the greatest errors when strings were fairly different in length. (when strings are equal in length, the average error is 1%) So, I submit this is a good algorithm to see if strings are similar or fairly different, leaving it to levenshtein to compute exact distance values when sift2 values go above a certain threshold. The complexity is almost linear O(n*constant). What do you guys think? -- modified at 9:46 Wednesday 10th January, 2007 -- modified at 5:05 Tuesday 16th January, 2007 Made it slightly faster and a lot more precise.
---------- Siderite
Siderite Zaqwedex wrote:
What do you guys think?
It's an interesting read. Seems like you should write an article on it.
Jeremy Falcon "It's a good thing to do and a tasty way to do it." - Wilford Brimley[^]