'diff' algorithm
-
I need to find the differences between two large chunks of memory, where the differences themselves are fairly small. I have downloaded the GNU diff sources (Eugene Myers' algorithm?) and need to compare their performance on my data with other diff algorithms, such as Hunt-McIlroy. I have only basic knowledge of the topic, so any web resources or book recommendations would be greatly appreciated. Thanks /moliate
The corners of my eyes catch hasty, bloodless motion - a mouse? Well, certainly a peripheral of some kind.
Neil Gaiman - Cold Colours
-
This may be overkill for your needs, but check out: http://sourceforge.net/projects/xdelta/
Shog9
That why you have a dual processor system. One for system, one for the screen saver - Mark Nischalke on Win2k server administration
-
Nice discussion here.
The problem with a big chunk of memory is the sheer length of the string. Most diff-like programs use something larger than a byte or character as the basic string element. A line of text is probably the most common element, so the 'string' is really only as long as the number of lines in the file.
If you can give up finding the minimum number of byte changes between the two memory blocks, you can get better performance by choosing a suitable 'chunk' definition and breaking memory up into chunks that serve as the elements of the strings passed to the diff. I'm no expert on this, but I implemented something similar recently: the diff was first calculated on our chunks, and a finer intra-chunk diff was performed only if the changed regions were small.
Be careful, though: your chunk definition should have the property that changes naturally fall within chunks, not across chunk boundaries.
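
To make the two-level idea concrete, here is a rough Python sketch. It is not the code I described, just an illustration under assumptions: chunks are delimited by a NUL byte (a made-up chunk definition), the names `split_chunks`, `two_level_diff` and `fine_limit` are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever diff algorithm you end up benchmarking.

```python
from difflib import SequenceMatcher
from itertools import accumulate


def split_chunks(data, delimiter=b"\x00"):
    """Split data on a delimiter, keeping the delimiter at the end of each chunk."""
    parts = data.split(delimiter)
    chunks = [p + delimiter for p in parts[:-1]]
    if parts[-1]:
        chunks.append(parts[-1])  # trailing bytes with no closing delimiter
    return chunks


def two_level_diff(a, b, delimiter=b"\x00", fine_limit=4096):
    """Return (tag, a_start, a_end, b_start, b_end) byte ranges for changed regions.

    Pass 1 diffs the sequences of chunks. Pass 2 runs a byte-level diff
    only inside replaced regions no longer than fine_limit bytes.
    """
    chunks_a = split_chunks(a, delimiter)
    chunks_b = split_chunks(b, delimiter)
    # Byte offset at which each chunk starts (plus one final end offset).
    off_a = [0] + list(accumulate(len(c) for c in chunks_a))
    off_b = [0] + list(accumulate(len(c) for c in chunks_b))

    result = []
    coarse = SequenceMatcher(None, chunks_a, chunks_b, autojunk=False)
    for tag, i1, i2, j1, j2 in coarse.get_opcodes():
        if tag == "equal":
            continue
        a1, a2, b1, b2 = off_a[i1], off_a[i2], off_b[j1], off_b[j2]
        if tag == "replace" and max(a2 - a1, b2 - b1) <= fine_limit:
            # Small changed region: refine it byte by byte.
            fine = SequenceMatcher(None, a[a1:a2], b[b1:b2], autojunk=False)
            for t, x1, x2, y1, y2 in fine.get_opcodes():
                if t != "equal":
                    result.append((t, a1 + x1, a1 + x2, b1 + y1, b1 + y2))
        else:
            # Pure insert/delete, or a region too large to refine cheaply.
            result.append((tag, a1, a2, b1, b2))
    return result
```

Each returned range says which bytes of `a` are replaced by which bytes of `b`. If your data has no natural delimiter, this particular chunk definition will not help, which is exactly the caveat about chunk boundaries.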
-
Thanks. My problem is mostly that the string cannot be divided into natural chunks, meaning that the insertion of a single byte could cause a lot of extra processing. I'll look more closely at the Perl code for the LCS. It seems to have some memory improvements over a full-table LCS, which is very nice when working with strings of 100 KB each. Cheers /moliate
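
For anyone wondering what a memory improvement over the full table can look like, here is a small Python sketch. It is only an assumption about the general idea, not a port of the Perl code: it keeps two rows of the dynamic-programming table instead of all of them, so memory grows with the shorter input rather than with the product of both lengths. The function name `lcs_length` is made up, and it returns only the length of the LCS; recovering the subsequence itself in linear space needs something like Hirschberg's divide-and-conquer scheme.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, using two rolling rows."""
    if len(b) > len(a):
        a, b = b, a  # keep the rows as short as the shorter input
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the LCS of both prefixes by this matching element.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one element from either side and keep the better result.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

The running time is still proportional to `len(a) * len(b)`, so for two 100 KB buffers this saves memory but not time.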