Boundless Binary Search

Igor van den Hoven

I've been playing around with binary search algorithms and have created a novel new variant that appears to be significantly faster than any traditional implementations I've seen.

int boundless_binary_search(int *array, int array_size, int key)
{
register int mid, i;

    mid = i = array\_size - 1;

    while (mid > 7)
    {
            mid = mid / 2;

            if (key < array\[i - mid\]) i -= mid;
    }
    while (i && key < array\[i\]) --i;

    if (key == array\[i\])
    {
            return i;
    }
    return -1;

}

I'm wondering if this is a novel binary search implementation, or has this approach been used by someone before? Some variants and benchmark graphs: https://sites.google.com/site/binarysearchcube/binary-search[^]

Lost User

Perhaps I'm not thinking about it right (I have a cold so I'm not particularly sharp right now), but it seems to me that if the item is in the second half of the array, it will essentially devolve into a linear search.

Igor van den Hoven

It switches to a linear search when roughly 8 elements are left. The i != 0 check is there in case the key is smaller than the value at index 0.

Lost User

Yes ok, I guess the cold got to me. I was thinking something weird based around the assumption that mid started in the middle, which it obviously doesn't. So it works, that's good. It seems closely related to the variant which keeps a "midpoint" and a "span" (here it's the next span (mid), and the midpoint plus that next span (i)). Same pros and cons too (needs fixup at the end, but inner loop is simple), the "midpoint/span" variant is usually seen (when seen at all) in its "more complicated math in the inner loop"-form which doesn't need fixup, but then what's the point.

Igor van den Hoven

Using a midpoint and a span is slower because it requires 2 assignments per loop opposed to 1.5 (on average) in my implementation. I assume it's the fixup and assignment issue that left academics stumped for the past 60 years. There are also caching issues for larger arrays, for which I've created a variant that is mindful in that regard.

Lost User

Gregorius van den Hoven wrote:

requires 2 assignments per loop opposed to 1.5 (on average) in my implementation.

Well you realize it'll be a conditional move, right? But only one, whereas regular binary search would have two (or an unpredictable branch, yuck). So you'd get something like (from GCC output)

.L4:
sar edx
mov ecx, eax
sub ecx, edx
cmp [esi+ecx*4], ebx
cmovg eax, ecx
cmp edx, 7
jg .L4

In a quick test, GCC didn't feel like using cmovs for plain old "left and right bounds" binary search. You can easily measure a huge difference due to that sort of thing, and I'm not sure that's 100% fair, after all you could implement ye olde binary search with cmovs.

Igor van den Hoven

I'm not well versed in that area. Looks like the compiler is doing something differently when you free up a registry space, but it's still slower.

int tailing_binary_search(int *array, int array_size, int key)
{
register int bot, mid, i;

    if (key < array\[0\])
    {
            return -1;
    }

    bot = 0;
    i = array\_size - 1;
    mid = i / 2;

    while (bot + 7 < i)
    {
            if (key < array\[mid\])
            {
                    i = mid - 1;
                    mid -= (i - bot) / 2;
            }
            else
            {
                    bot = mid;
                    mid += (i - bot) / 2;
            }
    }
    while (key < array\[i\]) --i;

    if (key == array\[i\])
    {
            return i;
    }
    return -1;

}

Lost User

Yea GCC at least makes some branches there in the loop, and it's using more complicated divisions by 2 (that work when negative).

Alan Balkany

This approach has been done before, e.g. the implementation of qsort in the C runtime library transitions from quicksort to a (linear) insertion sort when the partition size gets below a certain threshold.

alexpeter

The code given here and within the analysis is flawed, not that much in the sense that it is not going to work, but in the sense that it follows the idea explained in a wrong way. If the idea was to reach and analyze last about 6 elements, that is not what this code is doing. The bug is subtle. For example this code, which executes if we have an element we want somewhere within first few elements

    int mid, i;
    
    mid = i = 2147483647-1;

    while (mid > 7)
    {
            mid = mid / 2;

            i -= mid;
    }
    
    printf("%d\\n",i);

is giving 34, which means that we will have 34 elements(!) to parse more if the element we are looking for is somewhere at the beginning. This is not even near 7. Only if this code is corrected and tested, we can really compare the idea. To have an algorithm that occasionally parse through 34 elements is really not a binary search even if this is happening here and there. So a very BIG caution about implementing any new idea regarding binary search. Yes, there could be the way and this idea above could be something useful but that must be verified. It seems from this perspective that the flaw of finding first few elements significantly slower is what made this algorithm faster on average. In any case the code above cannot be taken as correct. The gymnastic of pointer and size (here incorrectly named mid) is much more complicated. Known solution with pointers and size with deferred detection looks more like this

int binary_search3s(int a[], int n, int t)
{
int p=0;

while(n>1)
{
    if(a\[p + n/2\]<=t)
    {
        p+=n/2;
        n++;
    }

    n=n/2;

}

if(a\[p\]==t) return p;

return -1;

}

This should be a starting point if anything is to be improved. (3s in the function name is from a family of solutions.)

Aleksandar