Good way to detect a useful portion of equivalent expressions

_Erik_

This is becoming really interesting. Have my 5. Yes, I agree as well. That's why I stated it was a heuristic algorithm, not a demonstration, so it is theoretically very fallible, of course. But now, take it to practice, increasing the number of random checks with their succesors becouse you know it might fail. Do you think it would not be reliable in practice?

Luc Pattyn

_Erik_ wrote:

Do you think it would not be reliable in practice?

that depends a lot on the kind of expressions, operators, functions, ... you envision. you will have a hard time to compare (i==48329134)?1:0 and (i==48329135)?1:0 in a heuristic way. :)

Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

_Erik_

Luc Pattyn wrote:

that depends a lot on the kind of expressions, operators, functions, ... you envision.

Sure, I was giving an idea that might be worth under some previous conditions, and maybe some expressioins, operators, functions... would be hard to compare this way. However...

Luc Pattyn wrote:

you will have a hard time to compare (i==48329134)?1:0 and (i==48329135)?1:0 in a heuristic way.

Not that hard in this case, since these expressions can be defined as: f1(n) = 1 if n==48329134; 0 in any other case f2(n) = 1 if n==48329135; 0 in any other case We have two base cases here for each one of the expressions: n=0 and n=48329134 in the first case; n=0 and n=48329135 on the second one, so the method I have described before would work perfectly and really quick finding the two expressions are not equivalent becouse one of the base cases do not match.

Matty22

Eric's suggestion is excellent and is quite similar to how the fast primality test finds large primes quickly. If you test in enough bases the chance of an equation failing can be reduced to 1 in billions This means it falls under the meteor rule.. If the chance of a false positive is less than being hit by a meteor today..It's okay to use it So I think Eric is got an excellent idea.

David1987

I respectfully disagree. This will be part of an optimizing compiler. If the meteor decides to hit us after all then someone will have a nearly impossible to diagnose bug. There are also large classes of expressions for which Eriks heuristic method does not work at all regardless of how many iterations it uses or require such a large number of iterations that it becomes impractical. For example: (uint64)x / 3 and ((uint64)x * 0x5555556) >> 32 are the same for x in the interval 0..0x7FFFFFFF but not outside it. That gives a 0.5^k error probability where k is the number of iterations - which is fairly good, but if you combine a couple of those you can make the error probability arbitrarily high. And I'm not even "cheating" with comparisons here - a single comparison against a constant can make the error probability arbitrarily high. They could be special-cased of course, but can the expressions above be special cased?

mbyamukama

David, if these expressions are analytic expressions (defined and differentiable for all values of input), then a simple proof by induction will ALWAYS verify their equivalence. So you only have to try for a few values.

BHM

David1987

Maybe so, but that's not what we have here, they are expressions with types such as int32 (see original post) and the class of expressions is not limited to something simple. For example, 1/x and 2/x differ only for x=1 and x=2 (edit: and -1 and -2), but there are 2^32 possibilities for x if it is an int32 and even more if it is an int64 - the chance that either 1 or 2 is tested isn't that big.

modified on Sunday, April 3, 2011 8:58 AM

Alex Shtof

Can you define or characterize this "good portion"? For example, if you want to detect equivalence between bounded-degree polynomials (two computer programs or expressions such that their result is eventually a polynomial in the input), and you know the degree bound, it can be easily done by checking a small and finite number of inputs.

David1987

A good portion is hard to define :) It should find "significantly" more equivalences than merely comparing whether two expressions are equal. The class of expressions is not constrained in any way. I suppose I could first detect whether two expressions are both polynomials and then use a special comparison, but there are plenty of other types of expression so in order to find a "good portion" of equivalences I'd need something for them too.

Alex Shtof

Well, I suppose that you know that for polynomials of degree n, you need to check for equality at (n+1) points. If all are equal, the polynomials are the same. Anyway, detecting polynomials is NOT easy. They can be computed in many ways, with for loops, while loops and other kinds of computations. It's not always visible by analyzing an expression that it is a polynomial. Alex.

David1987

I could detect Some polynomials and match them that way I suppose. That may be more effort than it's worth.. well I guess I'll try it and see how much it helps.

dasblinkenlight

This is an excellent question - the bad news is, the answer is rather complicated. You would need to implement a unification algorithm[^] to do it correctly. Things would not be as bad if you considered Prolog[^] as the language for your implementation, but something tells me that that would be out of the question ;) On the other hand, you can always embed Prolog to get unification for free. But no matter how you look at it, the learning curve is going to be steep. Good luck!

David1987

That page only seems to discuss logic formula's (unless I'm missing something?), does it also work for other formula's?

dasblinkenlight

Absolutely. Formally, you can represent A+B as add(A,B), and define an additional rule that add(A,B) :== add(B,A). In fact, since you are interested only in a yes/no answer (as opposed to variable assignments produced by unification), you don't even need to implement a complete unification: a simpler recursive matching algorithm would do. The only challenge in its implementation is matching ((A+B)+C) trees to (A+(B+C)). You can deal with it by flattening the list of operands, and trying all possible orderings on one of the sides. Don't forget to memoize[^] your matches, both positive and negative. Otherwise, your search will be too slow.