Good way to detect a useful portion of equivalent expressions

David1987

So then I'd have to convert the expressions first, ok, it's still looking quite weird though because surely "expressions are equivalent for 3 sampled inputs" does not imply "expressions are equivalent for all inputs"?

_Erik_

Errr, of course not. I am sorry if I am not explaining it well... We do not check if they are equivalent for any three random values, we check them for the base case, a random case and the succesor of the random case. It is just a straight application of the axiom of induction[^] for natural numbers (remember, works only with natural numbers).

David1987

Well then, what if the expressions are both polynomials that both happen to intersect the x-axis at 3 points, namely the base case, the random case, and its successor?

_Erik_

Induction can only be applied with Natural numbers. If your expression will only evaluate Natural numbers then this is a good, quick and effective approach.

David1987 wrote:

Well then, what if the expressions are both polynomials that both happen to intersect the x-axis at 3 points, namely the base case, the random case, and its successor?

This means Real numbers, so you cannot apply induction here. That's why I said in my first reply: "if the inputs are always natural numbers...". If you don't know what mathematical induction is then I recommend you to follow the link I gave you before but, please, stop replying as if I was stupid and did not know what I am talking about... I can't belive it...

molesworth

_Erik_ wrote:

It is just a straight application of the axiom of induction[^] for natural numbers

But from the wiki article :-

...proving that if any one statement in the infinite sequence of statements is true, then so is the next one.

What you've done is shown it may be true for two arbitrary sequential cases, but you haven't derived a proof that if case n is true, then so must n+1 be. That really requires symbolic manipulation of the equations.

Days spent at sea are not deducted from one's alloted span - Phoenician proverb

_Erik_

Yes, I agree, as I said before, this is just a heuristic method, not a demonstration. Increasing the number of random cases (and succesors) would bring a very accurate result without having to check every possible input. Edit: Have my 5 for your comment

modified on Friday, March 18, 2011 10:53 AM

David1987

_Erik_ wrote:

This means Real numbers

Why does it mean that? Why can't two polynomials intersect at 3 places that are all natural integers? Also I know perfectly well what induction is and this isn't it. You have to prove that you can go from one case to the next - you're just proving that some random case and the next are both accidentally correct. There is no random in induction. You have to analytically show that if something is true for n, that implies that it is also true for n+1.

David1987 · modified on Friday, March 18, 2011 10:53 AM

Ok, maybe I now understand what you meant. However, it doesn't solve the problem, since false positives are not allowed at all.

_Erik_

Definitely, I cannot make myself clear today... English is not my first language... That must be the reason.

David1987 wrote:

Why does it mean that? Why can't two polynomials intersect at 3 places that are all natural integers?

It does not matter where the function intersect with an axis. That is not important. The important is: only natural numbers are valid inputs for the function... I'll try to use other words: the function (or expression) can only be applied to natural numbers. However, the result can be any real number. Speaking about polynomials and "X axis" I understand you are sepaking about drawing a continous function (polynomial), so its inputs can be real numbers. I don't mind at all where de function will intersect the axis. If it can receive any real number as input, then induction is out of scope.

David1987 wrote:

Also I know perfectly well what induction is and this isn't it. You have to prove that you can go from one case to the next - you're just proving that some random case and the next are both accidentally correct.

There is no random in induction. You have to analytically show that if something is true for n, that implies that it is also true for n+1.

This is the third time I will say that this is only a heuristic method applying the axiom of induction, not a demonstration. If you increase your random samples (and succesor) to, let's say, 10, for example, and all of them macth along with the base case, then you can be pretty sure both expressions are equivalent. I think it is much faster than checking for 2^64 different possible inputs, and was trying to put you on a good direction (remember, only for natural numbers as input).

_Erik_

Yes, I know... but I did it when I was at the University and nobody was able to find two expressions that could bring a false positive. It checked for the base case (of course) and another 10 random values (and their respective succesors).

Luc Pattyn

I'm with David on this. Induction means that for some mathematical statement that depends on an integer i: 1. you verify it holds true for a specific (typically small) value j0 of i; 2. you proof by reasoning, not by evaluation or observation, that assuming it holds true for i=j it then will necessarily also be true for i=j+1. 3. you now simply apply step 2 for consecutive values j0, j0+1, j0+2, etc. In what you propose there is no proof by reasoning in step 2, it is just evaluations, hence very fallible. :)

Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

_Erik_

This is becoming really interesting. Have my 5. Yes, I agree as well. That's why I stated it was a heuristic algorithm, not a demonstration, so it is theoretically very fallible, of course. But now, take it to practice, increasing the number of random checks with their succesors becouse you know it might fail. Do you think it would not be reliable in practice?

Luc Pattyn

_Erik_ wrote:

Do you think it would not be reliable in practice?

that depends a lot on the kind of expressions, operators, functions, ... you envision. you will have a hard time to compare (i==48329134)?1:0 and (i==48329135)?1:0 in a heuristic way. :)

Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

_Erik_

Luc Pattyn wrote:

that depends a lot on the kind of expressions, operators, functions, ... you envision.

Sure, I was giving an idea that might be worth under some previous conditions, and maybe some expressioins, operators, functions... would be hard to compare this way. However...

Luc Pattyn wrote:

you will have a hard time to compare (i==48329134)?1:0 and (i==48329135)?1:0 in a heuristic way.

Not that hard in this case, since these expressions can be defined as: f1(n) = 1 if n==48329134; 0 in any other case f2(n) = 1 if n==48329135; 0 in any other case We have two base cases here for each one of the expressions: n=0 and n=48329134 in the first case; n=0 and n=48329135 on the second one, so the method I have described before would work perfectly and really quick finding the two expressions are not equivalent becouse one of the base cases do not match.

Matty22

Eric's suggestion is excellent and is quite similar to how the fast primality test finds large primes quickly. If you test in enough bases the chance of an equation failing can be reduced to 1 in billions This means it falls under the meteor rule.. If the chance of a false positive is less than being hit by a meteor today..It's okay to use it So I think Eric is got an excellent idea.

David1987

I respectfully disagree. This will be part of an optimizing compiler. If the meteor decides to hit us after all then someone will have a nearly impossible to diagnose bug. There are also large classes of expressions for which Eriks heuristic method does not work at all regardless of how many iterations it uses or require such a large number of iterations that it becomes impractical. For example: (uint64)x / 3 and ((uint64)x * 0x5555556) >> 32 are the same for x in the interval 0..0x7FFFFFFF but not outside it. That gives a 0.5^k error probability where k is the number of iterations - which is fairly good, but if you combine a couple of those you can make the error probability arbitrarily high. And I'm not even "cheating" with comparisons here - a single comparison against a constant can make the error probability arbitrarily high. They could be special-cased of course, but can the expressions above be special cased?

mbyamukama

David, if these expressions are analytic expressions (defined and differentiable for all values of input), then a simple proof by induction will ALWAYS verify their equivalence. So you only have to try for a few values.

BHM

David1987

Maybe so, but that's not what we have here, they are expressions with types such as int32 (see original post) and the class of expressions is not limited to something simple. For example, 1/x and 2/x differ only for x=1 and x=2 (edit: and -1 and -2), but there are 2^32 possibilities for x if it is an int32 and even more if it is an int64 - the chance that either 1 or 2 is tested isn't that big.

modified on Sunday, April 3, 2011 8:58 AM

Alex Shtof

Can you define or characterize this "good portion"? For example, if you want to detect equivalence between bounded-degree polynomials (two computer programs or expressions such that their result is eventually a polynomial in the input), and you know the degree bound, it can be easily done by checking a small and finite number of inputs.

David1987

A good portion is hard to define :) It should find "significantly" more equivalences than merely comparing whether two expressions are equal. The class of expressions is not constrained in any way. I suppose I could first detect whether two expressions are both polynomials and then use a special comparison, but there are plenty of other types of expression so in order to find a "good portion" of equivalences I'd need something for them too.