Simple hashcode [modified]

gumi_r msn com

Hi all, I'm coding a small math library for an app we are developing. Its basically 3D math, Vectors, Matrix, etc. I'm currently coding a struct Vector3D and I've overriden Equals(obj b) and was currently studying if I needed to override GetHashCode() or if the base implementation would work. A few simple tests proved that the default base implementation will not work as I'm getting same HashCodes for different Vectors and that's not good :p. I'm not even sure I really need to implement a safe HashCode algorithm because I don't foresee the vector struct being used in such scenario, but still, I wouldn't like to leave something in my code that I know is not right. Never know how someone might consume the library in the future. So my question is: Can somebody point me out how I should go about generating a unique int from 3 random doubles? My vector has obviously 3 coords, xyz and two vectors are the same only if all coords are the same which means that my GetHashCode() should generate a unique number for each combination of (x,y,z). Any suggestions are welcome. P.D. I'm not even sure if its possible to do. If I stop to think about it, you have a finite set of hashcodes you can generate (int.Min,int.Max) while the combination of diferent x,y,z coords is endless. :confused: Am I misunderstanding the point of GetHashCode()? -- modified at 19:13 Wednesday 9th May, 2007

Luc Pattyn

Hi, AFAIK has codes dont have to be unique (and as you pointed out, they typically cant). The one requirement is that the same object must return the same hash code over and over. Basically it must always be true that

hash(obj1) != hash(obj2) ==> obj1 != obj2

You could satisfy this by having a constant hash (the hash based mechanism does not provide any benefit then), or by having the hash equal to (or derived from) just a few of the object's data bytes. Of course if you expect to rely on effective hashing, then the hash values should have a good distribution. So it is up to you to determine the balance between costs and benefits. :)

Luc Pattyn [My Articles]

gumi_r msn com

Hi Luc, thanks for the reply. I'm still missing something or I am completely confused and mistaken with the whole concept. Bear me out please: The requirement of a hash code you are saying is:

if ob1 != obj2 ==> hash(obj1) != hash(obj2)

Ok I understand that, but it also has to comply with:

if obj1 == obj2 ==> hash(obj1) == hash(obj2)

I really dont see how you can manage this with a simple algorithm when it comes to Value types. If we are talking about objects where reference equality is acceptable, then yes, it doesn't have to be unique with all possible cases, only with the existing object instances in memory...an easy hash that comes to mind in that case is returning the memoryaddress of the object (this will ensure the uniqueness of the hash and ensures that no other object can have the same hash while its in scope). But with value types (and some reference values too I guess (string comes to mind)) this behaviour of the Hash algorithm isn't acceptable. For example, take this code:

public void Example(Vector v1)
{
bool same = areSame(v1, v1);
}

public bool whatEver(object o1, object o2)
{
return o1.GetHashCode()==o2.GetHashCode();
}

this code will return false if we implement a reference equality hash code which doesnt make sense, as its the same vector even if its boxed in two different memory addresses. But how do we code a HashCode that will return you the same int for o1 and o2 but can assure you that no other existing vector in memory gives you that same code unless its also the exact same vector? Obviously its impossible to get an algorithm that can give you a unique int for all possible existing vectors (we dont have enough ints to do that) so the solution must be to ensure that the hash works as expected with the existing set of vectors in memory. Again, am I misunderstanding the meaning of GetHashCode() completely? -- modified at 4:47 Thursday 10th May, 2007

Luc Pattyn

Hi, Sorry, my original reply was wrong. Of course there is only one requirement:

if obj1 == obj2 ==> hash(obj1) == hash(obj2)

The inverse is not necessary (and cannot be achieved in general, since one cannot perform lossless compression from some amount of data towards a single 32-bit int). I think the "remarks" in MSDN on Object.GetHashCode are clear. It states: "For best results, the hash code must be based on the value of an instance field or property" and "GetHashCode must always return the same value for a given instance of the object" Now value types that get boxed twice produce two different objects, their Hash Code does not have to be identical. BTW I dont consider a Vector a value type. Hope this helps. :)

Luc Pattyn [My Articles]

gumi_r msn com

Ok, thanks again for the reply. I think I understand now. Anyhow, if you take a look at how MS generates hashcodes of ValueTypes like System.Drawing.Point you will notice that the same point boxed twice does produce the same hashcode even if they are not the same object, so I'd like to take that same approach. Does not seem that complicated anymore if we don't have to ensure uniqueness. Just out of curiosity why wouldn't you implement a 3D vector as a value type as similar structs like System.Drawing.Point is? Am I missing some major drawbacks? P.D. About MS comments on how GetHashCode() should behave I understand it perfectly but MS has one more recommendation and its that GetHashCode() should be compatible with .Equals(object o). Thats why the compiler flags you a warning that you should consider overriding GetHashCode() when you override .Equals(object o) (very normal situation when coding structs). That's what got me stumped as it's simply not possible to achieve 100% compatibility. -- modified at 5:12 Thursday 10th May, 2007

Luc Pattyn

Hi, I was thinking of a Vector as it exists in Java (it is similar to C#'s ArrayList). If your Vector is just a fixed and small number of coordinates, of course it can be a value type. :)

Luc Pattyn [My Articles]