The Proof that a GUID is not unique
-
I still try to discourage everyone from using GUIDs when they aren't truly necessary:
1. Waste of disk space: if a 32-bit integer is sufficient, why use a GUID?
2. Waste of processing power: more expensive to create, more expensive to compare against, etc.
3. GUID is way more difficult to read for humans
Sure, there are uses for them, but in most cases you're perfectly fine without them.
-
While searching for the number of possible GUIDs (answering questions in the forum section), I stumbled over this amusing SO thread where they are seriously discussing how the non-unique nature of a GUID can be proven. Enjoy[^] Edit -> Here is my answer to the said question (can you think of a creative one yourself?):
Quote:
A GUID is 128 bits. Therefore you would have to generate 2^128 + 1 GUIDs to be guaranteed to encounter a single GUID twice. A thread on StackOverflow.com[^] says that you would need about 10790283070806014188970 years to encounter a single GUID twice, assuming your program does nothing other than create GUIDs and runs at a processor speed of 1 GHz, without any interruption from CPU time eaten by other programs or the operating system itself. As you can probably imagine by now, encountering the same GUID twice would be very bad luck and can safely be considered unrealistic.
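For anyone who wants to check the arithmetic in the quote, here is a minimal Python sketch. It assumes, as the quoted figure appears to, one GUID per clock cycle at 1 GHz and 365-day years. The birthday-bound part is an addition that is not in the quote, and it assumes randomly generated (version 4) GUIDs with 122 random bits.

```python
import math

GUID_BITS = 128
TOTAL_GUIDS = 2 ** GUID_BITS  # size of the whole 128-bit space

# Pigeonhole principle: only after this many draws is a repeat guaranteed.
guaranteed_repeat_after = TOTAL_GUIDS + 1

# Assumption: one GUID per clock cycle at 1 GHz, 365-day years.
guids_per_year = 10 ** 9 * 60 * 60 * 24 * 365
years_to_exhaust = TOTAL_GUIDS / guids_per_year

# Assumption (not from the quote): version 4 GUIDs carry 122 random bits.
# Birthday approximation: about sqrt(2 * N * ln 2) draws give a 50% chance
# of at least one repeat among N equally likely values.
RANDOM_BITS = 122
draws_for_even_odds = math.sqrt(2 * (2 ** RANDOM_BITS) * math.log(2))

print(f"years to walk the whole space: {years_to_exhaust:.4e}")
print(f"random draws for a 50% collision chance: {draws_for_even_odds:.3e}")
```

Under these assumptions the quoted year count is the time needed to walk the entire space. With random generation a repeat becomes likely much earlier than that, though still far beyond what most systems will ever generate.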
People becoming wiser in order to notice the stupid things they did back in the young days. This doesn't mean that they really stop doing those things. Wise people still do stupid things, only on purpose.
Of course it is possible to generate non-unique GUIDs, but if you use a reliable implementation of a good generation algorithm, you can be reasonably confident that GUIDs will be unique within the context that matters. The only time I have seen problems was with an unreliable implementation: NetWare 5 used to generate duplicate GUIDs when timesync caused the clock to go backwards.
-
Nicholas Marty wrote:
3. GUID is way more difficult to read for humans
So what? IDs are meant to be meaningless, so that's a good thing. Even with integers you'll likely wind up just copying-and-pasting anyway. Or do this:
00000000-0000-0000-0000-000000000001
00000000-0000-0000-0000-000000000002
00000000-0000-0000-0000-000000000003
00000000-0000-0000-0000-000000000004
00000000-0000-0000-0000-000000000005
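A short Python sketch of how such readable, sequential IDs could be produced with the standard `uuid` module. The helper name `sequential_guid` is made up for illustration, and the resulting values are only GUID-shaped: they do not carry the version and variant bits that a generated GUID would.

```python
import uuid

def sequential_guid(n: int) -> str:
    """Format a small integer as a GUID-shaped string (hypothetical helper)."""
    return str(uuid.UUID(int=n))

# Print the five values listed above.
for n in range(1, 6):
    print(sequential_guid(n))
```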
But...but...integers fit so much more nicely into a query string! *duck and cover* Seriously though, one of the few uses I've found for GUIDs is to allow the keys in your tables to be easily moved between different databases, for instance to move between a production and test DB. But otherwise integers are more efficient and easier to work with. As for the uniqueness debate, who cares? Raise your hand if you have ever seen a duplicate GUID pop up in a real-world situation. It's like arguing about the randomness of pseudo-random number generators: it's a moot point for almost all real-world implementations. If you're generating thousands of GUIDs per second in a system that you expect to be around for centuries, then maybe you should worry about it. Otherwise, it's like worrying about the server being taken out by a meteor hit. And even if you're unlucky enough to have a collision, you'd have to have a pretty fragile system for that to be a huge disaster; you'll probably have a dupe showing up in a join somewhere, which is not that hard to find and fix.
-
Yup; my previous employer offered both standalone installations and a SAAS model where we hosted, and some of the primary keys were ints. When a client decided it was better to go from standalone to SAAS, merging was a giant PITA.
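To illustrate why that kind of merge hurts, here is a minimal Python sketch. The two "sites", their rows, and the `merge` helper are all invented for the example; plain dictionaries stand in for real tables.

```python
import uuid

def merge(a: dict, b: dict):
    """Merge two key->row tables and report keys present in both."""
    clashes = [k for k in b if k in a]
    merged = {**a, **b}  # on a clash, rows from b silently overwrite rows from a
    return merged, clashes

# Two standalone installations, each with its own auto-increment integers.
site_a_int = {1: "Alice", 2: "Bob"}
site_b_int = {1: "Carol", 2: "Dave"}
merged_int, int_clashes = merge(site_a_int, site_b_int)

# The same rows keyed by random GUIDs instead.
site_a_guid = {uuid.uuid4(): name for name in ("Alice", "Bob")}
site_b_guid = {uuid.uuid4(): name for name in ("Carol", "Dave")}
merged_guid, guid_clashes = merge(site_a_guid, site_b_guid)

print("integer key clashes:", int_clashes)
print("GUID key clashes:", guid_clashes)
```

With integer keys every row from the second site has to be renumbered, along with every foreign key that points at it. With GUID keys the rows can, in this sketch at least, simply be copied across.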
-
Yep, I always work on the principle that a GUID may be duplicated, so I add ticks since the epoch or something like that to my GUIDs, because ticks should only ever increase. Getting the exact same tick count AND a duplicate GUID is not gunna happen! You're welcome :)
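A rough Python sketch of that idea, for the curious. The helper name `guid_with_ticks` is hypothetical, and `time.time_ns()` (nanoseconds since the Unix epoch) is used here as a stand-in for whatever tick counter the poster has in mind.

```python
import time
import uuid

def guid_with_ticks() -> str:
    """Return a random GUID with a timestamp appended (hypothetical helper)."""
    ticks = time.time_ns()  # wall-clock nanoseconds since the Unix epoch
    return f"{uuid.uuid4()}-{ticks}"

key = guid_with_ticks()
print(key)
```

One caveat with this stand-in: a wall clock can be adjusted, so the appended value is not strictly guaranteed to keep increasing.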
-
Arguing about the uniqueness of GUIDs is pointless. They're integers in a finite space, which means that sooner or (most likely) later there's going to be a collision. For practical purposes, however, we can say that they're unique.
CEO at: - Rafaga Systems - Para Facturas - Modern Components for the moment...
-
You sure? They are a lot like snowflakes: if you look at the fine detail they aren't the same. :laugh:
The only instant messaging I do involves my middle finger. English doesn't borrow from other languages. English follows other languages down dark alleys, knocks them over and goes through their pockets for loose grammar.
-
I've actually been burned twice in the same year by GUID collisions within unrelated software products from other companies. They are a very poor architecture choice.
-
This may not be in the same spirit of fun that the article sets up, but I thought GUIDs were guaranteed unique because they are based on MAC addresses that are guaranteed as unique!? So proving MAC addresses are not unique would be a prerequisite?!
"Courtesy is the product of a mature, disciplined mind ... ridicule is lack of the same - DPM"
-
But your server is generating GUIDs using the same MAC address, right? So it wouldn't be unique per GUID generated on that server. But you shouldn't have to worry about collisions with GUIDs generated on other machines, I guess.
-
StatementTerminator wrote:
easily moved between different databases
Yes indeed. Reminds me of one place I worked where identities were used and the only way to view PROD data was to have a tool copy the data to a DEV database -- but the tool didn't allow the IDs to be copied :sigh: , the copied rows all had new IDs that didn't match PROD. My argument isn't entirely against integers, but auto-increment integers over which the developer has no control. X|
StatementTerminator wrote:
integers are more efficient and easier to work with
My experience has been the opposite.
StatementTerminator wrote:
As for the uniqueness debate, who cares?
'Xactly
-
do {
    myguid = getGUID();
} while (exists(myguid));
This would solve the uniqueness problem: the loop will almost always execute only once for the next several thousand years, and it'll never produce a bug out of "bad luck". P.S. Sorry for my pseudocode, I usually write in JavaScript.
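A runnable take on the same retry idea, sketched in Python. The function name `new_unique_guid` is made up, and an in-memory set stands in for whatever store `exists` would query, so the check only covers GUIDs that this one store knows about.

```python
import uuid

def new_unique_guid(existing: set) -> uuid.UUID:
    """Draw GUIDs until one is not already in `existing`, then record it."""
    while True:
        candidate = uuid.uuid4()
        if candidate not in existing:
            existing.add(candidate)
            return candidate

issued = set()
first = new_unique_guid(issued)
second = new_unique_guid(issued)
print(first, second)
```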
-
I assume their algorithm is designed to use the MAC address as a seed and is constructed to guarantee unique GUIDs given this. You are right that there is a lot more to it than just the unique seed. I have never seen the algorithm and clearly never will. Sounds like a really interesting math problem though. I would love to know the rest of the approach they used.
"Courtesy is the product of a mature, disciplined mind ... ridicule is lack of the same - DPM"
-
Systems without network cards can generate GUIDs. What do they use for their MAC address? Yup... zeros. A GUID also includes clock ticks of some sort or another that (they hope) tick faster than the system can request GUIDs. I seem to recall there are some bits in there for a sequence number within a clock tick, or maybe systems just keep track of the last one issued and ensure they don't generate duplicates. I think there might be some other sources of mostly unique bits thrown in, like CPU serial numbers or something. The idea is that within a given uptime, of a given OS load, on a given system, they are guaranteed to be unique, and between systems, they are as unique as reasonably possible. How are these bits packed into the GUID? It doesn't matter: no amount of deterministically massaging the bits will give you anything more unique. Massaging the source bits could obscure them and make backtracking to the original values for nefarious purposes more difficult, and I suspect it's done. Using the source bits as the seed for a pseudorandom number generator won't add uniqueness, but it is probably a pretty good, and inexpensive, way to deterministically massage the bits to obscure them. GUIDs were never absolutely guaranteed to be unique, and I'd be willing to bet most of those sources of unique bits are no longer unique once one starts running GUID generation code in virtual machines.
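For a look at the kind of fields described above, here is a small Python sketch using the standard `uuid` module. It shows the RFC 4122 version 1 layout as Python exposes it, which may differ in detail from what any particular OS does. Note that when no hardware address can be found, Python's `uuid.uuid1()` falls back to a random node value rather than zeros.

```python
import uuid

g = uuid.uuid1()  # version 1: timestamp + clock sequence + node

print("guid:     ", g)
print("version:  ", g.version)
print("node:     ", f"{g.node:012x}")  # 48-bit node, usually the MAC address
print("time:     ", g.time)            # 60-bit count of 100 ns intervals
print("clock_seq:", g.clock_seq)       # 14-bit sequence number
```

The 14-bit clock sequence is the part meant to keep values distinct when the clock is set backwards or the node value changes.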
We can program with only 1's, but if all you've got are zeros, you've got nothing.
-
MS generates two types of GUID: one that is MAC address based and one that is time based. If you generate them sequentially, it will take you some time to get a duplicate, but if you generate them at random, it is a lot easier to get a duplicate. I have had duplicates several times.
-
Did it include the characters S, S, N, N, L, L, L, I, I, E, E, E, and E?
I wanna be a eunuchs developer! Pass me a bread knife!
I'm sure our dearly departed forum friend Leslie would confirm that. :laugh:
If there is one thing more dangerous than getting between a bear and her cubs it's getting between my wife and her chocolate.