Compress specific text for storage.
-
I'm building a little puzzle game that can have many (Trillions) of puzzles. Each puzzle can be simplified to a grid that is 6 by 6 where each grid can be one of 8 possible values. Right now I have each puzzle stored as a 36 char string (each char can be one of
{"L", "R", "T", "B", "E", "P", "H", "V"}
does anyone know a cool trick to take this type of string and compress it down so that it takes less space? Even shaving a few bytes off of it would be a huge help. (It'll make searching for it in the database a lot faster)Luc and harold made good suggestions... There's one other route that will work if you're generating the boards randomly. A lot of randomly-generated games work by not storing the board/scenario itself, but instead storing the seed for the random number generator. Usually, you get random numbers by seeding with Environment.TickCount, or something similar, but if you use a specific seed (Which could itself be randomly generated), you're guaranteed to get the same results every time, even with a complex algorithm (As long as you don't multi-thread it). That way, you're only storing one 4-byte integer.
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
Luc and harold made good suggestions... There's one other route that will work if you're generating the boards randomly. A lot of randomly-generated games work by not storing the board/scenario itself, but instead storing the seed for the random number generator. Usually, you get random numbers by seeding with Environment.TickCount, or something similar, but if you use a specific seed (Which could itself be randomly generated), you're guaranteed to get the same results every time, even with a complex algorithm (As long as you don't multi-thread it). That way, you're only storing one 4-byte integer.
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)An interesting concept, however it mainly makes apparent a 4-byte seed isn't sufficient to generate all conceivable board set-ups: rather than some 2^108 it will produce no more than 2^32 of them, a mere drop in the ocean. BTW: the OP mentioned trillions, not sure he meant that in the long or short scale of ways (see here[^]) :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
-
I'm building a little puzzle game that can have many (Trillions) of puzzles. Each puzzle can be simplified to a grid that is 6 by 6 where each grid can be one of 8 possible values. Right now I have each puzzle stored as a 36 char string (each char can be one of
{"L", "R", "T", "B", "E", "P", "H", "V"}
does anyone know a cool trick to take this type of string and compress it down so that it takes less space? Even shaving a few bytes off of it would be a huge help. (It'll make searching for it in the database a lot faster)That depends on the subset of puzzles you are storing. Some puzzle combinations will be more compressible than other combinations. For example, a checker board that is setup before the game starts will be very compressible using RLE compression (run-length encoding), as most adjacent squares will have the same value (red, black, empty). Also, if there are common configurations between puzzle sets (e.g., two puzzles have half their board setup the same), then you can use referential compression (i.e., one row would point to another row and that other row would indicate part of the game board's setup and the main row would indicate the rest of the setup). The trick with that form of compression is to not overdo it, because the key you use to point to the other row takes up space too. You can also separate the boards into different tables to take off a few bits. So, you could have Table_1, Table_2, Table_3, ..., Table_256. The table the data resides in represents the first few bits of data. So if you have 256 tables, you can shave off 8 bits (aka, a byte) from the data. But that doesn't make sense for all scenarios. And if you can store several boards together, that might offer further opportunities for compression. For example, if two boards are very similar, you can take the difference between them and compress that (such as with RLE compression) rather than both boards. That way, you only need one of the boards and the uncompressed difference to recreate the other board. That will only work if the distribution of game boards is optimal (i.e., if there are lots of groups of boards where they are very similar). This technique is used in video encoding. A diff is taken between a two frames and that along with one frame is stored rather than storing both frames. You can chain them so you store, say, 99 diffs and one full frame to recreate all 100 frames. The problem with that is that you must process 99 frames until you get to that 100th frame. That wouldn't be a problem if you actually want to do some processing on each frame (or, in your case, each board).
-
if corners can only have 4 or fewer values that saves 1 bit each, hence the whole board now fits in 13 bytes. You'll need strong further restrictions on the possibilities to do much better than that. The simple rule is:
numberOfBitsRequired = CEIL(base-two logarithm of the number of acceptable board states).
:)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
Luc Pattyn wrote:
if corners can only have 4 or fewer values that saves 1 bit each
With 4 corners and each corner allowing only 3 possible values, all 4 corners can be stored using 7 bits (because 3^4 = 81 possible combinations, which is less than the 128 combinations 7 bits can store). So, rather than use 2 bits per corner (for a total of 8 bits), only 7 bits are required. It's only a bit, but that might just help to make it down to the next lower byte if the OP can come up with other compression/encoding techniques to save space. :)
-
Luc Pattyn wrote:
if corners can only have 4 or fewer values that saves 1 bit each
With 4 corners and each corner allowing only 3 possible values, all 4 corners can be stored using 7 bits (because 3^4 = 81 possible combinations, which is less than the 128 combinations 7 bits can store). So, rather than use 2 bits per corner (for a total of 8 bits), only 7 bits are required. It's only a bit, but that might just help to make it down to the next lower byte if the OP can come up with other compression/encoding techniques to save space. :)
there's way too many ifs in your message. In my earlier reply I had 4 spare bits, so shaving one bit of every corner yielded another byte. Why make things more complex than they need to be, the scheme I offered is the most efficient for the situation as it has been described. And I'm with Einstein, keep things as simple as possible, but no simpler than that. And I'll join the scientists who amend the theory as soon as new facts come in that don't fit the current one. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
-
there's way too many ifs in your message. In my earlier reply I had 4 spare bits, so shaving one bit of every corner yielded another byte. Why make things more complex than they need to be, the scheme I offered is the most efficient for the situation as it has been described. And I'm with Einstein, keep things as simple as possible, but no simpler than that. And I'll join the scientists who amend the theory as soon as new facts come in that don't fit the current one. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
Nothing wrong with that. In fact, having your reply so clear and having my reply which elaborates on it probably will help the OP to understand both approaches more clearly. Just thought I'd add some additional input to this little brainstorm session. :)
-
Nothing wrong with that. In fact, having your reply so clear and having my reply which elaborates on it probably will help the OP to understand both approaches more clearly. Just thought I'd add some additional input to this little brainstorm session. :)
I can only hope he is not studying DCT algorithms and the like. :laugh: I don't think he'll bring it well below 13 bytes, unless there is a lot he hasn't told. And that is the main reason I offered him Shannon, so he can calculate the optimum himself. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
-
Luc Pattyn wrote:
if corners can only have 4 or fewer values that saves 1 bit each
With 4 corners and each corner allowing only 3 possible values, all 4 corners can be stored using 7 bits (because 3^4 = 81 possible combinations, which is less than the 128 combinations 7 bits can store). So, rather than use 2 bits per corner (for a total of 8 bits), only 7 bits are required. It's only a bit, but that might just help to make it down to the next lower byte if the OP can come up with other compression/encoding techniques to save space. :)
And if all the cells on the edge can only have 5 different values, you can combine each corner cell with an edge cell, giving 4 * 15 possibilities (6 bits), and then you have 5^12 possibilities left for all other edge cells together which fits in 28 bits, giving 85 bits in total.
-
An interesting concept, however it mainly makes apparent a 4-byte seed isn't sufficient to generate all conceivable board set-ups: rather than some 2^108 it will produce no more than 2^32 of them, a mere drop in the ocean. BTW: the OP mentioned trillions, not sure he meant that in the long or short scale of ways (see here[^]) :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
True, but 2^32 ought to be enough for anybody :)
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
And if all the cells on the edge can only have 5 different values, you can combine each corner cell with an edge cell, giving 4 * 15 possibilities (6 bits), and then you have 5^12 possibilities left for all other edge cells together which fits in 28 bits, giving 85 bits in total.
harold aptroot wrote:
And if all the cells on the edge can only have 5 different values
I don't really follow what you said (e.g., what do you mean by "combine each corner cell with an edge cell"?), but keep this in mind:
some specific cells (Corner cells) can have only 3 of those values
We already know each corner cell can only use 3 (rather than 8) values. I didn't just pick an arbitrary value, if that's what you are implying (I honestly don't know what you mean).
-
True, but 2^32 ought to be enough for anybody :)
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)like 640KB? :-D
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
-
True, but 2^32 ought to be enough for anybody :)
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)I'd say nobody will ever need more than 640KB of RAM. :rolleyes:
-
like 640KB? :-D
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
LOL, you just beat me.
-
LOL, you just beat me.
It wasn't a very obscure joke, for a site like this :)
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
harold aptroot wrote:
And if all the cells on the edge can only have 5 different values
I don't really follow what you said (e.g., what do you mean by "combine each corner cell with an edge cell"?), but keep this in mind:
some specific cells (Corner cells) can have only 3 of those values
We already know each corner cell can only use 3 (rather than 8) values. I didn't just pick an arbitrary value, if that's what you are implying (I honestly don't know what you mean).
If the corner cells can have 3 values and the middle cells 8, then I see the pattern "they can hold as many values as there are adjacent cells". The OP did not say it, but he's been a bit slow in providing information anyway. It's just a guess, and that's why there is an if near the beginning of my post. The combining is .. just combining. Combine a corner cell (3 states) with an edge cell (might have 5 states) to get 15 in total, which is very close to a power of 2.
-
It wasn't a very obscure joke, for a site like this :)
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels)Haha, yeah, I saw it posted the other day. I wondered if perhaps it was you who posted it. :)
-
Haha, yeah, I saw it posted the other day. I wondered if perhaps it was you who posted it. :)
Don't think it was me... But hey, it's an old joke :)
Proud to have finally moved to the A-Ark. Which one are you in?
Author of the Guardians Saga (Sci-Fi/Fantasy novels) -
I'd say nobody will ever need more than 640KB of RAM. :rolleyes:
for 6*6 cells with no more than 8 states each, not really compression, is it? :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
modified on Thursday, August 26, 2010 9:21 PM