Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. C#
  4. Compress specific text for storage.

Compress specific text for storage.

Scheduled Pinned Locked Moved C#
cssdatabasegame-devalgorithmshelp
23 Posts 5 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • K kevinnicol

    I'm building a little puzzle game that can have many (Trillions) of puzzles. Each puzzle can be simplified to a grid that is 6 by 6 where each grid can be one of 8 possible values. Right now I have each puzzle stored as a 36 char string (each char can be one of {"L", "R", "T", "B", "E", "P", "H", "V"} does anyone know a cool trick to take this type of string and compress it down so that it takes less space? Even shaving a few bytes off of it would be a huge help. (It'll make searching for it in the database a lot faster)

    I Offline
    I Offline
    Ian Shlasko
    wrote on last edited by
    #6

    Luc and harold made good suggestions... There's one other route that will work if you're generating the boards randomly. A lot of randomly-generated games work by not storing the board/scenario itself, but instead storing the seed for the random number generator. Usually, you get random numbers by seeding with Environment.TickCount, or something similar, but if you use a specific seed (Which could itself be randomly generated), you're guaranteed to get the same results every time, even with a complex algorithm (As long as you don't multi-thread it). That way, you're only storing one 4-byte integer.

    Proud to have finally moved to the A-Ark. Which one are you in?
    Author of the Guardians Saga (Sci-Fi/Fantasy novels)

    L 1 Reply Last reply
    0
    • I Ian Shlasko

      Luc and harold made good suggestions... There's one other route that will work if you're generating the boards randomly. A lot of randomly-generated games work by not storing the board/scenario itself, but instead storing the seed for the random number generator. Usually, you get random numbers by seeding with Environment.TickCount, or something similar, but if you use a specific seed (Which could itself be randomly generated), you're guaranteed to get the same results every time, even with a complex algorithm (As long as you don't multi-thread it). That way, you're only storing one 4-byte integer.

      Proud to have finally moved to the A-Ark. Which one are you in?
      Author of the Guardians Saga (Sci-Fi/Fantasy novels)

      L Offline
      L Offline
      Luc Pattyn
      wrote on last edited by
      #7

      An interesting concept, however it mainly makes apparent a 4-byte seed isn't sufficient to generate all conceivable board set-ups: rather than some 2^108 it will produce no more than 2^32 of them, a mere drop in the ocean. BTW: the OP mentioned trillions, not sure he meant that in the long or short scale of ways (see here[^]) :)

      Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

      Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

      I 1 Reply Last reply
      0
      • K kevinnicol

        I'm building a little puzzle game that can have many (Trillions) of puzzles. Each puzzle can be simplified to a grid that is 6 by 6 where each grid can be one of 8 possible values. Right now I have each puzzle stored as a 36 char string (each char can be one of {"L", "R", "T", "B", "E", "P", "H", "V"} does anyone know a cool trick to take this type of string and compress it down so that it takes less space? Even shaving a few bytes off of it would be a huge help. (It'll make searching for it in the database a lot faster)

        A Offline
        A Offline
        AspDotNetDev
        wrote on last edited by
        #8

        That depends on the subset of puzzles you are storing. Some puzzle combinations will be more compressible than other combinations. For example, a checker board that is setup before the game starts will be very compressible using RLE compression (run-length encoding), as most adjacent squares will have the same value (red, black, empty). Also, if there are common configurations between puzzle sets (e.g., two puzzles have half their board setup the same), then you can use referential compression (i.e., one row would point to another row and that other row would indicate part of the game board's setup and the main row would indicate the rest of the setup). The trick with that form of compression is to not overdo it, because the key you use to point to the other row takes up space too. You can also separate the boards into different tables to take off a few bits. So, you could have Table_1, Table_2, Table_3, ..., Table_256. The table the data resides in represents the first few bits of data. So if you have 256 tables, you can shave off 8 bits (aka, a byte) from the data. But that doesn't make sense for all scenarios. And if you can store several boards together, that might offer further opportunities for compression. For example, if two boards are very similar, you can take the difference between them and compress that (such as with RLE compression) rather than both boards. That way, you only need one of the boards and the uncompressed difference to recreate the other board. That will only work if the distribution of game boards is optimal (i.e., if there are lots of groups of boards where they are very similar). This technique is used in video encoding. A diff is taken between a two frames and that along with one frame is stored rather than storing both frames. You can chain them so you store, say, 99 diffs and one full frame to recreate all 100 frames. The problem with that is that you must process 99 frames until you get to that 100th frame. That wouldn't be a problem if you actually want to do some processing on each frame (or, in your case, each board).

        [Forum Guidelines]

        1 Reply Last reply
        0
        • L Luc Pattyn

          if corners can only have 4 or fewer values that saves 1 bit each, hence the whole board now fits in 13 bytes. You'll need strong further restrictions on the possibilities to do much better than that. The simple rule is:

          numberOfBitsRequired = CEIL(base-two logarithm of the number of acceptable board states).

          :)

          Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

          Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

          A Offline
          A Offline
          AspDotNetDev
          wrote on last edited by
          #9

          Luc Pattyn wrote:

          if corners can only have 4 or fewer values that saves 1 bit each

          With 4 corners and each corner allowing only 3 possible values, all 4 corners can be stored using 7 bits (because 3^4 = 81 possible combinations, which is less than the 128 combinations 7 bits can store). So, rather than use 2 bits per corner (for a total of 8 bits), only 7 bits are required. It's only a bit, but that might just help to make it down to the next lower byte if the OP can come up with other compression/encoding techniques to save space. :)

          [Forum Guidelines]

          L L 2 Replies Last reply
          0
          • A AspDotNetDev

            Luc Pattyn wrote:

            if corners can only have 4 or fewer values that saves 1 bit each

            With 4 corners and each corner allowing only 3 possible values, all 4 corners can be stored using 7 bits (because 3^4 = 81 possible combinations, which is less than the 128 combinations 7 bits can store). So, rather than use 2 bits per corner (for a total of 8 bits), only 7 bits are required. It's only a bit, but that might just help to make it down to the next lower byte if the OP can come up with other compression/encoding techniques to save space. :)

            [Forum Guidelines]

            L Offline
            L Offline
            Luc Pattyn
            wrote on last edited by
            #10

            there's way too many ifs in your message. In my earlier reply I had 4 spare bits, so shaving one bit of every corner yielded another byte. Why make things more complex than they need to be, the scheme I offered is the most efficient for the situation as it has been described. And I'm with Einstein, keep things as simple as possible, but no simpler than that. And I'll join the scientists who amend the theory as soon as new facts come in that don't fit the current one. :)

            Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

            Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

            A 1 Reply Last reply
            0
            • L Luc Pattyn

              there's way too many ifs in your message. In my earlier reply I had 4 spare bits, so shaving one bit of every corner yielded another byte. Why make things more complex than they need to be, the scheme I offered is the most efficient for the situation as it has been described. And I'm with Einstein, keep things as simple as possible, but no simpler than that. And I'll join the scientists who amend the theory as soon as new facts come in that don't fit the current one. :)

              Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

              Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

              A Offline
              A Offline
              AspDotNetDev
              wrote on last edited by
              #11

              Nothing wrong with that. In fact, having your reply so clear and having my reply which elaborates on it probably will help the OP to understand both approaches more clearly. Just thought I'd add some additional input to this little brainstorm session. :)

              [Forum Guidelines]

              L 1 Reply Last reply
              0
              • A AspDotNetDev

                Nothing wrong with that. In fact, having your reply so clear and having my reply which elaborates on it probably will help the OP to understand both approaches more clearly. Just thought I'd add some additional input to this little brainstorm session. :)

                [Forum Guidelines]

                L Offline
                L Offline
                Luc Pattyn
                wrote on last edited by
                #12

                I can only hope he is not studying DCT algorithms and the like. :laugh: I don't think he'll bring it well below 13 bytes, unless there is a lot he hasn't told. And that is the main reason I offered him Shannon, so he can calculate the optimum himself. :)

                Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

                Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

                1 Reply Last reply
                0
                • A AspDotNetDev

                  Luc Pattyn wrote:

                  if corners can only have 4 or fewer values that saves 1 bit each

                  With 4 corners and each corner allowing only 3 possible values, all 4 corners can be stored using 7 bits (because 3^4 = 81 possible combinations, which is less than the 128 combinations 7 bits can store). So, rather than use 2 bits per corner (for a total of 8 bits), only 7 bits are required. It's only a bit, but that might just help to make it down to the next lower byte if the OP can come up with other compression/encoding techniques to save space. :)

                  [Forum Guidelines]

                  L Offline
                  L Offline
                  Lost User
                  wrote on last edited by
                  #13

                  And if all the cells on the edge can only have 5 different values, you can combine each corner cell with an edge cell, giving 4 * 15 possibilities (6 bits), and then you have 5^12 possibilities left for all other edge cells together which fits in 28 bits, giving 85 bits in total.

                  A 1 Reply Last reply
                  0
                  • L Luc Pattyn

                    An interesting concept, however it mainly makes apparent a 4-byte seed isn't sufficient to generate all conceivable board set-ups: rather than some 2^108 it will produce no more than 2^32 of them, a mere drop in the ocean. BTW: the OP mentioned trillions, not sure he meant that in the long or short scale of ways (see here[^]) :)

                    Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

                    Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

                    I Offline
                    I Offline
                    Ian Shlasko
                    wrote on last edited by
                    #14

                    True, but 2^32 ought to be enough for anybody :)

                    Proud to have finally moved to the A-Ark. Which one are you in?
                    Author of the Guardians Saga (Sci-Fi/Fantasy novels)

                    A L 2 Replies Last reply
                    0
                    • L Lost User

                      And if all the cells on the edge can only have 5 different values, you can combine each corner cell with an edge cell, giving 4 * 15 possibilities (6 bits), and then you have 5^12 possibilities left for all other edge cells together which fits in 28 bits, giving 85 bits in total.

                      A Offline
                      A Offline
                      AspDotNetDev
                      wrote on last edited by
                      #15

                      harold aptroot wrote:

                      And if all the cells on the edge can only have 5 different values

                      I don't really follow what you said (e.g., what do you mean by "combine each corner cell with an edge cell"?), but keep this in mind:

                      The OP wrote:

                      some specific cells (Corner cells) can have only 3 of those values

                      We already know each corner cell can only use 3 (rather than 8) values. I didn't just pick an arbitrary value, if that's what you are implying (I honestly don't know what you mean).

                      [Forum Guidelines]

                      L 1 Reply Last reply
                      0
                      • I Ian Shlasko

                        True, but 2^32 ought to be enough for anybody :)

                        Proud to have finally moved to the A-Ark. Which one are you in?
                        Author of the Guardians Saga (Sci-Fi/Fantasy novels)

                        L Offline
                        L Offline
                        Luc Pattyn
                        wrote on last edited by
                        #16

                        like 640KB? :-D

                        Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

                        Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

                        A 1 Reply Last reply
                        0
                        • I Ian Shlasko

                          True, but 2^32 ought to be enough for anybody :)

                          Proud to have finally moved to the A-Ark. Which one are you in?
                          Author of the Guardians Saga (Sci-Fi/Fantasy novels)

                          A Offline
                          A Offline
                          AspDotNetDev
                          wrote on last edited by
                          #17

                          I'd say nobody will ever need more than 640KB of RAM. :rolleyes:

                          [Forum Guidelines]

                          L 1 Reply Last reply
                          0
                          • L Luc Pattyn

                            like 640KB? :-D

                            Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

                            Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

                            A Offline
                            A Offline
                            AspDotNetDev
                            wrote on last edited by
                            #18

                            LOL, you just beat me.

                            [Forum Guidelines]

                            I 1 Reply Last reply
                            0
                            • A AspDotNetDev

                              LOL, you just beat me.

                              [Forum Guidelines]

                              I Offline
                              I Offline
                              Ian Shlasko
                              wrote on last edited by
                              #19

                              It wasn't a very obscure joke, for a site like this :)

                              Proud to have finally moved to the A-Ark. Which one are you in?
                              Author of the Guardians Saga (Sci-Fi/Fantasy novels)

                              A 1 Reply Last reply
                              0
                              • A AspDotNetDev

                                harold aptroot wrote:

                                And if all the cells on the edge can only have 5 different values

                                I don't really follow what you said (e.g., what do you mean by "combine each corner cell with an edge cell"?), but keep this in mind:

                                The OP wrote:

                                some specific cells (Corner cells) can have only 3 of those values

                                We already know each corner cell can only use 3 (rather than 8) values. I didn't just pick an arbitrary value, if that's what you are implying (I honestly don't know what you mean).

                                [Forum Guidelines]

                                L Offline
                                L Offline
                                Lost User
                                wrote on last edited by
                                #20

                                If the corner cells can have 3 values and the middle cells 8, then I see the pattern "they can hold as many values as there are adjacent cells". The OP did not say it, but he's been a bit slow in providing information anyway. It's just a guess, and that's why there is an if near the beginning of my post. The combining is .. just combining. Combine a corner cell (3 states) with an edge cell (might have 5 states) to get 15 in total, which is very close to a power of 2.

                                1 Reply Last reply
                                0
                                • I Ian Shlasko

                                  It wasn't a very obscure joke, for a site like this :)

                                  Proud to have finally moved to the A-Ark. Which one are you in?
                                  Author of the Guardians Saga (Sci-Fi/Fantasy novels)

                                  A Offline
                                  A Offline
                                  AspDotNetDev
                                  wrote on last edited by
                                  #21

                                  Haha, yeah, I saw it posted the other day. I wondered if perhaps it was you who posted it. :)

                                  [Forum Guidelines]

                                  I 1 Reply Last reply
                                  0
                                  • A AspDotNetDev

                                    Haha, yeah, I saw it posted the other day. I wondered if perhaps it was you who posted it. :)

                                    [Forum Guidelines]

                                    I Offline
                                    I Offline
                                    Ian Shlasko
                                    wrote on last edited by
                                    #22

                                    Don't think it was me... But hey, it's an old joke :)

                                    Proud to have finally moved to the A-Ark. Which one are you in?
                                    Author of the Guardians Saga (Sci-Fi/Fantasy novels)

                                    1 Reply Last reply
                                    0
                                    • A AspDotNetDev

                                      I'd say nobody will ever need more than 640KB of RAM. :rolleyes:

                                      [Forum Guidelines]

                                      L Offline
                                      L Offline
                                      Luc Pattyn
                                      wrote on last edited by
                                      #23

                                      for 6*6 cells with no more than 8 states each, not really compression, is it? :)

                                      Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

                                      Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

                                      modified on Thursday, August 26, 2010 9:21 PM

                                      1 Reply Last reply
                                      0
                                      Reply
                                      • Reply as topic
                                      Log in to reply
                                      • Oldest to Newest
                                      • Newest to Oldest
                                      • Most Votes


                                      • Login

                                      • Don't have an account? Register

                                      • Login or register to search.
                                      • First post
                                        Last post
                                      0
                                      • Categories
                                      • Recent
                                      • Tags
                                      • Popular
                                      • World
                                      • Users
                                      • Groups