compression

khomeyni
#1

Hello, I have a file that contains only the digits 0..9 (numbers only, nothing else, not even spaces). What is the most efficient algorithm to compress it? Thanks in advance.

Luc Pattyn
#2

At the command level, use a ZIP utility such as WinZip. When programming an application, create a binary file. Assuming all digit values have the same probability, you could:

- create a binary file with the BinaryWriter class; it will be filled with ulong values;
- make the first value you write the number of digits to be stored;
- then loop: read 19 digits, turn them into a single ulong (use ulong.Parse or TryParse), and write that ulong in binary;
- when you reach the end of the input data, whose length probably is not evenly divisible by 19, just treat the few missing digits as zeros and generate the last ulong as you did all the others.

That is it; you have turned N characters (probably bytes, in an ASCII text file) into a sequence of N/19*8 bytes, giving a compression factor of 2.37. You could do better if something were known about the distribution of the digits; otherwise that is about the best you can get. :)
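To make those steps concrete, here is a minimal sketch of that packing scheme, including the reverse direction. The class and method names, the file-path parameters, and storing the digit count as a leading ulong are choices made here for illustration, not part of the original description:

using System;
using System.IO;

class DigitPacker {
    // Packs a digits-only text file 19 digits per ulong (19 bytes -> 8 bytes).
    public static void Compress(string inPath, string outPath) {
        string digits = File.ReadAllText(inPath).Trim();
        using (BinaryWriter bw = new BinaryWriter(File.Create(outPath))) {
            bw.Write((ulong)digits.Length);                         // first value: the number of digits stored
            for (int i = 0; i < digits.Length; i += 19) {
                string chunk = digits.Substring(i, Math.Min(19, digits.Length - i));
                bw.Write(ulong.Parse(chunk.PadRight(19, '0')));     // 19 digits always fit in a ulong; pad the last group with zeros
            }
        }
    }

    // Reverses the packing; the stored digit count tells us how much padding to drop.
    public static void Decompress(string inPath, string outPath) {
        using (BinaryReader br = new BinaryReader(File.OpenRead(inPath)))
        using (TextWriter tw = File.CreateText(outPath)) {
            long remaining = (long)br.ReadUInt64();
            while (remaining > 0) {
                string chunk = br.ReadUInt64().ToString("D19");     // restore leading zeros within the group
                int take = (int)Math.Min(19, remaining);
                tw.Write(chunk.Substring(0, take));
                remaining -= take;
            }
        }
    }
}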

      Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

      Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

Bernhard Hiller
#3

No, a compression factor of 2.37 is not good at all for that purpose. Just send a text file to a "zip compressed" folder in Windows Explorer and you get a compression ratio of about 10. And, of course, such a text file contains not only digits but other ASCII characters as well, which means it already carries much more information per byte of storage than a file with digits only. Hence I'd take some common zip algorithm. SharpZipLib (ICSharpCode.SharpZipLib) is also available for commercial products (see http://www.sharpdevelop.net/OpenSource/SharpZipLib/). In the Unix world I sometimes see "bz2" compressed files, which are said to be extremely well compressed, but I do not know of a library to handle those with .NET.
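If you want a zip-style compressor without the external SharpZipLib dependency, the framework's own GZipStream (System.IO.Compression) applies the same kind of deflate compression. A minimal sketch, with class and path names chosen here for illustration:

using System.IO;
using System.IO.Compression;

class ZipCompressor {
    // Deflate-compresses a file with the built-in GZipStream.
    public static void Compress(string inPath, string outPath) {
        using (FileStream input = File.OpenRead(inPath))
        using (FileStream output = File.Create(outPath))
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress)) {
            input.CopyTo(gzip);   // Stream.CopyTo needs .NET 4.0 or later
        }
    }
}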

Luc Pattyn
#4

          Incorrect. You either believe in magic, or you didn't understand my suggestion. So I created a little proof:

public void generateFiles() {
    using (TextWriter tw = File.CreateText("test.txt")) {
        using (BinaryWriter bw = new BinaryWriter(File.Create("test.dat"))) {
            int DIM = 100 * 1000;                               // number of 19-digit groups
            Random r = new Random();
            for (int i = 0; i < DIM; i++) {
                ulong n = 0;
                for (int j = 0; j < 19; j++) {
                    int d = r.Next(10);                         // random digit 0..9
                    tw.Write("0123456789".Substring(d, 1));     // one ASCII character per digit
                    n = 10 * n + (ulong)d;                      // pack 19 digits into one ulong
                }
                bw.Write(n);                                    // 8 bytes per 19 digits
            }
        }
    }
}

          and I WinZipped the txt file; the resulting sizes were:

          .txt = 1,856 KB
          .dat = 782 KB (that is a compression factor of 2.37 indeed)
          .zip = 872 KB (that is 2.13)

which tells me WinZip is wasting around 10%. While ZIP normally achieves around a factor of 10, that simply cannot be achieved when the data is really random and has only log2(10) significant bits in each 8-bit byte (a factor of 10 would store each digit in less than one bit, which is clearly impossible). :)

          Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

          Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

Bernhard Hiller
#5

That is, the high compression rate of Windows zip with texts like log files is caused by the repetitive structure of the contents, even though some 6 of the 8 bits per byte are used for the text. With log2(10) significant bits per byte, the theoretical limit should be a compression factor of 2.408, so your idea already achieves some 98.5% of it. Let's see how much more number crunching and execution time would be required to make it one percent better. :)
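For reference, a quick sketch of how those two figures fall out of the entropy argument (the little console program here is purely for illustration):

using System;

class EntropyCheck {
    static void Main() {
        double bitsPerDigit = Math.Log(10, 2);                   // approx. 3.322 bits of information per random decimal digit
        Console.WriteLine(8.0 / bitsPerDigit);                   // approx. 2.408: best possible factor vs. one digit per byte
        Console.WriteLine((19.0 / 8.0) / (8.0 / bitsPerDigit));  // approx. 0.986: the ulong scheme's share of that limit
    }
}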

Luc Pattyn
#6

One could apply my method with a wider integer; with a 256-bit integer the compression factor would be 77/32 = 2.406. The BigInteger class could be helpful to achieve that and more. I didn't mention it earlier as I don't care about the last few percent. :)
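A minimal sketch of that wider variant, assuming the input is again read from a digits-only file and the last group is zero-padded as before; the fixed 32-byte little-endian block layout and the names are choices made here for illustration:

using System;
using System.IO;
using System.Numerics;

class BigIntPacker {
    // Packs 77 digits into each fixed 32-byte block: 77/32 = 2.406.
    public static void Compress(string inPath, string outPath) {
        string digits = File.ReadAllText(inPath).Trim();
        using (BinaryWriter bw = new BinaryWriter(File.Create(outPath))) {
            bw.Write((long)digits.Length);                        // store the digit count first, as before
            for (int i = 0; i < digits.Length; i += 77) {
                string chunk = digits.Substring(i, Math.Min(77, digits.Length - i)).PadRight(77, '0');
                BigInteger value = BigInteger.Parse(chunk);       // 77 decimal digits always fit in 256 bits
                byte[] block = new byte[32];
                byte[] raw = value.ToByteArray();                 // little-endian; may carry a trailing 0x00 sign byte
                Array.Copy(raw, block, Math.Min(raw.Length, 32)); // keep the 32 magnitude bytes, drop the sign byte
                bw.Write(block);                                  // exactly 32 bytes per 77 digits
            }
        }
    }
}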

              Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

              Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
