Code Project: The Lounge

Why do TAR files always need to be decompressed twice?

swampwiz wrote:

    I've noticed that whenever I download a TAR file (and it usually is TAR.GZ) and decompress it, there always is yet another compressed file that needs to be decompressed. Why don't folks just do a single compression?

PhM33 wrote (#17):

A TAR file in your sense is really a TAR.GZ file, which embeds two formats: TAR and GZ. The process is:
1. A TAR file is created, concatenating several files together in their uncompressed form; note that the resulting TAR file is itself uncompressed.
2. A GZ file is created by compressing that TAR file.
So to decompress a TAR.GZ file, you have to:
1. Decompress the GZ file.
2. "Untar" (unarchive) the resulting uncompressed TAR file.
Note that you can compress a TAR file with other popular compressors (bzip2 => TAR.BZ2, 7zip => TAR.7Z...).
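The two steps can be sketched in a shell session (the paths and file names here are invented for illustration):

```shell
set -e
rm -rf /tmp/tardemo && mkdir -p /tmp/tardemo/src && cd /tmp/tardemo
echo hello > src/a.txt
echo world > src/b.txt

# Step 1: tar concatenates the files; the result is uncompressed.
tar -cf archive.tar src
# Step 2: gzip compresses that single tar file into archive.tar.gz.
gzip archive.tar

# Reversing it is the "decompressed twice" the question describes:
mkdir out && cd out
gunzip -c ../archive.tar.gz > archive.tar   # 1. un-gzip
tar -xf archive.tar                         # 2. un-tar
cat src/a.txt                               # -> hello
```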

enhzflep wrote:

Simples. Formats like ZIP employ compression on a file-by-file basis. That is prone to poorer compression rates than a scheme that can compress the entire contents of an archive at once, especially when the data is spread across lots of small files. The solution is to slap all of the files together first into one monolithic chunk, then run compression on that chunk, in the (almost always delivered) hope that the output will be smaller than gluing together the separately compressed files. TAR: turn a bunch of files into one. GZ: compress a file.
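A rough way to see the effect (a sketch only; exact byte counts depend on your gzip and tar versions): compress two hundred similar small files one by one, then as a single tar stream.

```shell
set -e
rm -rf /tmp/solidity && mkdir -p /tmp/solidity && cd /tmp/solidity
for i in $(seq 1 200); do
  printf 'the quick brown fox jumps over the lazy dog\n' > "f$i.txt"
done

# Per-file, zip-style: gzip each file separately, sum the sizes.
per_file=$(for f in f*.txt; do gzip -c "$f"; done | wc -c)

# Solid, tar-style: one stream over all files, one compression pass,
# so redundancy between files can be exploited.
solid=$(tar -cf - f*.txt | gzip -c | wc -c)

echo "per-file: $per_file bytes, solid: $solid bytes"
```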

kalberts wrote (#18):

      enhzflep wrote:

      Formats like ZIP employ compression on a file-by-file basis. This is obviously prone to poor rates of compression as compared to a scheme that can compress the entire contents of an archive

If you experience significantly better compression by merging a lot of small files into one, either your average file size is extremely small (as in one classical Unix study showing that, for the system as a whole, more than 80% of the files were less than 5 kbytes), or you are misinterpreting the data: it is not poorer compression, but more metadata, i.e. administrative information. One large file requires one descriptor; five thousand tiny files require five thousand descriptors. That is not a poorer rate of compression; it is similar to gathering the five thousand files into one even without compression, which would save the space of 4999 inodes, as well as the internal fragmentation loss: if file sizes are evenly distributed, half an allocation unit (disk block) per file. You save space by making this huge file, but it has nothing to do with data compression.

If you want to make an exact comparison, you cannot compare the size of the .tar file to the size of the .tar.gz file. That would give you the compression rate of the .tar file, but to create the .tar file you had to add a noticeable amount of metadata. So what you save by having only one file/compression descriptor, you partially lose to .tar administrative information.

I keep a number of 'archives' of many small files in .zip format, saving space due to the compression, of course, but a lot is also saved by not wasting 2 Kbyte on each file in internal fragmentation. Another advantage of .zipping up these file groups: I frequently move the files between machines on USB sticks. Writing a few thousand files to a USB stick takes a lot of time just to create the files. I guess it has to do with USB stick writes not being cached, at least not to the same degree, and file creation requires lots of writes even if the file contents are done in one single write. Writing a single .zip archive to a USB stick is several times faster than writing two thousand tiny files.

A similar situation: we run a fairly large build system with about a hundred build agents. A build may produce dozens, in some cases hundreds, of individual artifacts. On the central server distributing these artifacts, the inode table exploded when each artifact was treated separately. We were forced to modify the b
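The tar-side metadata cost is easy to measure: tar spends at least one 512-byte header per member and pads the archive to its blocking factor (10240 bytes by default in GNU tar), so a one-byte file produces a far larger archive. A quick sketch:

```shell
set -e
rm -rf /tmp/tarmeta && mkdir -p /tmp/tarmeta && cd /tmp/tarmeta
printf 'x' > tiny.txt            # a 1-byte file
tar -cf tiny.tar tiny.txt
wc -c < tiny.tar                 # typically 10240: a 512-byte header,
                                 # a 512-byte data block, then padding
```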

megaadam wrote:

tar xvf file does both in one command.

        ... such stuff as dreams are made on

Keith Barrow wrote (#19):

        dingdingdingdingding!!!!!! We have a winner!!!!

KeithBarrow.net - It might not be very good, but at least it is free!

Lost User wrote:

Double de-compression? You're doing it wrong. The proper (as per design) procedure:
1. un-gzip it and redirect the output to the tape device ... (may necessitate co-ordinated use of 2 hands if the tape device doesn't have automatic start/stop control)
2. rewind the tape ... (on some sites a separate rewinding machine: submit both the tape and a duly completed rewind request)
3. and then: un-tar (tar -x) it from the tape reader into the destination directory ... (please remember: folders are where you keep your forms and notes, files are stored in directories)
Some new-fangled versions of tar have gzip built right in; many operators can't grok that. Kids these days: always looking for shortcuts.

          This internet thing is amazing! Letting people use it: worst idea ever!

hevisko wrote (#20):

          I must agree, kids these days have all these slang wordings on phones, but can't grok that tar stands for TApe aRchive, i's jst s bvius.

Lost User wrote:

            luxury! toggle switches!

            This internet thing is amazing! Letting people use it: worst idea ever!

Bob1000 wrote (#21):

            luxury! Solder bits of wire!

Lost User wrote:

Good point: the directory is just a list of name and inode number pairs. Such a simple system could elegantly do things back then that NTFS still hasn't come close to without a whole mess of complicated hoops to jump through.

              This internet thing is amazing! Letting people use it: worst idea ever!

Peter Adam wrote (#22):

First, you compare part of a file system to a whole file system. Second, NTFS is more inode than inode itself, since it stores the file data itself as an attribute. Third, Windows has had NTFS since 1993. In the Unix world since then we have seen at least half a dozen über-weltmeister open source, free-as-in-freedom file systems, every one leaving everything else in the dust. So much for extensibility and stability.

megaadam wrote:

tar xvf file does both in one command.

                ... such stuff as dreams are made on

svella wrote (#23):

megaadam wrote: tar xvf file does both in one command
Wrong, you need

                tar xvfz file
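For what it's worth, this correction mainly applies to older tars: GNU tar and bsdtar detect the compression scheme when extracting, so plain xvf does work there; the z is only mandatory when creating the archive. A quick check (assuming a GNU or BSD tar; file names invented):

```shell
set -e
rm -rf /tmp/zdemo && mkdir -p /tmp/zdemo && cd /tmp/zdemo
echo data > file.txt
tar -czf file.tar.gz file.txt    # creating still needs -z
rm file.txt
tar -xf file.tar.gz              # no -z: compression is sniffed on extract
cat file.txt                     # -> data
```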

Lost User wrote:

                  luxury! toggle switches!

                  This internet thing is amazing! Letting people use it: worst idea ever!

kalberts wrote (#24):

Don't make fun of us oldies! Sure, it was back in the summer of 1978, before I started my studies: I got a summer job in a company where you had to flip switches, deposit, flip switches, deposit... I believe that mini bootstrap was 12 or 14 instructions long, enough to read in the short paper tape with the real bootstrap, so that we could mount the large reel of paper tape with, say, the compiler. For the system software, like the boot, it really wasn't paper: it was either mylar (I never met anyone who could tear off one of those mylar tapes with his bare hands), or plastic-covered aluminum with edges so sharp it could cut your throat if it got out of control. Some operators used leather gloves as protection. But the tapes never wore out, even if read a dozen times every day.

On the other hand: if you were lucky enough to need the same program the next morning that you had last used the previous day, you didn't have to reload it. The machines had real core memory that retained its contents even when you turned off the power. (One of the guys scoffed at semiconductor RAM, insisting that it was a fad: the computer industry would never accept memory that requires power to be constantly on to retain its contents!)

I even flipped switches at the University, but that was in a more specialized context: in one lab exercise we used a 2901 development kit. The 2901 was a widely used 4-bit bit-slice processor that you could line up in twos for an 8-bit CPU, four for a 16-bit CPU, eight for a 32-bit CPU. It contained the hard logic, which was controlled by an external microcode RAM. For our lab, we had a 16-bit by 64-word RAM, and we wrote the microcode to read two 4-bit input values and place the sum on the output.

Really knowing what's going on, all the way down to the signal paths, is impossible with the systems of today. Even understanding the operating system thoroughly, software only, is out of reach. I must admit that I sometimes long for those days when everything could be understood; you had a feeling of mastering it all, and you never left anything to automagic mechanisms you would simply have to trust in blind faith. The nearest you'll get today is embedded programming, coding plain C on, say, an ARM M0. I did that for a few years, and it felt as if nostalgia had turned real :-)

hevisko wrote:

                    I must agree, kids these days have all these slang wordings on phones, but can't grok that tar stands for TApe aRchive, i's jst s bvius.

kalberts wrote (#25):

I wonder how much slang has been influenced by Unix (or *nix, if you prefer). Before *nix, slang made some degree of sense to me, but then we got these absurdities like 'less' for displaying a file (yes, I know its history!), GNU, and a thousand 'funny' but made-up and totally meaningless (in the way they are used) names and terms. I see more and more of that creeping into non-computer slang as well: terms with no etymological background related to the application, but with a completely unrelated meaning that is absurd in the context. Controlling the development of a natural language makes herding cats look like a task for five-year-olds.

swampwiz wrote:

                      I've noticed that whenever I download a TAR file (and it usually is TAR.GZ) and decompress it, there always is yet another compressed file that needs to be decompressed. Why don't folks just do a single compression?

Joop Eggen wrote (#26):

The decompression can be combined, but the format is indeed packed twice, as .tar.gz or `.tar.bz2` or such. This is à la Unix, where small operations are composed into one larger operation. Its advantage here: tar (tape archive) concatenates all the files, and the ensuing gz compression can then compress far better across all the content, as opposed to .zip's per-file compression.
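The composed and the combined forms are interchangeable; a quick sketch with made-up paths:

```shell
set -e
rm -rf /tmp/pipedemo && mkdir -p /tmp/pipedemo/a /tmp/pipedemo/out
cd /tmp/pipedemo
echo x > a/x.txt

tar -cf - a | gzip > pack.tar.gz         # compose: tar, then gzip
gzip -dc pack.tar.gz | tar -xf - -C out  # decompose: gunzip, then untar
cat out/a/x.txt                          # -> x
```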

OriginalGriff wrote:

                        Tape? Tape? We don't use such new-fangled magnetics here! Send it to the card punch!

                        Sent from my Amstrad PC 1640 Bad command or file name. Bad, bad command! Sit! Stay! Staaaay... AntiTwitter: @DalekDave is now a follower!

bleahy48 wrote (#27):

                        You need one of these new-fangled magnetic drum memories!

megaadam wrote:

tar xvf file does both in one command.

                          ... such stuff as dreams are made on

vtokar wrote (#28):

Didn't you miss the z?

swampwiz wrote:

                            I've noticed that whenever I download a TAR file (and it usually is TAR.GZ) and decompress it, there always is yet another compressed file that needs to be decompressed. Why don't folks just do a single compression?

rjmoses wrote (#29):

Let's not forget that a fundamental idea in Unix was small, individual programs strung together through pipes and stdin/stdout redirection. So:

find . -print | grep "draw" | grep "\.c$" | tar -c -T - | gzip -c > temp.tar.gz

is an elaborate (albeit inefficient) way of finding all files ending in ".c", in the current directory and all subdirectories, that contain "draw" anywhere in the file name, creating a tar file from them, which is then gzipped to a file named "temp.tar.gz". The point here is to demonstrate hooking together small programs through piping. And because they found that many tar files got gzipped, someone got the idea of building gzip into the tar command, as in "tar -czvf temp.tar.gz filelist". A good idea.
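A null-safe rendering of the same idea (GNU find's -print0 and GNU tar's --null are assumed), run against a small made-up tree:

```shell
set -e
rm -rf /tmp/pipes && mkdir -p /tmp/pipes/sub && cd /tmp/pipes
echo 'int main(void){return 0;}' > drawline.c
echo 'int redraws;'              > sub/redraw.c
echo 'not a C file'              > notes.txt

# Select .c files whose names contain "draw", tar the list, gzip the stream.
find . -name '*draw*.c' -print0 \
  | tar --null -cf - -T - \
  | gzip -c > temp.tar.gz

tar -tzf temp.tar.gz   # lists ./drawline.c and ./sub/redraw.c
```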
