gzip compression and decompression
-
I need to decompress HTTP streams that have been compressed with "Content-Encoding: gzip". The gzip website ( gzip.org ) says to use the zlib library for working with data in memory. However, zlib just keeps giving me Z_DATA_ERROR errors. Has anyone worked with this library before? If so, please lend a helping hand! Is this the correct library to use for HTTP compression? I am simply taking the compressed content body and passing it to the uncompress() function, and the documentation is so sparse that I can't tell whether I'm doing it correctly.
-------------------------------- "All that is necessary for the forces of evil to win in the world is for enough good men to do nothing" -- Edmund Burke
I have not used zlib to decompress an HTTP stream. However, I do use zlib to compress and decompress my own stream, which is based on an HTTP stream. While developing my routine I had to study the HTTP documentation (the RFC), so I may have an idea what your problem could be. It is possible that the stream you are reading is 'Transfer-Encoding: chunked'. If so, trying to decompress it in one gulp will not work; you probably have to decompress it chunk by chunk. Hope this helps. Keep this thread posted with your findings, as I am interested. Louis.
Louis * google is your friend *
ldaoust wrote:
It is possible that the stream you are reading is 'Transfer-Encoding: chunked'. If so, trying to decompress it in one gulp will not work; you probably have to decompress it chunk by chunk.
Louis, that is a brilliant insight! I hadn't thought of that. I will try to implement that right away... :)
Hi Louis, I think I have it figured out. The content gets compressed before the chunking is applied, so it does need to be decompressed as a whole, not chunk by chunk. However, your pointing me toward the Transfer-Encoding turned out to be what solved it, because I realized I was mishandling the chunks themselves: I was reassembling the whole content without parsing the individual chunks to extract only the data and discard the chunk-size lines. Now I think I'm on the right track. Thanks again for your brain power!
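The fix described above — keeping only the data bytes and discarding the chunk-size lines — can be sketched roughly as follows. In chunked encoding, each chunk is a hexadecimal size line ending in CRLF, then the raw bytes, then a trailing CRLF; a size of 0 marks the end. (dechunk is an illustrative helper, not from the thread, and for brevity it assumes a NUL-terminated, NUL-free text body.)

```c
/* Sketch: reassemble the body of a "Transfer-Encoding: chunked"
 * response, keeping only the data bytes. Appending the size lines
 * too is exactly the bug described above.
 * Assumes a NUL-terminated, NUL-free body for simplicity. */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns the de-chunked length, or (size_t)-1 on malformed input. */
size_t dechunk(const char *in, char *out, size_t out_cap)
{
    size_t written = 0;
    const char *p = in;
    for (;;) {
        char *end;
        unsigned long size = strtoul(p, &end, 16);  /* hex chunk size */
        const char *crlf = strstr(end, "\r\n");     /* end of size line */
        if (end == p || crlf == NULL)
            return (size_t)-1;
        p = crlf + 2;                    /* start of chunk data */
        if (size == 0)
            return written;              /* "0\r\n" terminator */
        if (written + size > out_cap)
            return (size_t)-1;
        memcpy(out + written, p, size);  /* keep only the data bytes */
        written += size;
        p += size + 2;                   /* skip data + trailing CRLF */
    }
}
```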
Great! I am curious, though. Shouldn't it be possible to decompress chunk by chunk? I am assuming that chunked transfer is the right choice when sending binary data in a streaming fashion. Since the binary data can be large and dynamic, it would be built as a chunk, then compressed, then sent out. Maybe that's not the way a web server does it for files. Maybe it is still possible to do it this way; it would need something in the HTTP header to indicate this behaviour. My implementation does it chunk by chunk, but that is between a proprietary server and a proprietary client application, and so far it works very well. Glad I could help. Cheers.
Louis.
"Ambition without knowledge is like a boat on dry land" -Mr. Miyagi
ldaoust wrote:
Shouldn't it be possible to decompress chunk by chunk?
This is new to me, so I'm just going by what my O'Reilly book says. It says that the Content-Encoding is always applied before the Transfer-Encoding. This implies that all the chunks have to be reconstructed before decoding, but if I turn out to be wrong, I'll post it here.
You can happily take each chunk in turn, reconstruct it from the encoding, and feed it into the decompressor. Typically what happens is that the original data is compressed (Content-Encoding) and then split into chunks (Transfer-Encoding). Since gzip (and the underlying zlib) is stream-oriented, you merely have to present the chunks in the right sequence; typically it's a matter of calling inflateInit2(), followed by calls to inflate(), and finally inflateEnd().
Steve S Developer for hire
Thanks for taking the time to give your input; you sound like you have experience with the library. Now that you explain it, that's probably what Louis meant by what he said. :)