Managing Memory and Resources
-
Hello all, I need some advice relating to managing memory and resources within the .NET Framework. Basically I have an application that from time to time will need to perform the following actions on 200 MB+ strings:

App A
1) Compress (using SharpZipLib)
2) Break the compressed byte[] into an array of byte[] chunks
3) Push each chunk onto a WebService

App B
1) Receive the chunks from the WebService
2) Reconstruct the chunks into a single byte[]
3) Uncompress

As you can imagine this takes massive amounts of resources; in fact my first stab (for only a 25 MB string) resulted in peak memory usage around the 300 MB mark. After some research I found that I can force the GC to collect unused objects, and therefore littered the code with this at relevant points. This brought the memory usage peak down to around 200 MB, which is better but still not perfect.

So now I am looking for alternatives. At the moment I am mainly looking at doing all the work in a separate AppDomain and then unloading and killing it once completed. However, I would appreciate any input on alternative designs.

Whilst I realise this is probably a bit much for a simple question, the following is the code I currently use. While this works (for small amounts of data), the design really isn't scalable enough to handle large inputs (200 MB+).
...
...
//deflate and chunk
object[] outData = DeflateAndChunk(data);
...
...
//dechunk and inflate
string reconstructedData = DechunkAndInflate(outData);
...
...

private static byte[] DeChunkData(object[] baseData)
{
    int returnLength = 0;
    foreach (byte[] ba in baseData)
    {
        returnLength += ba.Length;
    }

    byte[] readBuffer = new byte[returnLength];
    using (Stream outStream = new MemoryStream(readBuffer))
    {
        // copy each chunk back into the single buffer, in order
        for (int loop = 0; loop < baseData.Length; loop++)
        {
            byte[] chunk = (byte[])baseData[loop];
            outStream.Write(chunk, 0, chunk.Length);
        }
    }
    return readBuffer;
}
private static object[] ChunkData(byte[] baseData, int chunkSize)
{
    int returnArraySize = baseData.Length / chunkSize;
    int baseDataModulus = baseData.Length % chunkSize;
    if (baseDataModulus > 0)
    {
        returnArraySize++;
    }

    object[] returnData = new object[returnArraySize];
    using (Stream baseDataStream = new MemoryStream(baseData))
    {
        baseData = null;
        for (int loop = 0; loop < returnArraySize; loop++)
        {
            // the last chunk may be smaller than chunkSize
            int currentChunkSize = (loop == returnArraySize - 1 && baseDataModulus > 0)
                ? baseDataModulus
                : chunkSize;
            byte[] chunk = new byte[currentChunkSize];
            baseDataStream.Read(chunk, 0, currentChunkSize);
            returnData[loop] = chunk;
        }
    }
    return returnData;
}
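For reference, DeflateAndChunk itself isn't reproduced above; a minimal sketch of what such a method might look like (assuming SharpZipLib's DeflaterOutputStream and a made-up chunk size constant) is:

// requires: using System.IO; using System.Text;
//           using ICSharpCode.SharpZipLib.Zip.Compression.Streams;

private static object[] DeflateAndChunk(string data)
{
    const int chunkSize = 512 * 1024; // placeholder; the real chunk size isn't shown

    // compress the UTF-8 bytes of the string with SharpZipLib's Deflater
    byte[] rawBytes = Encoding.UTF8.GetBytes(data);
    using (MemoryStream compressed = new MemoryStream())
    {
        using (DeflaterOutputStream deflate = new DeflaterOutputStream(compressed))
        {
            deflate.Write(rawBytes, 0, rawBytes.Length);
        }

        // MemoryStream.ToArray still works after the stream has been closed
        return ChunkData(compressed.ToArray(), chunkSize);
    }
}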
-
Hi, some thoughts:
1. Why compress all the data at once, and then go through the trouble of chunking it? Can't you compress part of the data (say 10 MB) and send it, then the next part, etc.? This avoids allocating and filling the big byte[] altogether (a rough sketch follows below).
2. To operate on part of a byte[], the API should provide a method that accepts said array, a start index and a length, so you don't need to copy to get the subset of the array. Most .NET classes have this.
3. I advise against calling the GC directly. The GC works fine: it collects when there is a need to collect, and AFAIK it uses adaptive algorithms, which will get disturbed by calling it explicitly. :)
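A rough sketch of point 1, staying with SharpZipLib's DeflaterOutputStream and using a hypothetical SendChunk method in place of the real web service proxy:

// requires: using System; using System.IO;
//           using ICSharpCode.SharpZipLib.Zip.Compression.Streams;

private static void CompressAndSendInParts(byte[] source)
{
    const int partSize = 10 * 1024 * 1024; // roughly 10 MB per part

    for (int offset = 0; offset < source.Length; offset += partSize)
    {
        int count = Math.Min(partSize, source.Length - offset);

        using (MemoryStream compressedPart = new MemoryStream())
        {
            using (DeflaterOutputStream deflate = new DeflaterOutputStream(compressedPart))
            {
                // point 2: write straight from the big array using offset/count,
                // no copying of the sub-range into its own array
                deflate.Write(source, offset, count);
            }
            SendChunk(compressedPart.ToArray()); // hypothetical web service call
        }
    }
}

The receiving side then inflates each part separately and appends the results, so the full compressed byte[] never has to exist on either end.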
Luc Pattyn [My Articles]
-
After doing a little research on GC.Collect I have come to the same conclusion as the one you posted, hence the idea of using it doesn't really sit very comfortably with me. Unfortunately I have to compress and chunk, as the bigger picture is that the send is done via a 3rd party intermediary that has defined this model. Unfortunately I do not control this and therefore I need to work with it.
-
Can the compressing be done without loading it all into memory? For example, the System.IO.Compression APIs let you do this by writing to a stream, which is compressed as you write. The stream can point to a file on disk, so you never have to load the big file into memory; instead the hard disk holds it all. As for sending the chunks, again, just read pieces of the compressed file using the standard System.IO.FileStream APIs.
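A minimal sketch of that idea, with made-up file paths and a hypothetical SendChunk(byte[], int) method standing in for the real web service call:

// requires: using System.IO; using System.IO.Compression;

// compress straight to a file so the full data never sits in memory
using (FileStream source = File.OpenRead(@"C:\temp\bigdata.txt"))
using (FileStream target = File.Create(@"C:\temp\bigdata.cmp"))
using (DeflateStream deflate = new DeflateStream(target, CompressionMode.Compress))
{
    byte[] buffer = new byte[64 * 1024];
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        deflate.Write(buffer, 0, read);
    }
}

// then read the compressed file back in pieces and hand each piece to the web service
using (FileStream compressed = File.OpenRead(@"C:\temp\bigdata.cmp"))
{
    byte[] chunk = new byte[512 * 1024];
    int read;
    while ((read = compressed.Read(chunk, 0, chunk.Length)) > 0)
    {
        SendChunk(chunk, read); // only the first 'read' bytes are valid
    }
}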
Tech, life, family, faith: Give me a visit. I'm currently blogging about: A Torah-observer's answers to Christianity The apostle Paul, modernly speaking: Epistles of Paul Judah Himango
-
MrEyes wrote:
After doing a little research on GC collect I have come to the same conclusion as the one you posted, hence the idea of using it doesnt really sit very comfortably with me.
As Luc pointed out, you really should not be calling GC.Collect yourself. The issue here is that every time the GC starts a collection cycle, it actually freezes the main thread of your application so it can determine which objects are still being referenced. By calling GC.Collect yourself and forcing a collection cycle, you are increasing the amount of time your application will spend in garbage collection, thereby ultimately decreasing your performance.
----------------------------- In just two days, tomorrow will be yesterday.
-
Since strings are immutable in C#, every time you make a change to one you are creating a new string object. It doesn't look like you are doing much actual string manipulation beyond converting it to a byte array, but if you are you may want to look at StringBuilder[^]. I don't think using a separate AppDomain is really going to buy you much, as you will then incur the overhead of loading and unloading the AppDomain. Finally, how are you determining your memory usage? The information shown by Task Manager is not accurate for .NET applications and should not be used. You should be looking at the .NET-related performance counters in perfmon.
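If it helps, the same counters perfmon shows can also be read from code through System.Diagnostics; a small sketch, assuming only one instance of the process is running:

// requires: using System; using System.Diagnostics;

// ".NET CLR Memory" is the perfmon category; the instance name is the process name
string instance = Process.GetCurrentProcess().ProcessName;

using (PerformanceCounter heapBytes =
    new PerformanceCounter(".NET CLR Memory", "# Bytes in all Heaps", instance, true))
{
    Console.WriteLine("Managed heap: {0:N0} bytes", heapBytes.NextValue());
}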
----------------------------- In just two days, tomorrow will be yesterday.
-
Well, I have to admit I have been monitoring memory usage via Task Manager, so I will have a look at perfmon. That being said, on large files (>200 MB) the application eventually throws an OutOfMemoryException, which regardless of the monitoring mechanism is, how can I put it, catastrophic :omg: