Code Project › General Programming › C#

Performance tip on extracting BLOB from database?

Thomas ST (#1)

    I am extracting 378,000 BLOBs from a database. These are mostly pdf, doc, and docx files, but there are also some big wav, tiff, and mpg files, among others. The code I'm currently using is a bit slow. Does anyone have a tip on how to speed things up?

    using (r2d2 = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess))
    {
        if (!r2d2.HasRows)
            throw new Exception();

        DirectoryInfo dir = new DirectoryInfo(Path.Combine(_arbeider.KonverteringsAlternativer.RootOutputDirectory, "DOKUMENT"));
        if (!dir.Exists)
            dir.Create();

        while (r2d2.Read()) //only one row...
        {
            string filnavn = Convert.ToString(r2d2["filename"]);
            string sizeString = Convert.ToString(r2d2["size"]);
            int size = 0;
            bool sizeOK = int.TryParse(sizeString, out size);
            bool filnavnOK = false;

            try
            {
                new System.IO.FileInfo(filnavn);
                filnavnOK = true;
            }
            catch (ArgumentException)
            {
                throw new Exception("file not valid");
            }
            catch (System.IO.PathTooLongException)
            {
                throw new Exception("path not valid");
            }
            catch (System.Exception errrr)
            {
                throw new Exception("filename not supported", errrr);
            }

            if (filnavnOK && sizeOK && size > 0)
            {
                FileStream fs = null;
                BinaryWriter bw = null;

                //checking for duplicates
                if (_hashBrukteFilnavn.ContainsKey(filnavn))
                    filnavn = Path.GetRandomFileName().Substring(0, 5) + filnavn; //duplicate

                _hashBrukteFilnavn[filnavn] = true;

                try
                {
                    fs = new FileStream(Path.Combine(dir.FullName, filnavn), FileMode.OpenOrCreate, FileAccess.Write);
                    bw = new BinaryWriter(fs);

                    completePath = Path.Combine(dir.Name, filnavn);

                    int startIndex = 0;
                    int buffersize = 3072; //what is best? default network-packet size for mssql2008 is 4096(?)

                    byte[] outbyte = new byte[buffersize];
                    long retval = r2d2.GetBytes(2, startIndex, outbyte, 0, buffersize);

                    //This loop is a bit slow.. How can I speed it up?
                    while (retval == buffersize)
                    {
                        bw.Write(outbyte);
                        bw.Flush();
                        startIndex += buffersize;
                        retval = r2d2.GetBytes(2, startIndex, outbyte, 0, buffersize);
                    }
                    // (snippet truncated in the original post)
    
• In reply to Thomas ST (#1)
Luc Pattyn (#2)

      Hi, I would use a buffer size that is a multiple of 1 KB (1024 bytes), and much larger than what you have, say 64 KB. And I wouldn't flush the BinaryWriter that often; once every MB or so is enough. :)

      Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]


      I only read code that is properly indented, and rendered in a non-proportional font; hint: use PRE tags in forum messages
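A minimal sketch of the chunked copy Luc describes (editorial, not from the thread): it assumes an open SqlDataReader `r` created with SequentialAccess, an output FileStream `fs`, and the BLOB in column ordinal 2, as in the original post.

```csharp
// Chunked copy with a 64 KB buffer; flush once at the end, not per chunk.
const int BufferSize = 64 * 1024;          // multiple of 1 KB, per the advice
byte[] buffer = new byte[BufferSize];
long offset = 0;
long read;
while ((read = r.GetBytes(2, offset, buffer, 0, BufferSize)) > 0)
{
    fs.Write(buffer, 0, (int)read);        // write only the bytes actually read
    offset += read;
}
fs.Flush();   // FileStream buffers internally; one flush at the end is enough
```

Writing only `read` bytes each pass also handles the final partial chunk, so no separate trailing write is needed.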


• In reply to Thomas ST (#1)
PIEBALDconsult (#3)

        I store files as VARBINARY. To write them to disk I use:

        using
        (
            System.IO.FileStream fs = fi.Open
            (
                System.IO.FileMode.Create,
                System.IO.FileAccess.Write,
                System.IO.FileShare.None
            )
        )
        {
            byte[] content = (byte[]) ds[1].Rows[i]["FileContent"];

            fs.Write(content, 0, content.Length);

            fs.Close();
        }

        The framework/OS can then buffer it as it sees fit. (fi is a System.IO.FileInfo , ds is a System.Collections.Generic.IList<System.Data.DataTable>)
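The same whole-array approach works with a DataReader instead of a DataTable when each file fits comfortably in memory (an editorial sketch; `reader` and `path` are hypothetical names):

```csharp
// Materialise the whole VARBINARY column as one array, then write it in one call.
// Fine for documents; for very large values, stream with GetBytes instead.
byte[] content = (byte[])reader["FileContent"];
System.IO.File.WriteAllBytes(path, content);
```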

• In reply to PIEBALDconsult (#3)

Thomas ST (#4)

          But then you use a DataTable; doesn't that load everything into memory first?

• In reply to Luc Pattyn (#2)
Thomas ST (#5)

            So if the default packet size for MSSQL 2008 is 4096 (KB?), I could for example try to use 2048 KB, which gives me buffersize = 1024*2048 = 2097152? And if I want to flush every MB, something like this?:

            int startIndex = 0;
            int buffersize = 2097152; //(1024*2048 = 2097152)
            byte[] outbyte = new byte[buffersize];

            long retval = r.GetBytes(2, startIndex, outbyte, 0, buffersize);
            while (retval == buffersize)
            {
                bw.Write(outbyte);
                if (startIndex % 1048576 == 0) //flush every MB? (1024*1024 = 1048576)
                    bw.Flush();
                startIndex += buffersize;
                retval = r.GetBytes(WriterTabell.Reader.GetOrdinal("nef_elfil"), startIndex, outbyte, 0, buffersize);
            }
            bw.Write(outbyte, 0, (int)retval);

            //flush, close, dispose

            //flush, close, dispose

            With this code I usually get only one loop iteration, which seems to be faster :) Is there any way I can find the optimal buffersize? Can .NET do the job for me (without using a DataSet)?
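One way to attack the "optimal buffersize" question is a quick micro-benchmark. The sketch below is editorial (not from the thread): it copies an in-memory stand-in for the BLOB at several buffer sizes and times each pass. Real numbers depend on the network and disk, so this only compares sizes on the machine it runs on.

```csharp
using System;
using System.Diagnostics;
using System.IO;

class BufferBench
{
    static void Main()
    {
        byte[] blob = new byte[32 * 1024 * 1024]; // 32 MB dummy BLOB
        foreach (int size in new[] { 4096, 64 * 1024, 256 * 1024, 2 * 1024 * 1024 })
        {
            MemoryStream src = new MemoryStream(blob);
            MemoryStream dst = new MemoryStream();
            byte[] buf = new byte[size];
            Stopwatch sw = Stopwatch.StartNew();
            int n;
            while ((n = src.Read(buf, 0, buf.Length)) > 0)
                dst.Write(buf, 0, n);       // plain chunked copy, no flushing
            sw.Stop();
            Console.WriteLine("{0,9} bytes: {1} ms", size, sw.ElapsedMilliseconds);
        }
    }
}
```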

• In reply to Thomas ST (#5)
Luc Pattyn (#6)

              Hi, I would:
              - use a smaller buffer (64 KB, maybe 256 KB); no need to waste 2 MB;
              - not write strange constants; just write it as you mean it, 64*1024, and the compiler or JIT will evaluate it once for you;
              - maybe flush every 16th iteration of the loop, so make it a for:

              for (int iter = 0; retval == buffersize; iter++)
              {
                  ...
                  if ((iter % 16) == 0) bw.Flush();
                  ...
              }

              Thomas ST wrote:

              any way I can find the optimal buffersize?

              By experiment; however, it won't matter much once you get it large enough, and it will vary with system specs and maybe system load.

              Thomas ST wrote:

              Can .NET do the job for me (without using DataSet)?

              Don't know. :)



• In reply to Thomas ST (#4)

PIEBALDconsult (#7)

                Yes, and if that's a concern, just use a DataReader, which I do quite often.

• In reply to Luc Pattyn (#6)

Thomas ST (#8)

                  I don't have time to experiment ;) I'm going for a buffersize of 256*1024 (and I will let the compiler do the math ;)), flushing when iter % 16 == 0. The example I found online had buffersize = 255 (that equals 255 bytes, right?), but it appears to be much faster with a bigger buffersize! It also seems like avoiding Flush() is a good idea. Thanks for helping me out :)

• In reply to PIEBALDconsult (#7)

Thomas ST (#9)

                    Forgot to mention it, but yes, it's a big concern. I'm extracting almost 120 GB of files stored in approx. 400,000 BLOBs. :) But thanks for the suggestions :)
