Performance tip on extracting BLOBs from a database?
-
I am extracting 378,000 BLOBs from a database. These are mostly pdf, doc, and docx, but also some big wav, tiff, and mpg files, among others. The code I'm currently using is a bit slow. Does anyone have a tip on how to speed things up?
using (r2d2 = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess))
{
    if (!r2d2.HasRows)
        throw new Exception();

    DirectoryInfo dir = new DirectoryInfo(Path.Combine(_arbeider.KonverteringsAlternativer.RootOutputDirectory, "DOKUMENT"));
    if (!dir.Exists)
        dir.Create();

    while (r2d2.Read()) // only one row...
    {
        string filnavn = Convert.ToString(r2d2["filename"]);
        string sizeString = Convert.ToString(r2d2["size"]);
        int size = 0;
        bool sizeOK = int.TryParse(sizeString, out size);
        bool filnavnOK = false;
        try
        {
            new System.IO.FileInfo(filnavn);
            filnavnOK = true;
        }
        catch (ArgumentException erna)
        {
            throw new Exception("file not valid");
        }
        catch (System.IO.PathTooLongException)
        {
            throw new Exception("path not valid");
        }
        catch (System.Exception errrr)
        {
            throw new Exception("filename not supported", errrr);
        }

        if (filnavnOK && sizeOK && size > 0)
        {
            FileStream fs = null;
            BinaryWriter bw = null;

            // checking for duplicates
            if (_hashBrukteFilnavn.ContainsKey(filnavn))
                filnavn = Path.GetRandomFileName().Substring(0, 5) + filnavn; // duplicate
            _hashBrukteFilnavn[filnavn] = true;

            try
            {
                fs = new FileStream(Path.Combine(dir.FullName, filnavn), FileMode.OpenOrCreate, FileAccess.Write);
                bw = new BinaryWriter(fs);
                completePath = Path.Combine(dir.Name, filnavn);
                int startIndex = 0;
                int buffersize = 3072; // what is best? default network-packet size for mssql2008 is 4096(?)
                byte[] outbyte = new byte[buffersize];
                long retval = r2d2.GetBytes(2, startIndex, outbyte, 0, buffersize);

                // This loop is a bit slow.. How can I speed it up?
                while (retval == buffersize)
                {
                    bw.Write(outbyte);
                    bw.Flush();
                    startIndex += buffersize;
                    retval = r2d2.GetBytes(2, startIndex, outbyte, 0, buffersize);
                }
                // ...
-
Hi, I would use a buffer size that is a multiple of 1 KB (1024) and much larger than what you have, say 64 KB. And I wouldn't flush the BinaryWriter that often; once every MB or so is enough. :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]
I only read code that is properly indented, and rendered in a non-proportional font; hint: use PRE tags in forum messages
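A minimal sketch of that advice, assuming an open SqlDataReader r2d2 in SequentialAccess mode with the BLOB in column 2 as in the original post (the class, method name, and path parameter are illustrative, not from the thread):

```csharp
using System.IO;
using System.Data.SqlClient;

static class BlobExport
{
    // 64 KB chunks, no per-chunk Flush: the FileStream's internal buffer
    // and the final Dispose take care of flushing.
    public static void WriteBlobToFile(SqlDataReader r2d2, string path)
    {
        const int BufferSize = 64 * 1024; // a multiple of 1 KB, per the advice above
        byte[] buffer = new byte[BufferSize];
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            long offset = 0;
            long read;
            // GetBytes returns the number of bytes copied; 0 once the BLOB is exhausted
            while ((read = r2d2.GetBytes(2, offset, buffer, 0, BufferSize)) > 0)
            {
                fs.Write(buffer, 0, (int)read);
                offset += read;
            }
        } // Dispose flushes once and closes the file
    }
}
```

This also handles the last partial chunk in the same loop, since it writes however many bytes GetBytes actually returned.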
-
I store files as VARBINARY. To write them to disk I use:
using
(
    System.IO.FileStream fs = fi.Open ( System.IO.FileMode.Create , System.IO.FileAccess.Write , System.IO.FileShare.None )
)
{
    byte[] content = (byte[]) ds [ 1 ].Rows [ i ] [ "FileContent" ] ;
    fs.Write ( content , 0 , content.Length ) ;
}
The framework/OS can then buffer it as it sees fit. (fi is a System.IO.FileInfo, ds is a System.Collections.Generic.IList<System.Data.DataTable>.)
-
So if the default packet size for MSSQL 2008 is 4096 (Kbytes?), I could for example try to use 2048 Kbytes, which gives me: buffersize = 1024*2048 = 2097152? And if I want to flush every MB, something like this?:
int startIndex = 0;
int buffersize = 2097152; // (1024*2048 = 2097152)
byte[] outbyte = new byte[buffersize];
long retval = r.GetBytes(2, startIndex, outbyte, 0, buffersize);
while (retval == buffersize)
{
    bw.Write(outbyte);
    if (startIndex % 1048576 == 0) // Flush every MB? (1024*1024 = 1048576)
        bw.Flush();
    startIndex += buffersize;
    retval = r.GetBytes(WriterTabell.Reader.GetOrdinal("nef_elfil"), startIndex, outbyte, 0, buffersize);
}
bw.Write(outbyte, 0, (int)retval); // flush, close, dispose
With this code I usually get only one loop iteration, which seems to be faster. :) Is there any way I can find the optimal buffersize? Can .NET do the job for me (without using a DataSet)?
-
Hi, I would:
- use a smaller buffer (64 KB, maybe 256 KB), no need to waste 2 MB;
- not write strange constants, just write it as you mean it, 64*1024; the compiler or JIT will evaluate it once for you;
- maybe flush every 16th iteration of the loop, so make it a for:

for (int iter = 0; retval == buffersize; iter++)
{
    ...
    if ((iter % 16) == 0) bw.Flush();
    ...
}

Thomas ST wrote:
any way I can find the optimal buffersize?
By experiment; however, it won't matter much once you get it large enough, and it will vary depending on system specs and maybe system load.
Thomas ST wrote:
Can .NET do the job for me (without using DataSet)?
don't know. :)
-
Yes, and if that's a concern, just use a DataReader, which I do quite often.
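Along those lines: on .NET 4.5 and later, a SqlDataReader can hand the BLOB back as a Stream, so Stream.CopyTo does the chunking and no manual GetBytes loop is needed. A sketch, where conn, query, and targetPath are placeholders, not names from the thread:

```csharp
using (SqlCommand cmd = new SqlCommand(query, conn))
using (SqlDataReader r = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
{
    while (r.Read())
    {
        // With SequentialAccess this streams the column rather than
        // buffering the whole BLOB in memory (.NET 4.5+)
        using (Stream blob = r.GetStream(2))
        using (FileStream fs = File.Create(targetPath))
        {
            blob.CopyTo(fs, 256 * 1024); // copies in chunks of the given size
        }
    }
}
```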
-
I don't have time to experiment ;) I'm going for a buffersize of 256*1024 (and I will let the compiler do the math ;)), flushing on every iter % 16 == 0. The example I found online had buffersize = 255 (equals 255 bytes, right?), but it appears to be much faster with a bigger buffersize! It also seems like avoiding Flush() is a good idea. Thanks for helping me out :)
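Putting the thread's conclusion together, the revised loop might look like this; a sketch assuming the reader r, BinaryWriter bw, and BLOB column ordinal 2 from the earlier posts:

```csharp
const int buffersize = 256 * 1024;   // let the compiler do the math
byte[] outbyte = new byte[buffersize];
long startIndex = 0;
long retval = r.GetBytes(2, startIndex, outbyte, 0, buffersize);
for (int iter = 0; retval == buffersize; iter++)
{
    bw.Write(outbyte);
    if (iter % 16 == 0)
        bw.Flush();                  // roughly every 4 MB at this buffer size
    startIndex += buffersize;
    retval = r.GetBytes(2, startIndex, outbyte, 0, buffersize);
}
bw.Write(outbyte, 0, (int)retval);   // last partial chunk; Dispose flushes the rest
```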