Code Project › General Programming › C#

Performance tip on extracting BLOB from database?

Thomas ST (#1)

    I am extracting 378,000 BLOBs from a database. These are mostly pdf, doc, and docx files, but there are also some big wav, tiff, and mpg files, among others. The code I'm currently using is a bit slow. Does anyone have a tip on how to speed things up?

    using (r2d2 = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess))
    {
        if (!r2d2.HasRows)
            throw new Exception();

        DirectoryInfo dir = new DirectoryInfo(Path.Combine(_arbeider.KonverteringsAlternativer.RootOutputDirectory, "DOKUMENT"));
        if (!dir.Exists)
            dir.Create();

        while (r2d2.Read()) //only one row...
        {
            string filnavn = Convert.ToString(r2d2["filename"]);
            string sizeString = Convert.ToString(r2d2["size"]);
            int size = 0;
            bool sizeOK = int.TryParse(sizeString, out size);
            bool filnavnOK = false;

            try
            {
                new System.IO.FileInfo(filnavn);
                filnavnOK = true;
            }
            catch (ArgumentException)
            {
                throw new Exception("file not valid");
            }
            catch (System.IO.PathTooLongException)
            {
                throw new Exception("path not valid");
            }
            catch (System.Exception errrr)
            {
                throw new Exception("filename not supported", errrr);
            }

            if (filnavnOK && sizeOK && size > 0)
            {
                FileStream fs = null;
                BinaryWriter bw = null;

                //checking for duplicates
                if (_hashBrukteFilnavn.ContainsKey(filnavn))
                    filnavn = Path.GetRandomFileName().Substring(0, 5) + filnavn; //duplicate

                _hashBrukteFilnavn[filnavn] = true;

                try
                {
                    fs = new FileStream(Path.Combine(dir.FullName, filnavn), FileMode.OpenOrCreate, FileAccess.Write);
                    bw = new BinaryWriter(fs);

                    completePath = Path.Combine(dir.Name, filnavn);

                    int startIndex = 0;
                    int buffersize = 3072; //what is best? default network-packet size for mssql2008 is 4096(?)

                    byte[] outbyte = new byte[buffersize];
                    long retval = r2d2.GetBytes(2, startIndex, outbyte, 0, buffersize);

                    //This loop is a bit slow.. How can I speed it up?
                    while (retval == buffersize)
                    {
                        bw.Write(outbyte);
                        bw.Flush();
                        startIndex += buffersize;
                        retval = r2d2.GetBytes(2, startIndex, outbyte, 0, buffersize);
                    }
                    // (snippet truncated in the original post)
    
• In reply to Thomas ST (#1)
Luc Pattyn (#2)

      Hi, I would use a buffer size that is a multiple of 1 KB (1024 bytes), and much larger than what you have, say 64 KB. And I wouldn't flush the BinaryWriter that often; once every MB or so is enough. :)

      Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]


      I only read code that is properly indented, and rendered in a non-proportional font; hint: use PRE tags in forum messages
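A minimal sketch of the chunked copy Luc describes (editorial, not from the thread): it assumes an open SqlDataReader `r` created with SequentialAccess, an output FileStream `fs`, and the BLOB in column ordinal 2, as in the original post.

```csharp
// Chunked copy with a 64 KB buffer; flush once at the end, not per chunk.
const int BufferSize = 64 * 1024;          // multiple of 1 KB, per the advice
byte[] buffer = new byte[BufferSize];
long offset = 0;
long read;
while ((read = r.GetBytes(2, offset, buffer, 0, BufferSize)) > 0)
{
    fs.Write(buffer, 0, (int)read);        // write only the bytes actually read
    offset += read;
}
fs.Flush();   // FileStream buffers internally; one flush at the end is enough
```

Writing only `read` bytes each pass also handles the final partial chunk, so no separate trailing write is needed.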


• In reply to Thomas ST (#1)
PIEBALDconsult (#3)

        I store files as VARBINARY. To write them to disk I use:

        using
        (
            System.IO.FileStream fs = fi.Open
            (
                System.IO.FileMode.Create,
                System.IO.FileAccess.Write,
                System.IO.FileShare.None
            )
        )
        {
            byte[] content = (byte[]) ds[1].Rows[i]["FileContent"];

            fs.Write(content, 0, content.Length);

            fs.Close();
        }

        The framework/OS can then buffer it as it sees fit. (fi is a System.IO.FileInfo , ds is a System.Collections.Generic.IList<System.Data.DataTable>)
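The same whole-array approach works with a DataReader instead of a DataTable when each file fits comfortably in memory (an editorial sketch; `reader` and `path` are hypothetical names):

```csharp
// Materialise the whole VARBINARY column as one array, then write it in one call.
// Fine for documents; for very large values, stream with GetBytes instead.
byte[] content = (byte[])reader["FileContent"];
System.IO.File.WriteAllBytes(path, content);
```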

• In reply to PIEBALDconsult (#3)

Thomas ST (#4)

          But then you use a DataTable; doesn't that load everything into memory first?

• In reply to Luc Pattyn (#2)
Thomas ST (#5)

            So if the default packet size for MSSQL 2008 is 4096 (KB?), I could for example try to use 2048 KB, which gives me buffersize = 1024*2048 = 2097152? And if I want to flush every MB, something like this?:

            int startIndex = 0;
            int buffersize = 2097152; //(1024*2048 = 2097152)
            byte[] outbyte = new byte[buffersize];

            long retval = r.GetBytes(2, startIndex, outbyte, 0, buffersize);
            while (retval == buffersize)
            {
                bw.Write(outbyte);
                if (startIndex % 1048576 == 0) //flush every MB? (1024*1024 = 1048576)
                    bw.Flush();
                startIndex += buffersize;
                retval = r.GetBytes(WriterTabell.Reader.GetOrdinal("nef_elfil"), startIndex, outbyte, 0, buffersize);
            }
            bw.Write(outbyte, 0, (int)retval);

            //flush, close, dispose

            //flush, close, dispose

            With this code I usually get only one loop iteration, which seems to be faster :) Is there any way I can find the optimal buffersize? Can .NET do the job for me (without using a DataSet)?
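One way to attack the "optimal buffersize" question is a quick micro-benchmark. The sketch below is editorial (not from the thread): it copies an in-memory stand-in for the BLOB at several buffer sizes and times each pass. Real numbers depend on the network and disk, so this only compares sizes on the machine it runs on.

```csharp
using System;
using System.Diagnostics;
using System.IO;

class BufferBench
{
    static void Main()
    {
        byte[] blob = new byte[32 * 1024 * 1024]; // 32 MB dummy BLOB
        foreach (int size in new[] { 4096, 64 * 1024, 256 * 1024, 2 * 1024 * 1024 })
        {
            MemoryStream src = new MemoryStream(blob);
            MemoryStream dst = new MemoryStream();
            byte[] buf = new byte[size];
            Stopwatch sw = Stopwatch.StartNew();
            int n;
            while ((n = src.Read(buf, 0, buf.Length)) > 0)
                dst.Write(buf, 0, n);       // plain chunked copy, no flushing
            sw.Stop();
            Console.WriteLine("{0,9} bytes: {1} ms", size, sw.ElapsedMilliseconds);
        }
    }
}
```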

• In reply to Thomas ST (#5)
Luc Pattyn (#6)

              Hi, I would:
              - use a smaller buffer (64 KB, maybe 256 KB); no need to waste 2 MB;
              - not write strange constants; just write it as you mean it, 64*1024, and the compiler or JIT will evaluate it once for you;
              - maybe flush every 16th iteration of the loop, so make it a for:

              for (int iter = 0; retval == buffersize; iter++)
              {
                  ...
                  if ((iter % 16) == 0) bw.Flush();
                  ...
              }

              Thomas ST wrote:

              any way I can find the optimal buffersize?

              By experiment; however, it won't matter much once you get it large enough, and it will vary with system specs and maybe system load.

              Thomas ST wrote:

              Can .NET do the job for me (without using DataSet)?

              Don't know. :)



• In reply to Thomas ST (#4)

PIEBALDconsult (#7)

                Yes, and if that's a concern, just use a DataReader, which I do quite often.

• In reply to Luc Pattyn (#6)

Thomas ST (#8)

                  I don't have time to experiment ;) I'm going for a buffersize of 256*1024 (and I will let the compiler do the math ;)), flushing when iter % 16 == 0. The example I found online had buffersize = 255 (that equals 255 bytes, right?), but it appears to be much faster with a bigger buffersize! It also seems like avoiding Flush() is a good idea. Thanks for helping me out :)

• In reply to PIEBALDconsult (#7)

Thomas ST (#9)

                    Forgot to mention it, but yes, it's a big concern. I'm extracting almost 120 GB of files stored in approx. 400,000 BLOBs. :) But thanks for the suggestions :)
