Faster way of filling a Dataset

Tags: database, question, postgresql, performance, help
MumbleB
#1

    Hi guys. I suppose this question has been asked numerous times. I have a PostgreSQL database with about 20 columns and about 1.4 million records. I select data between two given dates, and it takes about 4 minutes to fill the DataSet. I would like to speed this up. I am doing no updates, just using the data to compile a report. Is there a faster way to do this? The code below is what I have. My results are OK, but filling the DataSet is a bit of a problem, in that it takes roughly 4 to 5 minutes.

            string sql = @"SELECT * FROM datbase WHERE bus_received_time_stamp BETWEEN @startDate AND @endDate";

            //Making use of a PostgreSQL connection helper
            NpgsqlConnection conn = new NpgsqlConnection(conns);

            //Dates taken from the DateTimePicker controls
            string fromDate = dtStartDate.Text;
            string toDate = dtEndDate.Text;

            //Instantiate an NpgsqlCommand, connect to the DB and fill a DataSet
            NpgsqlCommand cmd = new NpgsqlCommand();
            cmd.Connection = conn;
            cmd.CommandText = sql;
            cmd.Parameters.AddWithValue("@startDate", fromDate + " 00:00:01");
            cmd.Parameters.AddWithValue("@endDate", toDate + " 23:59:59");

            conversionRate = txtConversionRate.Text;
            double conRate = Convert.ToDouble(conversionRate);

            try
            {
                //Open the DB connection
                conn.Open();
                setText(this, "Connection Established");
                NpgsqlDataAdapter da = new NpgsqlDataAdapter(cmd);
                setText(this, "Collecting Data for Processing!");
                da.Fill(dts);

                //Time between the two setText calls takes a good 4 to 5 minutes
                setText(this, "Define Data To be Used");

    From here the processing takes a few minutes, as there is quite a bit of matching to do, etc. The main thing, though, is that filling the DataSet takes too much time; I would like to cut this down to a few seconds, or a minute at most. I have added an index on the timestamp column to try to speed things up, but I'm not sure if this is correct. Any ideas?

Excellence is doing ordinary things extraordinarily well.
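A note on the snippet above: the date bounds are built by string concatenation ("00:00:01" / "23:59:59"), which depends on the pickers' text format and silently drops rows falling in the first and last second of each day. A minimal sketch of the alternative, passing typed DateTime values over a half-open range (the table and column names are copied from the post; the bounds would still be passed with cmd.Parameters.AddWithValue):

```csharp
using System;

static class ReportRange
{
    // Half-open range: >= start of the first day, < start of the day
    // after the last one. Covers every timestamp, fractional seconds included.
    public const string Sql =
        "SELECT * FROM datbase " +
        "WHERE bus_received_time_stamp >= @startDate " +
        "AND bus_received_time_stamp < @endDate";

    // Turn two picked dates (inclusive) into typed bounds.
    public static (DateTime Start, DateTime EndExclusive) Bounds(DateTime from, DateTime to)
        => (from.Date, to.Date.AddDays(1));
}
```

With typed parameters the driver sends real timestamp values, so the server can compare against the index without any text-to-timestamp parsing.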

Jasmine2501
#2

It sounds like you are retrieving a huge dataset and doing calculations in your client code. If those statistics or calculations can be done on the SQL server, it will likely improve performance (SQL servers are really good at aggregating stats) and it will decrease network load, which I suspect is your real problem in the first place. Getting 1.4 million rows from the database will always take a long time, because all that data has to be transferred over the network. If your table had only one boolean column, 1.4 million rows would be about 1.3MB of data; with your 20 columns, "filling the dataset" could easily be a 20MB-plus download, and that's never going to be fast.

To recap: the solutions involve reducing the amount of data you need to transfer. Ideally, transfer exactly what needs to be seen by the user and nothing more. Even if your query is super fast, transferring data over the network isn't. Your index on the date column is about all you can do to speed up the query itself, since it's your only selection criterion, but again, finding the data probably isn't the issue.
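A sketch of what "do it on the SQL server" could look like here. The Npgsql usage mirrors the original post, but the aggregate (a sum over an amount column) is invented for illustration, since the thread never shows the report's actual fields, and the connection string is a placeholder:

```csharp
using System;
using Npgsql;

class DailyTotals
{
    static void Main()
    {
        // Placeholder connection string and date bounds.
        string conns = "Host=localhost;Database=billing";
        DateTime startDate = new DateTime(2013, 5, 1);
        DateTime endDate = new DateTime(2013, 6, 1);

        // One row per day crosses the network instead of ~1.4M raw rows.
        // "amount" is a stand-in for whatever the report actually totals.
        string sql = @"
            SELECT date_trunc('day', bus_received_time_stamp) AS day,
                   count(*)    AS row_count,
                   sum(amount) AS day_total
            FROM   datbase
            WHERE  bus_received_time_stamp >= @startDate
            AND    bus_received_time_stamp <  @endDate
            GROUP  BY 1
            ORDER  BY 1";

        using (var conn = new NpgsqlConnection(conns))
        using (var cmd = new NpgsqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@startDate", startDate);
            cmd.Parameters.AddWithValue("@endDate", endDate);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine($"{reader.GetDateTime(0):d}: {reader.GetInt64(1)} rows");
        }
    }
}
```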

SledgeHammer01
#3

Filling the dataset is always going to be the bottleneck in your code. That being said, a few "band-aid" fixes:

1) Your query is pretty basic, but you *might* shave off a bit of time by using a stored procedure instead.
2) You *might* shave off a bit of time by indexing the bus_received_time_stamp column.
3) Do you have an enterprise-grade SQL server, or are you running it on some re-purposed podunk PC?
4) Are you connecting to the server via gigabit Ethernet?

With those band-aid fixes out of the way:

1) The real cause of your problem is that you are asking for a LOT of data: 1.4M rows x 20 columns = 28M "cells". If each cell were only 1 byte, that's 28M bytes, or about 26MB, of data. More likely your average cell is 10 or even 100 bytes, and now you are asking for 260MB to 2.6GB of data.
2) It's ***highly*** doubtful you need all that data. What are you doing with it? Displaying it to a user? That's great, but what's a person going to do with 1.4M rows of data? Nothing; a human can't process that amount of data.
3) Are you doing some kind of calculation on the data? If so, you can likely do it on the server and retrieve only the result, or run it as a job at midnight and have the result cached.
4) You could also page the data, if the user really needs all of it, which again I find **highly** doubtful.

You really need to define your problem better.
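One mitigation the band-aid list doesn't cover: if the client really must see every row, reading with NpgsqlDataReader instead of DataAdapter.Fill lets processing overlap the download and avoids buffering the entire result in a DataSet first. A sketch under the same assumptions as the original post (Npgsql package; ProcessRow is a hypothetical per-row handler; connection string is a placeholder):

```csharp
using System;
using Npgsql;

class StreamingReport
{
    static void Main()
    {
        // Placeholder connection string and bounds for illustration.
        string conns = "Host=localhost;Database=billing";
        string sql = "SELECT * FROM datbase " +
                     "WHERE bus_received_time_stamp BETWEEN @startDate AND @endDate";

        using (var conn = new NpgsqlConnection(conns))
        using (var cmd = new NpgsqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@startDate", new DateTime(2013, 5, 20));
            cmd.Parameters.AddWithValue("@endDate", new DateTime(2013, 5, 23));
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    ProcessRow(reader);   // hypothetical per-row handler
        }
    }

    // Stand-in: real code would accumulate the report figures here.
    static void ProcessRow(NpgsqlDataReader row) { }
}
```

Memory stays flat regardless of row count, though the network transfer itself still takes as long as before.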

Lost User
#4

SledgeHammer01 wrote:

    It's ***highly*** doubtful you need all that data. What are you doing with it? Displaying it to a user? That's great, but what's a person going to do with 1.4M rows of data? Nothing; a human can't process that amount of data.

The post said it's for a report; but good point :)

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^]

Pete OHanlon
#5

As has been said, that's a lot of data. Do you really need it all, though? If this is for a report, do you really need all 20 columns? Reducing what you retrieve should help speed things up.

I was brought up to respect my elders. I don't respect many people nowadays.
CodeStash - Online Snippet Management | My blog | MoXAML PowerToys | Mole 2010 - debugging made easier
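Pete's suggestion in query form. The three columns named here are invented for illustration; the point is only that SELECT * drags all 20 columns across the wire, while an explicit list cuts the transfer roughly in proportion:

```csharp
// Hypothetical report columns instead of SELECT *.
const string Sql =
    "SELECT bus_received_time_stamp, account_no, amount " +
    "FROM datbase " +
    "WHERE bus_received_time_stamp BETWEEN @startDate AND @endDate";
```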

SledgeHammer01
#6

If someone plopped a report with 1.4M rows on it on your desk, what are you going to do with it? I'd toss it in the recycle bin without even looking at it. If he needs all that data to generate a usable report, that's another story. In that case, he should probably do what another poster suggested and run the stats on the server instead of downloading all the data and crunching it on the client.

Lost User
#7

SledgeHammer01 wrote:

    If someone plopped a report with 1.4M rows on it on your desk, what are you gonna do with it? I'd toss it in the recycle bin without even looking at it.

I'd do that with any report; then again, sometimes some bureaucrat needs a paper trail from here to the moon, just to be able to tell his boss that the evidence has been archived. Yes, I'll come up with the "filter, don't load all" line, but that's more appropriate for grids and displaying than for reporting. For reporting, I'd say time matters little.

SledgeHammer01 wrote:

    If he needs all that data to generate a usable report, that's another story.

He'll be the one to answer that question, and eventually you'd still end up writing that report. The keyword here is "needs".

SledgeHammer01 wrote:

    In that case, he should probably do what another poster suggested and run the stats on the server instead of downloading all the data and doing it on the client.

It would still need databinding, whether you do it on the client or on the server. I'd suggest doing it on the client; I'd hate to see fifty people hammering the server while their desktops sit idle.

SledgeHammer01
#8

It all depends on his requirements. I'm skeptical he needs to generate *real-time* reports over such massive amounts of data. In that case, I'd have a scheduled job run on the server at 12:01am, generate the report for the previous day, and cache it somewhere. If he does need real-time reports over an arbitrary date range, yeah, there isn't really a way around that; it's going to take 4 to 5 minutes. Even then, it would be a clever design to have a service run at 12:01am, pull the data for the previous day, and cache it somewhere, perhaps in compressed form :), since you can't compress data over the SQL protocol (well, you can with 3rd-party software, I guess). Then his client knows to pull, say, 05202013.zip, 05212013.zip, and 05222013.zip and concatenate them for those 3 days. You'd probably shrink the data from 2.6GB to less than a gig. You'd have to test it, of course :); the time saved pulling the zips might be balanced out by the time it takes to unzip, concatenate, and load into datasets. Then again, depending on his application, the nightly service might be able to do some preprocessing steps as well.

Lost User
#9

                    Good answer :thumbsup:

MumbleB
#10

Thanks Pete and all the other guys. I need all the data to generate the report, a summary billing report, so client-side calculations are a must. What I found helpful was creating a stored procedure on the DB; it is a local DB, not a network DB (concept phase, before business will throw money at a server). I did a heck of a lot of reading up on Postgres, as that is my DB. For now I am sitting at 2 million records and it runs fine: it takes about 2 minutes to collect the data and do all the relevant calculations. Indexing the date column helped with performance as well, since I use it to create the report for a specific date range. Just for interest's sake, 2 million records is only one month's worth of data, which I am extracting from an Oracle database that carries far more data than I need for the billing. So, in short, it works for now. We'll see how performance degrades as the local DB grows.


MumbleB
#11

Thanks Sledge. I did all of the above before returning to the thread, and after a whole lot of background work on Postgres databases. The stored procedure definitely helped, and so did the indexing. I am also looking at using a refcursor. The proof will be in the pudding on Sunday, when I run the billing for the month of May. Just over 2 million records now, and it takes around 2 to 4 minutes to complete everything, calculations included, and create the PDF output reports.

