
Question on performance...

The Lounge
Tags: help, database, cloud, question, javascript
• Sander Rossel (#1) wrote:

Not so much a programming question, but more of a what's-your-opinion / experience-on-the-matter question. A customer of mine is importing CSV files through a website, hosted in Microsoft Azure. The file typically has around 2000 to 2500 lines and 20 to 25 values per line (depending on the type of import). However, something is currently very wrong in my software or database, and more often than not my customer is greeted by a timeout error (well, a "something went wrong" message, because I do proper error handling ;)). Last week, it took 17 tries to import one file. The funny thing is, it ultimately works every time.

Of course it wasn't always like that, but the import table in Azure SQL has over 2 million records now, and I need to validate for double records, calculate a couple of other fields for which I need additional data, create some grouped data, etc. Since I've added some stuff in the past year it's gotten slower and slower. Not so much an issue in my development environment, but apparently a bottleneck in production. Good thing they're calling me now that it's out of control, rather than sooner when it was more manageable :laugh:

Anyway, I've currently got it down to four to six seconds, which is still an eternity, I think. Inserting so many records into the database, as well as fetching a good amount, just takes some time, apparently. I'm doing everything synchronously (well, async, but waiting) and I haven't checked indexes yet, so maybe I could get it a bit faster still. Perhaps upgrading my database in Azure could help a bit too. If I really wanted to, I could make it instant, handle everything async, and give the user live updates. They've got like five imports a day, so it's not like these five seconds are that big of a deal. Other than that the system is pretty fast and they're very satisfied :D

So, for my question, how long should such an action take, according to you? Is five seconds alright, and should I just show a "busy" message, or is five seconds completely unacceptable (considering the scenario)?

    Best, Sander Azure DevOps Succinctly (free eBook) Azure Serverless Succinctly (free eBook) Migrating Apps to the Cloud with Azure arrgh
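
(For illustration, not from the thread: a minimal sketch of the "make it instant, handle everything async, give the user live updates" option, assuming an in-memory queue polled by the UI. The job record, queue, and status values are invented; a production version would more likely use a durable queue or a hosted background service.)

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public record ImportJob(Guid Id, string CsvContent);

public class ImportQueue
{
    private readonly ConcurrentQueue<ImportJob> _jobs = new();
    private readonly ConcurrentDictionary<Guid, string> _status = new();

    // Called from the upload endpoint: returns immediately instead of blocking ~5 s.
    public Guid Enqueue(string csvContent)
    {
        var job = new ImportJob(Guid.NewGuid(), csvContent);
        _status[job.Id] = "queued";
        _jobs.Enqueue(job);
        return job.Id;
    }

    // Runs on a background task; 'import' does the insert/validate/aggregate work.
    public async Task ProcessPendingAsync(Func<ImportJob, Task> import)
    {
        while (_jobs.TryDequeue(out var job))
        {
            _status[job.Id] = "processing";
            await import(job);
            _status[job.Id] = "done";
        }
    }

    // Polled by the UI to show live progress.
    public string StatusOf(Guid id) =>
        _status.TryGetValue(id, out var s) ? s : "unknown";
}
```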

• David ONeil (#2), replying to Sander Rossel, wrote:

      Sander Rossel wrote:

      Last week, it took 17 tries to import one file. The funny thing is, it ultimately works every time.

Would dumping it into a blank temp table, then doing the work from there after it is all in, make things better? I would not be concerned about 5 to 10 (or even more) seconds for something that takes a lot of background checking.

      Our Forgotten Astronomy | Object Oriented Programming with C++ | Wordle solver
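
(A sketch of the temp-table idea, for concreteness. It assumes Microsoft.Data.SqlClient and invented names — dbo.ImportStaging, dbo.ImportFinal, ColA/ColB/Amount — none of which come from the thread.)

```csharp
using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class StagedImport
{
    public static async Task StageAndProcessAsync(DataTable csvRows, string connectionString)
    {
        await using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();

        // 1. Bulk-load the raw CSV into an empty staging table: fast, no validation yet.
        using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.ImportStaging" })
        {
            await bulk.WriteToServerAsync(csvRows);
        }

        // 2. Dedup and move to the real table set-based, inside the database.
        const string sql = @"
            INSERT INTO dbo.ImportFinal (ColA, ColB, Amount)
            SELECT s.ColA, s.ColB, s.Amount
            FROM dbo.ImportStaging s
            WHERE NOT EXISTS (SELECT 1 FROM dbo.ImportFinal f
                              WHERE f.ColA = s.ColA AND f.ColB = s.ColB);
            TRUNCATE TABLE dbo.ImportStaging;";
        await using var cmd = new SqlCommand(sql, conn);
        await cmd.ExecuteNonQueryAsync();
    }
}
```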

• honey the codewitch (#3), replying to Sander Rossel, wrote:

5 seconds is not terrible. If you really wanted to make it faster, you could try inserting it into a smaller table, and then merging those new records into the existing table with a background job?

        To err is human. Fortune favors the monsters.
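
(What that merge could look like, sketched with invented names — dbo.ImportRecent as the small landing table, dbo.ImportFinal as the big one. Run from a background job, not from the web request.)

```csharp
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class MergeJob
{
    public static async Task MergeRecentAsync(string connectionString)
    {
        const string mergeSql = @"
            MERGE dbo.ImportFinal AS target
            USING dbo.ImportRecent AS source
                ON target.ColA = source.ColA AND target.ColB = source.ColB
            WHEN NOT MATCHED BY TARGET THEN
                INSERT (ColA, ColB, Amount)
                VALUES (source.ColA, source.ColB, source.Amount);
            TRUNCATE TABLE dbo.ImportRecent;";

        await using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();
        await using var cmd = new SqlCommand(mergeSql, conn);
        await cmd.ExecuteNonQueryAsync();
    }
}
```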

• Mycroft Holmes (#4), replying to Sander Rossel, wrote:

Whenever dealing with a fragile system (i.e. the web, an international WAN), I would eliminate it first. As others have said, load the data into a temp table just to get it off the web, process the crap out of it, and then dump it into the destination table in its final form.

          Never underestimate the power of human stupidity - RAH I'm old. I know stuff - JSOP

• Sander Rossel (#5), replying to David ONeil, wrote:

            David O'Neil wrote:

            Would dumping it into a blank temp table, then doing the work from there after it is all in make things better?

            Yeah, something like that would be my "instant" solution. The UX would be better, but the use case is that they import the file and then immediately work with the transformed data, so they'd be waiting for that in any case.

            Best, Sander Azure DevOps Succinctly (free eBook) Azure Serverless Succinctly (free eBook) Migrating Apps to the Cloud with Azure arrgh.js - Bringing LINQ to JavaScript

• Sander Rossel (#6), replying to honey the codewitch, wrote:

              honey the codewitch wrote:

              If you really wanted to make it faster, you could try inserting it into a smaller table, and then merge those new records into the existing table with a background job?

              I'd rather insert the raw data and handle everything else in a background job. The UX would be better, but the use case is that they import the file and then immediately work with the transformed data, so they'd be waiting for that in any case. I'm not a fan of having the same table twice and using the one as some sort of staging for the other (if that's what you're suggesting).

              Best, Sander Azure DevOps Succinctly (free eBook) Azure Serverless Succinctly (free eBook) Migrating Apps to the Cloud with Azure arrgh.js - Bringing LINQ to JavaScript

• Daniel Pfeffer (#7), replying to Sander Rossel, wrote:

My feeling is that the database schema needs a redesign — specifically, adding indexes and keys. If you have a field which is supposed to be unique, define it as a unique key in the database table. The database will then ensure that the field is unique in the table, with no extra work required. Indexes are used to maintain the sort order of the records in a table. Searching an index is much faster than searching an unindexed table, and best of all, multiple search criteria may be stored as multiple indexes. IOW, let the database engine do what it's best at.

Freedom is the freedom to say that two plus two make four. If that is granted, all else follows. -- 6079 Smith W.
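
(To make that concrete, a hedged sketch — the constraint, index, and column names are invented; the real candidates would come from Sander's schema.)

```csharp
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class SchemaTuning
{
    public static async Task AddKeysAndIndexesAsync(string connectionString)
    {
        // The unique constraint lets the engine reject duplicates on its own;
        // the nonclustered index speeds up a common search column.
        const string ddl = @"
            ALTER TABLE dbo.ImportFinal
                ADD CONSTRAINT UQ_ImportFinal_ExternalId UNIQUE (ExternalId);

            CREATE NONCLUSTERED INDEX IX_ImportFinal_ImportDate
                ON dbo.ImportFinal (ImportDate)
                INCLUDE (Amount);  -- covering: the date query never hits the base table";

        await using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();
        await using var cmd = new SqlCommand(ddl, conn);
        await cmd.ExecuteNonQueryAsync();
    }
}
```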

• jmaida (#8), replying to Daniel Pfeffer, wrote:

Ditto to Dan's words. When you do the redesign, consider having some sort of self-balancing binary tree as records are added, to keep search times fast.

                  "A little time, a little trouble, your better day" Badfinger

• Daniel Pfeffer (#9), replying to jmaida, wrote:

                    jmaida wrote:

                    some sort of self-balancing binary tree as records are added

That's the database engine's worry. Another thing that would help maintain the database's consistency would be [normalizing the database](https://en.wikipedia.org/wiki/Database_normalization). This uses separate tables to (ideally) ensure that all data is stored only once, using unique keys to link instances. For example, you would have a table containing {country id, country name}, and any other table needing to store a country name would refer to it by the country id. The results of a query must be constructed by reading from multiple tables. Note that this redesign can be complex, and can in some cases lead to slower retrieval.

                    Freedom is the freedom to say that two plus two make four. If that is granted, all else follows. -- 6079 Smith W.
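
(Daniel's {country id, country name} example, sketched as DDL with invented table names.)

```csharp
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class Normalization
{
    public static async Task CreateCountryTablesAsync(string connectionString)
    {
        const string ddl = @"
            CREATE TABLE dbo.Country (
                CountryId   INT IDENTITY PRIMARY KEY,
                CountryName NVARCHAR(100) NOT NULL UNIQUE   -- stored exactly once
            );
            CREATE TABLE dbo.Customer (
                CustomerId INT IDENTITY PRIMARY KEY,
                Name       NVARCHAR(200) NOT NULL,
                CountryId  INT NOT NULL
                    REFERENCES dbo.Country (CountryId)      -- link by id, not by name
            );";

        await using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();
        await using var cmd = new SqlCommand(ddl, conn);
        await cmd.ExecuteNonQueryAsync();
    }
}
```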

• RickZeeland (#10), replying to Sander Rossel, wrote:

Maybe you can try this SQL index management tool from CodeProject: SQL Index Manager – Free GUI Tool for Index Maintenance on SQL Server and Azure

• Sander Rossel (#11), replying to Daniel Pfeffer, wrote:

                        Daniel Pfeffer wrote:

                        Indexes are used to maintain the sort order of the records in a table. Searching an index is much faster than searching an unindexed table, and best of all - multiple search criteria may be stored as multiple indexes.

Bit of mansplaining there ;) But yeah, I actually checked my indexes and found one with a very high impact on performance. Removed an unused one too. I'm not going to add a unique key, as there are around 20 fields that, together, should be unique. I feel like that would hurt performance more than it would add to it. That's why I'm checking that manually: I can search for potentially double data on a specific index, which I need to do anyway to let the user know they're uploading double data (which is now actually allowed, since last month). Removed some data analysis at this step too. I'm storing some redundant data so I don't have to analyze it later. Turns out, with a well-placed index, the analysis is instant later on.

                        Best, Sander Azure DevOps Succinctly (free eBook) Azure Serverless Succinctly (free eBook) Migrating Apps to the Cloud with Azure arrgh.js - Bringing LINQ to JavaScript
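
(A sketch of that manual double-check, with invented names: seek on an indexed date column to pull a small candidate set, then compare the remaining ~20 fields in memory. The index on ImportDate is an assumption, not from the thread.)

```csharp
using System;
using Microsoft.Data.SqlClient;

// ColA/ColB/Amount are stand-ins for the real fields.
const string dupeSql = @"
    SELECT ColA, ColB, Amount
    FROM dbo.ImportFinal
    WHERE ImportDate = @date;";   // index seek -> small result set

await using var conn = new SqlConnection("<connection string>");
await conn.OpenAsync();
await using var cmd = new SqlCommand(dupeSql, conn);
cmd.Parameters.AddWithValue("@date", new DateTime(2023, 1, 15));

await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
    // Compare the uploaded rows against these candidates in memory and
    // warn the user about potential doubles (doubles are allowed, per the post).
}
```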

• Sander Rossel (#12), replying to RickZeeland, wrote:

                          That's impressive :omg:

                          Best, Sander Azure DevOps Succinctly (free eBook) Azure Serverless Succinctly (free eBook) Migrating Apps to the Cloud with Azure arrgh.js - Bringing LINQ to JavaScript

• RickZeeland (#13), replying to Sander Rossel, wrote:

                            Never tried it, as we don't use SQL Server anymore, but it looks very promising :-\

• Daniel Pfeffer (#14), replying to Sander Rossel, wrote:

                              Sander Rossel wrote:

                              Bit of mansplaining there ;)

Please accept my apologies. I know that you are a developer of some years' experience, but you pressed my "teaching" button, and I tend to over-explain at times. :)

                              Freedom is the freedom to say that two plus two make four. If that is granted, all else follows. -- 6079 Smith W.

• charlieg (#15), replying to Sander Rossel, wrote:

I'd like to know more technical details. Assuming 2500 lines, 40 values per line: are these values text? Numeric? Just how large is the overall file? What's the connection speed between the web site and the Azure-based system? Does your Azure system have sufficient resources? Since you have a lot of error handling, per your own admission ;P I'd think you could add some logging into the mix to see where you are spending your time.

But this: "Last week, it took 17 tries to import one file" is a smoking gun. Solve the timeout issue, and I'd give it 50/50 that your performance issues go away. You're not moving that much data. One other suggestion: double the resources (temporarily) for the Azure system to make sure it's not under-resourced. Off-the-cuff thoughts.

                                Charlie Gilley “They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759 Has never been more appropriate.
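
(One way to do that logging, sketched: wrap each import stage and record the split times. The stage names in the usage comment — ParseCsvAsync, the aggregate step — are hypothetical; context.BulkInsertAsync is the line Sander mentions later in the thread.)

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class StageTimer
{
    // Times one import stage and logs where the seconds go.
    public static async Task<T> TimeAsync<T>(string stage, Func<Task<T>> action)
    {
        var sw = Stopwatch.StartNew();
        var result = await action();
        Console.WriteLine($"{stage}: {sw.ElapsedMilliseconds} ms");
        return result;
    }
}

// Usage (illustrative):
//   var lines = await StageTimer.TimeAsync("parse", () => ParseCsvAsync(file));
//   await StageTimer.TimeAsync("insert", async () =>
//       { await context.BulkInsertAsync(lines); return true; });
```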

• Lost User (harold aptroot) (#16), replying to Sander Rossel, wrote:

Inspired by this question, I went to take a more detailed look at where the time to load a 1.2 million line, 0.5 gigabyte CSV file goes in a particular application that I'm working on. That process takes about 5 seconds too, by some coincidence. There are three steps: reading the file (all at once), converting the encoding, and parsing the CSV into objects.

  • Reading the file (File.ReadAllBytes) takes about 150 milliseconds. (So we're reading the file at 3.3 GB/s, which is reasonable, near the maximum sequential read speed of the SSD.)
  • Converting the encoding (aka Encoding.GetString) takes about 900 milliseconds. (At 5 GHz, this comes down to 9 cycles per byte. Should it take 9 cycles per byte to do an encoding conversion? Sounds slow to me.)
  • Parsing / creating objects takes the rest of the time. (This code is crap and takes at least 10 times as long as it should.)

There's no database involved. Loading 2k rows in the same time basically comes down to saying that using a database makes loading the data over 500 times as slow — compared to code that is slow already. Is that reasonable? Databases are infamous for being pure molasses, but that sounds extreme. On the other hand, it's only 5 seconds, not a big deal.
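
(The same three-step measurement, reconstructed as a sketch. The file name is a placeholder, the parse step is reduced to a trivial stand-in, and the per-step timings in the comments are the ones reported above.)

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

var sw = Stopwatch.StartNew();
byte[] bytes = File.ReadAllBytes("big.csv");     // ~150 ms in the test above
Console.WriteLine($"read:   {sw.ElapsedMilliseconds} ms");

sw.Restart();
string text = Encoding.UTF8.GetString(bytes);    // ~900 ms in the test above
Console.WriteLine($"decode: {sw.ElapsedMilliseconds} ms");

sw.Restart();
string[] rows = text.Split('\n');                // stand-in for the real object parser
Console.WriteLine($"parse:  {sw.ElapsedMilliseconds} ms ({rows.Length} rows)");
```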

• Sander Rossel (#17), replying to charlieg, wrote:

                                    charlieg wrote:

                                    these values are text? numeric?

                                    Text, numbers, decimals, dates.

                                    charlieg wrote:

                                    Just how large is the overall file?

                                    I have an initial file of 248 KB.

                                    charlieg wrote:

                                    What's the connection speed between the web site and the azure based system?

                                    They both run in Azure in the same region, so I suspect the connection is fast.

                                    charlieg wrote:

                                    Does your azure system have sufficient resources?

                                    Yeah, we don't have the fastest database (50 DTUs), but it's plenty sufficient for everything else.

                                    charlieg wrote:

                                    see where you are spending your time.

Inserting 2500 lines. await context.BulkInsertAsync(lines); is the exact line ;) Although I suspect there may be some other long-running queries in production, possibly getting all the lines for a specific date. I'm looking at my new code now (and I added an index) and the inserting takes the longest by far (a rewrite of this functionality was necessary for other reasons too; I just gave it more priority because of this).

                                    charlieg wrote:

                                    Solve the timeout issue, and I'd give it 50/50 your performance issues go away.

                                    The timeout issue is the performance issue :laugh:

                                    Best, Sander Azure DevOps Succinctly (free eBook) Azure Serverless Succinctly (free eBook) Migrating Apps to the Cloud with Azure arrgh.js - Bringing LINQ to JavaScript
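
(If that one line dominates, one thing to try — hedged, since whether it beats BulkInsertAsync on a 50-DTU database would have to be measured — is raw SqlBulkCopy with an explicit batch size, so the server commits in smaller chunks instead of one large transaction. The table name, batch size, and helper are illustrative.)

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

// 'csvRows' would be the parsed CSV as a DataTable with matching column names.
DataTable csvRows = BuildDataTableFromCsv();        // hypothetical helper

using var bulk = new SqlBulkCopy("<connection string>")
{
    DestinationTableName = "dbo.ImportLines",       // illustrative name
    BatchSize = 500,                                // rows per batch/round trip
    BulkCopyTimeout = 60                            // seconds; the default is 30
};
await bulk.WriteToServerAsync(csvRows);
```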

• Sander Rossel (#18), replying to harold aptroot, wrote:

                                      harold aptroot wrote:

                                      load a 1.2 million line, 0.5 gigabyte, CSV file

                                      :omg:

                                      harold aptroot wrote:

                                      Databases are infamous for being pure molasses, but that sounds extreme.

Well, at 2500 lines and 40 values per line, it comes down to storing 100,000 values in the correct place. Storing the CSV directly takes a few hundred milliseconds. The database actions are the only slow lines of code here. Reading the CSV and putting it into objects is milliseconds (and that's using a general CSV parser that uses reflection).

                                      harold aptroot wrote:

                                      using a database makes loading the data over 500 times as slow

                                      I guess that sounds about right. Depending on how fast your DB server is, of course ;)

                                      Best, Sander Azure DevOps Succinctly (free eBook) Azure Serverless Succinctly (free eBook) Migrating Apps to the Cloud with Azure arrgh.js - Bringing LINQ to JavaScript

• Marc Clifton (#19), replying to Sander Rossel, wrote:

What David said. I have similar requirements for updating a huge table (or set of tables) with a relatively small CSV dataset. I put the CSV into a separate table, do the data validation (like duplicate checks) with SQL and indexed columns on the criteria for duplicates, perform the resulting updates/inserts in C#, and email the customer with the change log. Very, very fast.

                                        Latest Article:
                                        Create a Digital Ocean Droplet for .NET Core Web API with a real SSL Certificate on a Domain

• pemmons76013 (#20), replying to Sander Rossel, wrote:

I wouldn't waste another minute on this. If one of my DBAs was working on it, I would redirect them. Find something that runs 10,000 times a day.
