Question on performance...

The Lounge · Tags: help, database, cloud, question, javascript
• Sander Rossel wrote:

  Not so much a programming question, more a "what's your opinion/experience on the matter". A customer of mine is importing CSV files through a website hosted in Microsoft Azure. The file typically has around 2000 to 2500 lines and 20 to 25 values per line (depending on the type of import). However, something is currently very wrong in my software or database, and more often than not my customer is greeted by a timeout error (well, a "something went wrong" message, because I do proper error handling ;)). Last week, it took 17 tries to import one file. The funny thing is, it ultimately works every time.

  Of course it wasn't always like that, but the import table in Azure SQL has over 2 million records now, and I need to validate for double records, calculate a couple of other fields for which I need additional data, create some grouped data, etc. Since I've added some stuff in the past year, it's gotten slower and slower. Not so much an issue in my development environment, but apparently a bottleneck in production. Good thing they're calling me now that it's out of control, rather than sooner when it was more manageable :laugh:

  Anyway, I've currently got it down to four to six seconds, which I still think is an eternity. Inserting so many records into the database, as well as fetching a good amount, just takes some time, apparently. I'm doing everything synchronously (well, async, but awaited), and I haven't checked indexes yet, so maybe I could get it a bit faster still. Perhaps upgrading my database in Azure could help a bit too. If I really wanted to, I could make it feel instant: handle everything async and give the user live updates. They've got like five imports a day, so it's not like these five seconds are that big of a deal. Other than that, the system is pretty fast and they're very satisfied :D

  So, my question: how long should such an action take, in your opinion? Is five seconds alright, so I should just show a "busy" message, or is five seconds completely unacceptable (considering the scenario)?

  Best, Sander · Azure DevOps Succinctly (free eBook) · Azure Serverless Succinctly (free eBook) · Migrating Apps to the Cloud with Azure · arrgh.js - Bringing LINQ to JavaScript

charlieg wrote (#15):

I'd like to know more technical details. Assuming 2500 lines and 40 values per line: are these values text? Numeric? Just how large is the overall file? What's the connection speed between the web site and the Azure-based system? Does your Azure system have sufficient resources?

Since you have a lot of error handling, per your own admission ;P , I'd think you could add some logging into the mix to see where you are spending your time. But this: "Last week, it took 17 tries to import one file" is a smoking gun. Solve the timeout issue, and I'd give it 50/50 that your performance issues go away. You're not moving that much data.

One other suggestion: double the resources (temporarily) for the Azure system to make sure it's not under-resourced. Off-the-cuff thoughts.

Charlie Gilley · “They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759. Has never been more appropriate.

Lost User wrote (#16), replying to Sander Rossel's original post:

Inspired by this question, I took a more detailed look at where the time to load a 1.2-million-line, 0.5-gigabyte CSV file goes in a particular application I'm working on. That process takes about 5 seconds too, by some coincidence. There are three steps: reading the file (all at once), converting the encoding, and parsing the CSV into objects.

  • Reading the file (File.ReadAllBytes) takes about 150 milliseconds. So we're reading the file at 3.3 GB/s, which is reasonable, near the maximum sequential read speed of the SSD.
  • Converting the encoding (i.e., Encoding.GetString) takes about 900 milliseconds. At 5 GHz, that comes down to 9 cycles per byte; should encoding conversion take 9 cycles per byte? Sounds slow to me.
  • Parsing / creating objects takes the rest of the time. (This code is crap and takes at least 10 times as long as it should.)

There's no database involved. Loading 2k rows in the same time basically comes down to saying that using a database makes loading the data over 500 times as slow, compared to code that is already slow. Is that reasonable? Databases are infamous for being pure molasses, but that sounds extreme. On the other hand, it's only 5 seconds, not a big deal.
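To make that breakdown concrete, here is a minimal sketch of this kind of per-step timing (the file name is a placeholder, UTF-8 is assumed for the encoding, and the Split call merely stands in for real CSV parsing):

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Text;

    class LoadTimings
    {
        static void Main()
        {
            var sw = Stopwatch.StartNew();
            byte[] bytes = File.ReadAllBytes("data.csv");    // step 1: raw read
            Console.WriteLine($"Read:   {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            string text = Encoding.UTF8.GetString(bytes);    // step 2: encoding conversion
            Console.WriteLine($"Decode: {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            string[] lines = text.Split('\n');               // step 3: stand-in for parsing into objects
            Console.WriteLine($"Parse:  {sw.ElapsedMilliseconds} ms ({lines.Length} lines)");
        }
    }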

Sander Rossel wrote (#17), replying to charlieg (#15):

charlieg wrote:

are these values text? Numeric?

Text, numbers, decimals, dates.

charlieg wrote:

Just how large is the overall file?

I have an initial file of 248 KB.

charlieg wrote:

What's the connection speed between the web site and the Azure-based system?

They both run in Azure in the same region, so I suspect the connection is fast.

charlieg wrote:

Does your Azure system have sufficient resources?

Yeah, we don't have the fastest database (50 DTUs), but it's plenty sufficient for everything else.

charlieg wrote:

see where you are spending your time.

Inserting 2500 lines. await context.BulkInsertAsync(lines); is the exact line ;) Although I suspect there may be some other long-running queries in production too, possibly the one getting all the lines for a specific date. I'm looking at my new code now (and I've added an index), and the inserting takes the longest by far (a rewrite of this functionality was necessary for other reasons anyway; I just gave it more priority because of this).
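A minimal sketch of those two pieces, with invented entity, context, and index names (an illustration under assumptions, not the actual code):

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using EFCore.BulkExtensions;
    using Microsoft.EntityFrameworkCore;

    public class ImportLine
    {
        public int Id { get; set; }
        public DateTime ImportDate { get; set; }
        public string Value { get; set; } = "";
    }

    public class ImportContext : DbContext
    {
        public DbSet<ImportLine> ImportLines => Set<ImportLine>();
    }

    public static class ImportSaver
    {
        public static async Task SaveAsync(ImportContext context, List<ImportLine> lines)
        {
            // One-time setup (normally a migration): index the column that the
            // "all lines for a specific date" query filters on.
            await context.Database.ExecuteSqlRawAsync(
                @"IF NOT EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'IX_ImportLines_ImportDate')
                      CREATE INDEX IX_ImportLines_ImportDate ON ImportLines (ImportDate);");

            // EFCore.BulkExtensions bypasses per-entity change tracking and,
            // on SQL Server, streams the rows via SqlBulkCopy under the hood.
            await context.BulkInsertAsync(lines, new BulkConfig { BatchSize = 2500 });
        }
    }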

charlieg wrote:

Solve the timeout issue, and I'd give it 50/50 that your performance issues go away.

The timeout issue is the performance issue :laugh:

Best, Sander · Azure DevOps Succinctly (free eBook) · Azure Serverless Succinctly (free eBook) · Migrating Apps to the Cloud with Azure · arrgh.js - Bringing LINQ to JavaScript

Sander Rossel wrote (#18), replying to harold aptroot (#16):

harold aptroot wrote:

load a 1.2-million-line, 0.5-gigabyte CSV file

:omg:

harold aptroot wrote:

Databases are infamous for being pure molasses, but that sounds extreme.

Well, at 2500 lines and 40 values per line, it comes down to storing 100,000 values in the correct place. Storing the CSV directly takes a few hundred milliseconds. The database actions are the only slow lines of code here. Reading the CSV and parsing it into objects takes milliseconds (and that's using a general CSV parser that uses reflection).

harold aptroot wrote:

using a database makes loading the data over 500 times as slow

I guess that sounds about right. Depending on how fast your DB server is, of course ;)

Best, Sander · Azure DevOps Succinctly (free eBook) · Azure Serverless Succinctly (free eBook) · Migrating Apps to the Cloud with Azure · arrgh.js - Bringing LINQ to JavaScript

Marc Clifton wrote (#19), replying to Sander Rossel's original post:

What David said. I have similar requirements for updating a huge table (or set of tables) with a relatively small CSV dataset. I put the CSV into a separate table, do the data validation (like duplicate checks) with SQL and indexed columns on the criteria for duplicates, perform the resulting updates/inserts in C#, and email the customer with the change log. Very, very fast.
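A minimal sketch of that shape of pipeline, assuming the duplicate criteria are condensed into an indexed hash column (Staging_ImportLines, ImportLines, and LineHash are all invented names):

    using Microsoft.Data.SqlClient;

    public static class StagedImport
    {
        // Promotes validated rows from the staging table into the main table.
        // The duplicate check is a set-based EXISTS against an indexed column,
        // so it stays fast even with millions of rows in the main table.
        public static void Promote(string connectionString)
        {
            const string sql = @"
                INSERT INTO ImportLines (LineHash, ImportDate, Value)
                SELECT s.LineHash, s.ImportDate, s.Value
                FROM   Staging_ImportLines s
                WHERE  NOT EXISTS (SELECT 1 FROM ImportLines i
                                   WHERE i.LineHash = s.LineHash);

                TRUNCATE TABLE Staging_ImportLines;";

            using var conn = new SqlConnection(connectionString);
            conn.Open();
            using var cmd = new SqlCommand(sql, conn);
            cmd.ExecuteNonQuery();
        }
    }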

Latest Article: Create a Digital Ocean Droplet for .NET Core Web API with a real SSL Certificate on a Domain

• Sander Rossel wrote, replying to David O'Neil:

  David O'Neil wrote:

  Would dumping it into a blank temp table, then doing the work from there after it is all in make things better?

  Yeah, something like that would be my "instant" solution. The UX would be better, but the use case is that they import the file and then immediately work with the transformed data, so they'd be waiting for that in any case.

  Best, Sander · Azure DevOps Succinctly (free eBook) · Azure Serverless Succinctly (free eBook) · Migrating Apps to the Cloud with Azure · arrgh.js - Bringing LINQ to JavaScript

pemmons76013 wrote (#20):

I wouldn't waste another minute on this. If one of my DBAs was working on it, I would redirect them. Find something that runs 10,000 times a day.

Sander Rossel wrote (#21), replying to pemmons76013 (#20):

And if you were me, I'd be out of a job for such short-sightedness. That they don't use it much does not mean it's not important. One CSV can result in 16 invoices; no invoices, no money... Now, 16 invoices five times a day equals 80 invoices, and 80 invoices are enough to keep you busy for the day. Uploading five files should take about five minutes, but right now it takes them half an hour or even longer, and that means other people are waiting too... So yeah, I am going to "waste" minutes on this vital task that they perform every day and which doesn't work most of the time! In fact, the client called me twice to tell me this is a top priority and should be fixed ASAP.

Best, Sander · Azure DevOps Succinctly (free eBook) · Azure Serverless Succinctly (free eBook) · Migrating Apps to the Cloud with Azure · arrgh.js - Bringing LINQ to JavaScript

raddevus wrote (#22), replying to Sander Rossel's original post:

Your answer is right here on CP: A Fast CSV Reader[^]. That article wasn't written by me, but I've used the component since sometime way back in 2008 or so, and it has been updated along the way. It is fantastic and amazingly fast (see the perf stats in the article). It's super easy to use and will solve your problems fast. I remember getting it to work in about 15 minutes: import the component into your project, and the API calls are intuitive. Seriously, try it and I'm sure it'll make you happy :-D Let me know what you think. Here's a snippet of the performance numbers from the article:

From the article:

To give more down-to-earth numbers, with a 45 MB CSV file containing 145 fields and 50,000 records, the reader was processing about 30 MB/sec. So all in all, it took 1.5 seconds! The machine specs were P4 3.0 GHz, 1024 MB.
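From memory, usage of that component looks roughly like the sketch below; the namespace and calls are as I recall them from the article, so double-check there, and "data.csv" is a placeholder:

    using System;
    using System.IO;
    using LumenWorks.Framework.IO.Csv;

    class FastCsvDemo
    {
        static void Main()
        {
            // second argument: true = the first record is a header row
            using var csv = new CsvReader(new StreamReader("data.csv"), true);
            int fieldCount = csv.FieldCount;
            string[] headers = csv.GetFieldHeaders();

            while (csv.ReadNextRecord())        // streams the file record by record
            {
                for (int i = 0; i < fieldCount; i++)
                    Console.Write($"{headers[i]}={csv[i]}; ");
                Console.WriteLine();
            }
        }
    }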

abmv wrote (#23), replying to Sander Rossel's original post:

Have a session with them while they are doing the import and see why the timeout occurs. What changed? Was it working OK before? Why is it taking 17 tries now? Maybe you can run a SQL monitor and look at the IOPS, sessions, etc.

Caveat Emptor. · "Progress doesn't come from early risers – progress is made by lazy men looking for easier ways to do things." Lazarus Long

We are in the beginning of a mass extinction. - Greta Thunberg

Sander Rossel wrote (#24), replying to raddevus (#22):

I'm pretty sure it's a great tool, but it's not the CSV I'm having trouble with. I get all my values into objects in mere milliseconds with my own generic CSV reader. Besides, that project uses .NET Framework 2.0, which is like 15 versions, two complete overhauls and twenty years behind my version of .NET :| My problem is getting all those values into a database in an acceptable time frame (and 30+ seconds is not acceptable :~).

Best, Sander · Azure DevOps Succinctly (free eBook) · Azure Serverless Succinctly (free eBook) · Migrating Apps to the Cloud with Azure · arrgh.js - Bringing LINQ to JavaScript

Sander Rossel wrote (#25), replying to Marc Clifton (#19):

How do you handle validation feedback to your users? A user imports a file and wants to know why it failed (e.g., some value is invalid, or the file was already (partly) imported).

Best, Sander · Azure DevOps Succinctly (free eBook) · Azure Serverless Succinctly (free eBook) · Migrating Apps to the Cloud with Azure · arrgh.js - Bringing LINQ to JavaScript

• Daniel Pfeffer wrote, replying to Sander Rossel:

  Sander Rossel wrote:

  Bit of mansplaining there ;)

  Please accept my apologies. I know that you are a developer of some years' experience, but you pressed my "teaching" button, and I tend to over-explain at times. :)

  Freedom is the freedom to say that two plus two make four. If that is granted, all else follows. -- 6079 Smith W.

Jorgen Andersson wrote (#26):

Daniel Pfeffer wrote:

and I tend to over-explain at times

Better than the opposite, which I tend to do at times.

Wrong is evil and must be defeated. - Jeff Ello

Jorgen Andersson wrote (#27), replying to Sander Rossel (#17):

I suspect your problem is that Entity Framework tries to be "intelligent". My first guess is that turning off AutoDetectChangesEnabled would solve your problem. But seriously, I would skip the whole BulkInsert thingy and go directly to SqlBulkCopy instead.
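The first suggestion is essentially a one-liner; a sketch of what it looks like around a large batch of adds (the context and rows are placeholders):

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;

    public static class TrackingOffInsert
    {
        public static async Task InsertAsync(DbContext context, IEnumerable<object> rows)
        {
            // Skip the automatic change-detection scans while thousands of
            // entities are tracked. Entities added via AddRangeAsync are already
            // marked Added, so SaveChangesAsync still inserts them.
            context.ChangeTracker.AutoDetectChangesEnabled = false;
            try
            {
                await context.AddRangeAsync(rows);
                await context.SaveChangesAsync();
            }
            finally
            {
                context.ChangeTracker.AutoDetectChangesEnabled = true;
            }
        }
    }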

Wrong is evil and must be defeated. - Jeff Ello

Sander Rossel wrote (#28), replying to Jorgen Andersson (#27):

Jörgen Andersson wrote:

I suspect your problem is that Entity Framework tries to be "intelligent".

Yeah, using "vanilla" EF takes minutes to insert 2500 rows, so that's not an option. I'm using the EFCore.BulkExtensions library for this one.

Jörgen Andersson wrote:

and go directly to SqlBulkCopy instead

Wouldn't I have to insert before I can copy? :confused:

Best, Sander · Azure DevOps Succinctly (free eBook) · Azure Serverless Succinctly (free eBook) · Migrating Apps to the Cloud with Azure · arrgh.js - Bringing LINQ to JavaScript

Jorgen Andersson wrote (#29), replying to Sander Rossel (#28):

No, you just need to read the CSV file into an IEnumerable of sorts and connect it to an EntityDataReader that you use as input to SqlBulkCopy. EntityDataReader is part of System.Data.EntityClient. Or you can use a CSV reader[^] that you connect directly to SqlBulkCopy.
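A sketch of that direct route (the destination table and column names are invented; a DataTable is used as the simplest source, and WriteToServer also accepts any IDataReader, which is where a streaming CSV reader plugs in):

    using System.Data;
    using Microsoft.Data.SqlClient;

    public static class BulkLoader
    {
        public static void Load(string connectionString, DataTable rows)
        {
            using var bulk = new SqlBulkCopy(connectionString)
            {
                DestinationTableName = "ImportLines",   // invented target table
                BatchSize = 2500,
            };
            bulk.ColumnMappings.Add("ImportDate", "ImportDate");
            bulk.ColumnMappings.Add("Value", "Value");

            bulk.WriteToServer(rows);   // or pass an IDataReader to stream rows
        }
    }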

Wrong is evil and must be defeated. - Jeff Ello

BryanFazekas wrote (#30), replying to Sander Rossel:

You're consistently getting a timeout on the import? That is the problem that needs to be fixed. From your description, the upload worked until you added a lot of processing to it. Timeout errors can be fixed by changing the environment or changing the process. Since you probably can't change the environment, fix the process by segregating the upload (insert into a temp table) and then processing the file.

Cpichols wrote (#31), replying to Sander Rossel's original post:

This is a nightmare scenario:

Quote:

Of course it wasn't always like that, but the import table in Azure SQL has over 2 million records now and I need to validate for double records,

I've done this before, and it's slow no matter what I do, and our DB is very optimized. What I wound up doing was pulling and storing in an array all of the unique fields only. It's one pull (huge, but with only one or two values per record, doable). Then, in the double-record validation, I check each new record against that one array, in advance of any DB insert/update work, sorting the new records into two arrays: one for insert, one for update. That way the inserts are quicker. If you are trashing the duplicates, you can skip the update part.
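That partitioning step can be as small as the sketch below; CsvRow and the key shape are invented, and the point is one slim query for the existing keys plus O(1) in-memory membership checks:

    using System.Collections.Generic;

    public record CsvRow(string Key, string Payload);

    public static class DuplicatePartitioner
    {
        public static (List<CsvRow> Inserts, List<CsvRow> Updates) Split(
            IEnumerable<string> existingKeys,   // e.g. one query: SELECT Key FROM ImportLines
            IEnumerable<CsvRow> incoming)
        {
            var known = new HashSet<string>(existingKeys);  // O(1) lookups
            var inserts = new List<CsvRow>();
            var updates = new List<CsvRow>();

            foreach (var row in incoming)
                (known.Contains(row.Key) ? updates : inserts).Add(row);

            return (inserts, updates);
        }
    }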

charlieg wrote (#32), replying to Sander Rossel (#21):

Ha! Always keep the factory running! Always.

Charlie Gilley · “They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759. Has never been more appropriate.

charlieg wrote (#33), replying to Sander Rossel's original post:

So, you've isolated it to the DB load. In jobs long past, I learned that inserting records into tables with indexes can bite you, since the DB has to do its thing. The application was a "real-time" database that gated prepaid phone calls. Although we didn't do batch operations, we might have had anywhere from 512 to 4096 call-handler threads all hitting the database at roughly the same time (don't want to get pedantic). However, most of those transactions were either reads or updates; creation occurred only for new customers. So: lots of index use, but no new index entries.

Please add a reminder to your task list to update the thread with what the actual problem was and how you solved it :)

Oh, one other suggestion: it sounds like you could collect these files and set up an offline test system to play with, removing the web complexity. I know you've done this a long time, so you're probably doing that already.

Charlie Gilley · “They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759. Has never been more appropriate.

MikeCO10 wrote (#34), replying to Sander Rossel's original post:

Interesting. We have issues with CSV uploads too, but ours fail in the upload itself; CSV is being used due to the lack of an API from an outside vendor. It's AWS to local. That's a topic in itself, with a long, dumb, winding road involving a large vendor. We haven't had issues with Azure connections, though.

My first question is about process: it sounds like you have control over both ends of this? If so, why a CSV rather than a direct upsert? Five seconds feels like an eternity (it really does feel that way today!), but CSV-to-DB with validation takes a while. Our environment is different (Postgres on *nix with a triggered background cron) and our line count is only in the hundreds per upload, so the numbers don't compare. What's an import "table", and why does it exist, or is that just wording for the real destination table? This caught my eye:

Quote:

Not so much an issue in my development environment, but apparently a bottleneck in production.

You mean the time just isn't an issue in development, or does it actually run faster there? I'm not sure why 5 seconds would create a "bottleneck" in production with five uploads a day. The timeout is generated on your end, right? Bump it up, or calculate it from the estimated process time. If it ultimately works, let it run, no? Yeah, it's annoying looking at a spinner for a bit, but even if it's 30 seconds five times a day, no one is that busy. Sorry, I haven't worked with SQL Server in years, but we do find that the more you can push into a single, longer SQL query, the better the performance versus outside processing. And, with the SQL Server caveat above, I'd also index the heck out of it with 2 million records.
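"Bump it up" is a one-liner on either stack; a sketch with placeholder values (120 seconds is arbitrary, and the context/command are stand-ins):

    using Microsoft.Data.SqlClient;
    using Microsoft.EntityFrameworkCore;

    public static class TimeoutConfig
    {
        public static void Raise(DbContext context, SqlCommand command)
        {
            context.Database.SetCommandTimeout(120);   // EF Core: command timeout, in seconds
            command.CommandTimeout = 120;              // plain ADO.NET equivalent
        }
    }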
