How to sort a big volume of data?

  • Spacix One wrote:

    The data would have to be XML (or loaded into XML) to sort that much information without a database. I was able to process 1,000,000 XML lines with System.Xml in > 9 secs in a benchmark I ran today for work, as another senior coder was wondering how my program using hundreds of XML files would perform...


    -Spacix All your skynet questions belong to solved

    michal kreslik wrote (#7):

    That's impressive. I wouldn't even have considered XML, as the format itself is a kind of synonym for "SLOW" to me :) But still, I was not able to find any caveat from Microsoft concerning loading big chunks of data into a DataTable. The OutOfMemoryException occurred during normal operation. I've got 2 GB of RAM on my box with Win XP SP2, and the used RAM was only something like 1.4 GB at the time. So that was definitely not a lack of physical memory. So "there's something rotten in the state of DataTable" .. :) What is it? Michal

      Spacix One wrote (#8):

      Then my guess would be it is a permissions issue limiting the application...


        michal kreslik wrote (#9):

        It's strange, as the DataTable throws an OutOfMemoryException once there are more than about 12,646,480 rows (I arrived at this number by interval halving). However, the exception is not reliably reproducible: sometimes the DataTable can sort 12,646,480 rows and sometimes it can't. With more than 12,646,480 rows, the likelihood of an exception quickly rises, and with fewer rows it quickly decreases. I REALLY wonder what this number of rows is related to. The number doesn't resemble any power of 2, and I tried logarithms of base 2 to 100 with no luck, too. Michal
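
The interval halving mentioned in this post can be sketched roughly as below. This is only an illustration, not Michal's actual test harness: `ThresholdSearch` and the `succeeds` delegate are names invented here, and `succeeds` stands in for whatever probe fills and sorts a DataTable with `n` rows and reports whether it got through without an OutOfMemoryException. The search assumes the probe succeeds at `low`, fails at `high`, and flips only once in between, so with a flaky failure like the one described it can only give an approximate boundary.

```csharp
using System;

public static class ThresholdSearch
{
    // Returns the largest n in [low, high) for which succeeds(n) is true.
    // Assumes succeeds(low) is true, succeeds(high) is false, and that the
    // predicate changes from true to false exactly once in between.
    public static long FindLargestSucceeding(long low, long high, Func<long, bool> succeeds)
    {
        while (high - low > 1)
        {
            // Midpoint computed this way cannot overflow for large bounds.
            long mid = low + (high - low) / 2;
            if (succeeds(mid))
                low = mid;   // boundary lies above mid
            else
                high = mid;  // boundary lies at or below mid
        }
        return low;
    }
}
```

Each probe halves the remaining interval, so a range of 20 million candidate row counts needs only about 25 probes. Repeating every probe a few times would be one way to cope with the non-deterministic failures.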

        • michal kreslik wrote:

          Hello again, Hogan. Well, the SQL solution is what I've been working on in the meantime while posting the question here, and it's probably going to be the best one. I was just curious whether I could do this in an easy way without SQL. Option 2: I was thinking about using the ArrayList, too, but somehow I was so obsessed with the DataTable that I ruled this option out :) Thanks for your help! ANYWAY, why is the DataTable throwing the OutOfMemoryException? Is it only designed to handle small data samples? I seriously doubt it. Thanks, Michal
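
For reference, the plain-collection option (Option 2) might look something like the sketch below. It uses the generic `List<T>` rather than `ArrayList` to avoid boxing each row. The `Tick` struct and its two fields are hypothetical, since the thread never shows the real column layout, and `TickSorter` is a name made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row layout; the thread does not show the real columns.
public struct Tick
{
    public DateTime Time;
    public double Price;

    public Tick(DateTime time, double price)
    {
        Time = time;
        Price = price;
    }
}

public static class TickSorter
{
    // Sorts the list in place by timestamp using List<T>.Sort
    // with a comparison delegate.
    public static void SortByTime(List<Tick> ticks)
    {
        ticks.Sort((a, b) => a.Time.CompareTo(b.Time));
    }
}
```

Note that `List<T>.Sort` is not a stable sort, so rows with identical timestamps may end up in a different relative order than in the input.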

          Thomas Krojer wrote (#10):

          How fast is the SQL Server solution (import, sort, export)?

            michal kreslik wrote (#11):

            Obviously, the SQL-based solution is much slower, as it stores the data to disk as opposed to working directly in memory. Importing the data is very slow (0.9 ms per row) compared to the DataTable, but sorting is lightning fast. However, I can accomplish the task with SQL, which can't be said of the DataTable-oriented solution. Michal
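
To put the quoted 0.9 ms per row into perspective, here is a back-of-the-envelope estimate. It assumes the import cost stays constant per row and applies it to the 12,646,480-row figure from earlier in the thread, which may or may not be the size of the data set that was actually imported. `ImportEstimate` and `ForRows` are names made up for illustration.

```csharp
using System;

public static class ImportEstimate
{
    // Total import time if every row costs the same number of milliseconds.
    public static TimeSpan ForRows(long rows, double millisecondsPerRow)
    {
        return TimeSpan.FromMilliseconds(rows * millisecondsPerRow);
    }
}
```

Under those assumptions, `ImportEstimate.ForRows(12646480, 0.9)` comes to roughly 11,382 seconds, a little over 3 hours.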

              Thomas Krojer wrote (#12):

              A few years ago, I wrote a sort routine for sorting a BIG number of records, using the "insertation sort" algorithm (I'm a bit confused about the naming of the algorithm ..., maybe it was called "insertation sort" only in this one book ...). The main idea: for fixed-length records and a known lower and upper key (which you know after the first read cycle), it's possible to sort the file with only 2 read cycles and 1 write cycle. If you need more, I'll post something.
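
One way to read this description is as a counting (distribution) sort over fixed-length records. That is a guess, not necessarily the routine Thomas wrote. The sketch below assumes each record starts with a 4-byte integer key in the machine's native byte order, that both streams are seekable, and that the number of distinct keys fits in memory. `FixedRecordSorter` and `ReadRecord` are names invented here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FixedRecordSorter
{
    // Sorts fixed-length records by the 32-bit key held in the first four
    // bytes of each record. Uses two read passes and one write pass.
    public static void Sort(Stream input, Stream output, int recordLength)
    {
        if (recordLength < 4)
            throw new ArgumentException("A record must hold a 4-byte key.");
        var buffer = new byte[recordLength];

        // Read pass 1: count how many records carry each key.
        var counts = new SortedDictionary<int, long>();
        input.Position = 0;
        while (ReadRecord(input, buffer))
        {
            int key = BitConverter.ToInt32(buffer, 0);
            counts.TryGetValue(key, out long seen);
            counts[key] = seen + 1;
        }

        // Turn the counts into the first output slot for each key,
        // walking the keys in ascending order.
        var nextSlot = new Dictionary<int, long>();
        long slot = 0;
        foreach (KeyValuePair<int, long> pair in counts)
        {
            nextSlot[pair.Key] = slot;
            slot += pair.Value;
        }

        // Read pass 2 and the single write pass: put each record
        // straight into its final position in the output.
        output.SetLength(slot * recordLength);
        input.Position = 0;
        while (ReadRecord(input, buffer))
        {
            int key = BitConverter.ToInt32(buffer, 0);
            output.Position = nextSlot[key] * recordLength;
            output.Write(buffer, 0, recordLength);
            nextSlot[key] = nextSlot[key] + 1;
        }
    }

    // Fills the buffer with one whole record; returns false at end of stream
    // (a trailing partial record is ignored).
    private static bool ReadRecord(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0) break;
            total += read;
        }
        return total == buffer.Length;
    }
}
```

Records with equal keys keep their input order. The write pass jumps around in the output file, so on a real disk it would probably pay to buffer writes per key range instead of seeking for every record.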

                michal kreslik wrote (#13):

                Please go ahead and post more. I have been working with a huge SQL database of Forex price ticks for almost a year now. By now, it consists of about 270 million rows. Every fresh idea on how to help with the preprocessing of the data before importing it into the SQL database is warmly welcome! :) Thanks, Michal

                  Thomas Krojer wrote (#14):

                  Sorry for the delay, I was in serious trouble, so I had no time ... Please post a snippet of the data file; I'll implement this "insertation sort" and post it.

                    michal kreslik wrote (#15):

                    Hi, Thomas, I've resolved the issue in the meantime. Thanks for your help, Michal

                      Thomas Krojer wrote (#16):

                      Sorry again, how did you manage it? How is the performance? Greetings, Thomas
