How to sort a big volume of data?
-
Hello again, Hogan. The SQL solution is what I've been working on in the meantime since posting the question here, and it's probably going to be the best one. I was just curious whether I could do this easily without SQL. Option 2: I was thinking about using the ArrayList, too, but I was so obsessed with the DataTable that I ruled this option out :) Thanks for your help! ANYWAY, why is the DataTable throwing the OutOfMemoryException? Is it only designed to handle small data samples? I seriously doubt it. Thanks, Michal
The data would have to be XML (or loaded into XML) to sort that much information without a database. I was able to process 1,000,000 XML lines with System.Xml in > 9 secs on a benchmark I ran today for work. Another senior coder was wondering how my program, which uses hundreds of XML files, would perform...
-Spacix All your skynet questions belong to solved
-
That's impressive. I wouldn't even have thought about XML, as the format itself is a kind of synonym for "SLOW" to me :) But still, I was not able to find any caveat from Microsoft about loading big chunks of data into a DataTable. The OutOfMemoryException occurred during normal operation. I've got 2 GB of RAM on my box with Win XP SP2, and only about 1.4 GB was in use at the time. So it was definitely not a lack of physical memory. So "there's something rotten in the state of DataTable" .. :) What is it? Michal
-
Then my guess would be it is a permissions issue limiting the application...
-Spacix All your skynet questions belong to solved
-
It's strange, as the DataTable throws an OutOfMemoryException if there are more than about 12,646,480 rows (I arrived at this number by interval halving). However, the exception does not reproduce reliably: sometimes the DataTable can sort 12,646,480 rows and sometimes it can't. Above 12,646,480 rows, the likelihood of an exception rises quickly, and below that number it drops quickly. I REALLY wonder what this number of rows is related to. It doesn't resemble any power of 2, and I tried logarithms with bases from 2 to 100 with no luck, too. Michal
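The interval halving mentioned above can be sketched roughly as follows. This is a minimal Python sketch, not the code that was actually used: `largest_passing`, `sort_succeeds`, and `THRESHOLD` are hypothetical names, and `sort_succeeds` merely stands in for "load n rows into a DataTable and sort them". It also assumes the outcome is deterministic, which the real test was not, so in practice each probe would probably need to be repeated a few times.

```python
from typing import Callable


def largest_passing(lo: int, hi: int, passes: Callable[[int], bool]) -> int:
    """Return the largest n in [lo, hi) for which passes(n) is True.

    Assumes passes(lo) is True, passes(hi) is False, and that passes is
    monotone: True up to some threshold and False above it.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(mid):
            lo = mid  # the threshold is at or above mid
        else:
            hi = mid  # the threshold is below mid
    return lo


# Hypothetical stand-in for the real experiment (an assumption, not measured here).
THRESHOLD = 12_646_480


def sort_succeeds(n: int) -> bool:
    return n <= THRESHOLD


print(largest_passing(1, 100_000_000, sort_succeeds))  # prints 12646480
```

Each probe halves the remaining interval, so a range of 100 million rows needs only about 27 probes.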
-
How fast is the SQL Server solution (import, sort, export)?
-
Obviously, the SQL-based solution is much slower, as it stores the data to disk as opposed to working directly in memory. Importing the data is very slow (0.9 ms per row) compared to the DataTable, but sorting is lightning fast. However, I can accomplish the task with SQL, which can't be said about the DataTable-oriented solution. Michal
-
A few years ago, I wrote a sort routine for sorting a BIG number of records, using the "insertion sort" algorithm (I'm a bit confused about the naming of the algorithm; maybe it was called "insertion sort" only in this one book ...). The main idea: for fixed-length records and a known lower and upper key (which you know after the first read cycle), it's possible to sort the file with only 2 read cycles and 1 write cycle. If you need more, I'll post something.
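One way to get a file sorted in two read cycles and one write cycle is a counting (distribution) sort, sketched below in Python. This is only a guess at the idea described above, not the routine itself. The record layout (`RECORD`, a 4-byte key plus a 12-byte payload), the function `distribution_sort`, and the helper `demo` are all hypothetical, and the sketch assumes integer keys in a reasonably dense range.

```python
import os
import struct
import tempfile
from collections import Counter

# Hypothetical record layout: unsigned 32-bit key followed by a 12-byte payload.
RECORD = struct.Struct("<I12s")


def distribution_sort(src: str, dst: str) -> None:
    size = RECORD.size

    # Read cycle 1: count how often each key occurs; min and max key follow.
    counts = Counter()
    with open(src, "rb") as f:
        while chunk := f.read(size):
            counts[RECORD.unpack(chunk)[0]] += 1
    if not counts:
        open(dst, "wb").close()
        return
    lo, hi = min(counts), max(counts)

    # Walk the key range once to find the first output slot of every key.
    # Walking range(lo, hi + 1) is only sensible for a dense key range.
    next_slot = {}
    slot = 0
    for key in range(lo, hi + 1):
        if key in counts:
            next_slot[key] = slot
            slot += counts[key]

    # Read cycle 2 and the single write cycle: put each record in its final slot.
    with open(src, "rb") as f, open(dst, "wb") as out:
        while chunk := f.read(size):
            key = RECORD.unpack(chunk)[0]
            out.seek(next_slot[key] * size)
            out.write(chunk)
            next_slot[key] += 1


def demo() -> list:
    """Sort a tiny made-up file and return the keys in output order."""
    keys = [7, 3, 9, 3, 1]
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "in.bin"), os.path.join(d, "out.bin")
        with open(src, "wb") as f:
            for i, k in enumerate(keys):
                f.write(RECORD.pack(k, b"row%d" % i))
        distribution_sort(src, dst)
        with open(dst, "rb") as f:
            data = f.read()
    return [k for k, _ in RECORD.iter_unpack(data)]


print(demo())  # prints [1, 3, 3, 7, 9]
```

Memory use depends on the number of distinct keys rather than the number of records, which is what would make this kind of approach attractive for very large files. The writes are random-access, though, so real throughput would depend heavily on the disk.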
-
Please go ahead and post more. I have been working with a huge SQL database of Forex price ticks for almost a year now. By now, it consists of about 270 million rows. Every fresh idea on how to help with pre-processing the data before importing it into the SQL database is warmly welcome! :) Thanks, Michal
-
Sorry for the delay, I was in heavy trouble, so I had no time ... Please post a snippet of the data file; I'll implement this insertion sort and post it.
-
Hi, Thomas, I've resolved the issue in the meantime. Thanks for the help, Michal
-
Sorry again, how did you manage it? How is the performance? Greetings, Thomas