Loading Very Large DataSet Without losing any information

    losan
    wrote on last edited by
    #1

    Hi all, I have a large dataset stored in a CSV file (about 40,000 rows and 10,000 columns). I need to load it into a C# Windows application. Any ideas on how to do this? I tried different code, but some of it can load only 40,000 rows x 255 columns, and the rest only 5,000 rows x 10,000 columns. Thanks

    losan1985

      Dave Kreskowiak
      wrote on last edited by
      #2

      It's probably going to take rolling your own custom class to hold it all. I don't know of anything "off-the-shelf" that will hold 10,000 columns. Frankly, I've never even HEARD of such a wide CSV file being used. It shouldn't be very hard at all to create a List<List<int>>, or whatever your item data type is. Basically, a List of List of Integers.
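
The list-of-lists idea above could be sketched roughly as follows. This is only an illustration under assumptions the thread does not confirm: every cell is an integer, there is no header row, and no field is quoted. The `CsvMatrix` class and its members are hypothetical names, and each row is stored as an `int[]` rather than a `List<int>` to keep per-row overhead small.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical holder: one int[] per CSV row, rows kept in a List.
public sealed class CsvMatrix
{
    private readonly List<int[]> _rows = new List<int[]>();

    public int RowCount => _rows.Count;

    // Cell access by zero-based row and column index.
    public int this[int row, int column] => _rows[row][column];

    // Reads comma-separated integers from any TextReader, one row per line.
    // Assumes no header row, no quoted fields, and no empty cells.
    public static CsvMatrix Load(TextReader reader)
    {
        var matrix = new CsvMatrix();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines
            string[] fields = line.Split(',');
            var values = new int[fields.Length];
            for (int i = 0; i < fields.Length; i++)
                values[i] = int.Parse(fields[i], CultureInfo.InvariantCulture);
            matrix._rows.Add(values);
        }
        return matrix;
    }
}
```

It would be called with something like `CsvMatrix.Load(new StreamReader(path))`, where `path` points at the CSV file. Note that 400 million 4-byte integers come to about 1.6 GB before any overhead, so even this compact layout realistically needs a 64-bit process.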

        SledgeHammer01
        wrote on last edited by
        #3

        You probably can't load it all at once (at least in a 32-bit OS). 40,000 x 10,000 = 400,000,000 bytes if each cell is 1 byte. If you assume an average of 16 bytes per cell (since you didn't say), that's 6,400,000,000 bytes, or roughly 6GB of data. You only have 2GB of address space for your application. You can do it on a 64-bit OS though. With that being said, I doubt you really need 40,000 x 10,000 cells loaded in memory at once. What is a person going to do with all that data? You might want to consider loading only the portion you need.
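
The arithmetic above can be checked with a small helper. This is a sketch only; `MemoryEstimate` and its method names are hypothetical, and the bytes-per-cell figure is whatever you assume for your data. The multiplication is done in `long` because 6,400,000,000 does not fit in a 32-bit `int`.

```csharp
using System;

public static class MemoryEstimate
{
    // Raw payload size in bytes for a dense rows x columns table.
    // Uses long throughout: 40,000 * 10,000 * 16 overflows a 32-bit int.
    public static long Bytes(long rows, long columns, long bytesPerCell)
    {
        return rows * columns * bytesPerCell;
    }

    // Converts a byte count to binary gigabytes (1 GiB = 1024^3 bytes).
    public static double ToGiB(long bytes)
    {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }
}
```

For example, `MemoryEstimate.Bytes(40000, 10000, 16)` gives 6,400,000,000 bytes, which `ToGiB` reports as about 5.96 GiB, while 1 byte per cell gives 400,000,000 bytes (about 0.37 GiB). Neither figure includes object headers, string overhead, or list bookkeeping.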

          Pete OHanlon
          wrote on last edited by
          #4

          What is your actual requirement? You are loading this data for a reason. What is that reason? For instance, are you performing some calculation on certain columns? By breaking down your requirements, we can work out a practical solution.

          I was brought up to respect my elders. I don't respect many people nowadays.
            Mycroft Holmes
            wrote on last edited by
            #5

            As POH has said, your design has to be wrong for this to be a valid requirement. Go back and look at how the CSV was created: why does it require 10k columns (what a ridiculous number)? Can your source break it up into more swallowable chunks? Do you need all 10k columns? Can you load and process 1 row at a time? Presumably you want to dump this into some more reasonable format.
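
Processing one row at a time might look like the sketch below, which keeps only the current line and one running total per column in memory. Summing columns is just a stand-in for whatever per-row work is actually needed; the `RowStreaming` class is a hypothetical name, and the code assumes numeric cells, no header row, and no quoted fields.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class RowStreaming
{
    // Reads one line at a time and accumulates a running sum per column,
    // so only the current row and the totals are held in memory.
    public static double[] ColumnSums(TextReader reader)
    {
        double[] sums = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines
            string[] fields = line.Split(',');
            if (sums == null) sums = new double[fields.Length];
            if (fields.Length != sums.Length)
                throw new InvalidDataException(
                    "Row has " + fields.Length + " fields, expected " + sums.Length + ".");
            for (int i = 0; i < fields.Length; i++)
                sums[i] += double.Parse(fields[i], CultureInfo.InvariantCulture);
        }
        return sums ?? new double[0]; // empty input gives an empty result
    }
}
```

Passing a `StreamReader` opened on the CSV file keeps memory use proportional to the width of one row rather than to the size of the whole file.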

            Never underestimate the power of human stupidity RAH

              Rockstar_
              wrote on last edited by
              #6

              yes, this may solve your problem....

                V 0
                wrote on last edited by
                #7

                You'll need to build in some sort of paging mechanism that loads only the part that is shown on the screen.
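
One possible shape for such a paging mechanism is sketched here. `CsvPager`, `ReadPage`, and all parameter names are hypothetical, and the code assumes plain comma-separated fields with no quoting. It takes any lazy sequence of lines, so it can be fed from `File.ReadLines`, which reads the file on demand instead of loading it whole.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvPager
{
    // Returns the fields for rows [firstRow, firstRow + rowCount) and
    // columns [firstColumn, firstColumn + columnCount) only.
    // With a lazy source, lines after the page are never read, but lines
    // before it are still scanned and discarded.
    public static List<string[]> ReadPage(
        IEnumerable<string> lines,
        int firstRow, int rowCount,
        int firstColumn, int columnCount)
    {
        return lines
            .Skip(firstRow)
            .Take(rowCount)
            .Select(line => line.Split(',')
                                .Skip(firstColumn)
                                .Take(columnCount)
                                .ToArray())
            .ToList();
    }
}
```

A grid showing 50 rows and 20 columns starting at row 1,000 could then ask for `CsvPager.ReadPage(File.ReadLines(path), 1000, 50, 0, 20)`. Because every call rescans from the start of the file, a real viewer would probably also keep an index of line start offsets so it can seek directly to a page.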

                V.
                  BobJanova
                  wrote on last edited by
                  #8

                  As SledgeHammer01 says, that's an unreasonably large amount of data for most purposes. It's 400 million cells, so you're talking about gigabytes of memory, depending on exactly what's in there. What do you want to do with this dataset? You almost certainly want a load-on-demand adapter of some kind, so you can run through the data without actually having it all in memory at once. This library is rather good; I used it in a real application (though not dealing with massive datasets) without problem.

                    Alan Balkany
                    wrote on last edited by
                    #9

                    Hi losan, I found your post very interesting because I've never encountered a data set that large. Are you trying to analyze that data? If so, I may be able to help. I have a product (www.patternscope.com) that finds patterns in extremely large data sets. I think your data set would be good for stress-testing the application, and it fits perfectly with two planned developments: 1. Reading CSV data (currently it only reads databases through ODBC, or flat files), and 2. Making a C#-callable API that you could use in your C# application to handle that much data (e.g. queries, retrieval, and analysis). My product extracts the patterns that comprise the raw data. These patterns are a fraction of the size of the original raw data, so they fit entirely into memory, even when the raw data is larger than the memory available. The patterns have the same information content as the raw data, so can be processed (e.g. queries or analysis) many times faster. If you could give me a copy of your data set, I could give you a free copy of PatternScope (after I adapt it for reading CSV data) which you could use to analyze the data, followed by a DLL you could call from C# for processing the data in your program. What does this data represent?

                      losan
                      wrote on last edited by
                      #10

                      Hi, here is a link to the dataset: "www.dropbox.com/s/een9zlqce4vqqrl/ProjectData3.csv". What I need to do is apply collaborative filtering algorithms to the dataset. The dataset is about tweets: who is going to retweet another person. Thanks

                        Alan Balkany
                        wrote on last edited by
                        #11

                        Thanks. When I've adapted PatternScope for comma-separated values, I'll send you a copy. Collaborative filtering looks interesting.
