Code Project
How to get Millions of files from folder and bulk insert in database

C# · 9 Posts · 5 Posters
Ankur B Patel
#1

    I have some folders containing very large numbers of files (counts like 40 lakh, i.e. 4 million; 70 lakh; 1 crore, i.e. 10 million; and 1.5 crore), with file types such as .png, .xlsx, .txt, .msg, .ico, .jpg, .bmp, etc.

    I want to insert each file's name and size into a database table, but my implementation takes too long to scan the folders and then throws an out-of-memory exception.

    Can anyone please help me out here? How can I implement this scenario in a better manner, and how can I insert into the table faster?

    I am using C#/.NET with a PostgreSQL database.

    Ankur B. Patel


      Lost User
      #2

      First of all, what is a lac and a cr? This is never going to be "fast"; more important is that you do it correctly. I assume you're putting everything in a blob. Don't: .txt should be archived as a memo field, so you can later search it from the DB. If you just want to archive names and their sizes, read the entire folder's contents and spawn some threads to save chunks of that.
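
      A rough sketch of that approach (one thread streams file metadata, worker threads save fixed-size chunks), assuming the Npgsql driver; the connection string, path, and `files(name, size)` table are invented for illustration:

```csharp
// Sketch only: producer streams file metadata, consumers insert chunks.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Npgsql;

class ChunkedSaver
{
    const int ChunkSize = 10_000;

    static void Main()
    {
        // Bounded queue so the producer can't outrun the DB and blow up memory.
        var chunks = new BlockingCollection<(string Name, long Size)[]>(boundedCapacity: 8);

        // Consumers: each worker owns one connection and saves chunks.
        var workers = new Task[4];
        for (int i = 0; i < workers.Length; i++)
            workers[i] = Task.Run(() =>
            {
                using var conn = new NpgsqlConnection("Host=localhost;Database=mydb");
                conn.Open();
                foreach (var chunk in chunks.GetConsumingEnumerable())
                    SaveChunk(conn, chunk);
            });

        // Producer: EnumerateFiles streams paths lazily, so millions of
        // entries are never held in memory at once (unlike GetFiles).
        var buffer = new List<(string, long)>(ChunkSize);
        foreach (var path in Directory.EnumerateFiles(@"C:\data", "*", SearchOption.AllDirectories))
        {
            buffer.Add((Path.GetFileName(path), new FileInfo(path).Length));
            if (buffer.Count == ChunkSize)
            {
                chunks.Add(buffer.ToArray());
                buffer.Clear();
            }
        }
        if (buffer.Count > 0) chunks.Add(buffer.ToArray());
        chunks.CompleteAdding();
        Task.WaitAll(workers);
    }

    static void SaveChunk(NpgsqlConnection conn, (string Name, long Size)[] chunk)
    {
        // One transaction per chunk keeps commits cheap and memory flat.
        using var tx = conn.BeginTransaction();
        using var cmd = new NpgsqlCommand("INSERT INTO files(name, size) VALUES (@n, @s)", conn, tx);
        var n = cmd.Parameters.Add("n", NpgsqlTypes.NpgsqlDbType.Text);
        var s = cmd.Parameters.Add("s", NpgsqlTypes.NpgsqlDbType.Bigint);
        foreach (var (name, size) in chunk)
        {
            n.Value = name;
            s.Value = size;
            cmd.ExecuteNonQuery();
        }
        tx.Commit();
    }
}
```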

      Bastard Programmer from Hell :suss: "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.


        Lost User
        #3

        Eddy Vluggen wrote:

        First of all, what is a lac and cr?

        Indian numbering system - Wikipedia[^]


          Lost User
          #4

          Yeah. So they're like Americans with their pounds. In international communication we use the SI system. If you can't, better learn :)

          Bastard Programmer from Hell :suss: "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.


            Ankur B Patel
            #5

            Thanks, but let me correct myself here: forget about lac and cr (it's Lakh and Crore). Keep in mind that you have millions of files in a folder, and I just want to insert the file names and their sizes in bytes into the database.

            How do I bulk insert? And how do I keep system memory from going high while doing it?

            Please suggest how I can do this quickly and safely.


              OriginalGriff
              #6

              If this is your idea of a backup mechanism, don't bother: it'll never be quick, it'll never be efficient, and it will always risk running out of memory. Instead, think about using a "proper" backup system which archives the disk as sectors instead of files: it's a lot quicker, a lot more efficient, and much, much safer. Remember, backups should be air-gapped for safety: DBs and suchlike are just files, and as such are just as much at risk from ransomware as any other data (more so in some cases, as they are a prime target for ransomware to exploit). I use AOMEI Backupper; it has a free Standard version, and it allows you to mount the backup images as a disk and retrieve individual files if necessary.

              "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt AntiTwitter: @DalekDave is now a follower!

              "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
              "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt


                Lost User
                #7

                Pipe a DIR listing to a text file. That'll give you the list you need, which you can then "substring" and load into a table of file names, etc.

                    Displays a list of files and subdirectories in a directory.

                    DIR [drive:][path][filename] [/A[[:]attributes]] [/B] [/C] [/D] [/L] [/N]
                        [/O[[:]sortorder]] [/P] [/Q] [/R] [/S] [/T[[:]timefield]] [/W] [/X] [/4]

                      [drive:][path][filename]
                            Specifies drive, directory, and/or files to list.
                      /A    Displays files with specified attributes.
                            attributes   D  Directories                R  Read-only files
                                         H  Hidden files               A  Files ready for archiving
                                         S  System files               I  Not content indexed files
                                         L  Reparse Points             O  Offline files
                                         -  Prefix meaning not
                      /B    Uses bare format (no heading information or summary).
                      /C    Display the thousand separator in file sizes. This is the
                            default. Use /-C to disable display of separator.
                      /D    Same as wide but files are list sorted by column.
                      /L    Uses lowercase.
                      /N    New long list format where filenames are on the far right.
                      /O    List by files in sorted order.
                            sortorder    N  By name (alphabetic)       S  By size (smallest first)
                                         E  By extension (alphabetic)  D  By date/time (oldest first)
                                         G  Group directories first    -  Prefix to reverse order
                      /P    Pauses after each screenful of information.
                      /Q    Display the owner of the file.
                      /R    Display alternate data streams of the file.
                      /S    Displays files in specified directory and all subdirectories.
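
                Concretely, the pipe described above might look like this on Windows (cmd.exe); the source path and output file name are invented for illustration:

```shell
REM /S recurses into all subdirectories; /A-D skips directory entries;
REM /-C drops thousands separators so the size column is easier to substring.
dir /S /-C /A-D "C:\data" > filelist.txt
```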

                It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food


                  Michael Sydney Balloni
                  #8

                  Hi Ankur, I'm going to assume your files are in a bunch of directories, not all in one or mostly in a few, but leafy.

                  I'd use the System.IO classes Directory, Path, and FileInfo. You can use Directory.GetDirectories to get the directories, then loop over those directory paths calling Directory.GetFiles to get the paths to the files; then for each file path, call Path.GetFileName to get the filename, and (new FileInfo(filePath)).Length to get the file length.

                  I'd use a DB transaction per directory: create an INSERT statement per file, execute it inside the transaction, and commit when you're done with the files in that directory.

                  Things not to do: don't get all files in all directories at once. Don't build one SQL statement for all the files.

                  Hope this helps, -Michael
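
                  For the bulk-insert part, Npgsql also exposes PostgreSQL's COPY protocol, which is typically much faster than row-by-row INSERTs. Here is a sketch combining it with the per-directory walk described above; the connection string, path, and `files(name, size)` table are invented for illustration:

```csharp
// Sketch: walk directories one at a time and bulk-load each directory's
// file names/sizes through PostgreSQL's COPY protocol (BeginBinaryImport).
using System;
using System.IO;
using Npgsql;
using NpgsqlTypes;

class BulkLoader
{
    static void Main()
    {
        using var conn = new NpgsqlConnection("Host=localhost;Database=mydb");
        conn.Open();

        // EnumerateDirectories/EnumerateFiles stream results lazily instead
        // of materialising millions of paths in memory (unlike Get*).
        foreach (var dir in Directory.EnumerateDirectories(@"C:\data", "*", SearchOption.AllDirectories))
        {
            // One COPY per directory: bounded memory, one commit per batch.
            using var import = conn.BeginBinaryImport(
                "COPY files (name, size) FROM STDIN (FORMAT BINARY)");
            foreach (var path in Directory.EnumerateFiles(dir))
            {
                import.StartRow();
                import.Write(Path.GetFileName(path), NpgsqlDbType.Text);
                import.Write(new FileInfo(path).Length, NpgsqlDbType.Bigint);
            }
            import.Complete();  // commits the COPY for this directory
        }
    }
}
```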


                    jschell
                    #9

                    Ankur B. Patel wrote:

                    can anyone please help me out here, how can i implement this scenario in a better manner.

                    The easiest and best answer: do not do that in the first place. File systems, not databases, have existed for a long time and exist solely to manage files. What you have are files.

                    If you want to reference the files in the database, then do just that: keep the files on the file system and store a reference (absolute or relative) to the location of each file. You can keep metadata on the file in the database, such as name, size, type, etc.

                    You might also want to ensure uniqueness. You do that by finding out what makes one file always different from another (usually some combination of name and location) and ensuring that is maintained.

                    Finally, of course, consider how these files will be used. For example, if you expect a .png to show up in a web page that has 10,000 unique visits a day, you certainly do not want to be pulling it out of a database.
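
                    A minimal sketch of such a metadata table, with uniqueness enforced on the full path; the connection string and all table/column names are invented for illustration:

```csharp
// Sketch: a metadata table that references files on disk rather than
// storing their contents. The UNIQUE constraint on the full path captures
// the name-plus-location identity described above.
using Npgsql;

class Schema
{
    static void Main()
    {
        using var conn = new NpgsqlConnection("Host=localhost;Database=mydb");
        conn.Open();
        using var cmd = new NpgsqlCommand(@"
            CREATE TABLE IF NOT EXISTS files (
                id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
                path  text   NOT NULL UNIQUE,  -- location + name identifies the file
                name  text   NOT NULL,
                size  bigint NOT NULL,
                type  text                      -- e.g. extension: .png, .xlsx, ...
            );", conn);
        cmd.ExecuteNonQuery();
    }
}
```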
