Extract data out of a LARGE text file

amatbrewer
#1

Before I put a lot of effort into writing something from scratch, I bet someone out there already has most of what I need. I need to extract some data from a log file. Specifically, I need to search a text file for a given string to find the first line of a block of data I want to process, extract some data from that line, then read the next line and use a value in it to know how many more lines to read and process before looking for the next block and doing it all again, until the end of the file is reached. Pretty simple, but why reinvent the wheel? And I bet there are some really cool ways of doing this that I would never think of. While you are at it, any recommendations or warnings on doing this with VERY large text files (>2 GB)?

    David Wilkes

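For reference, the core loop described above needs nothing exotic: a StreamReader hands back one line at a time, so even a multi-gigabyte file never has to fit in memory. Below is a minimal sketch; the marker string "BEGIN BLOCK", the position of the count field, and the two Process helpers are hypothetical placeholders for whatever the real log format requires.

    using System;
    using System.IO;

    class BlockScanner
    {
        static void Main(string[] args)
        {
            // Stream the file line by line; memory use stays flat because
            // nothing is buffered beyond the current line.
            using (StreamReader reader = new StreamReader(args[0]))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // "BEGIN BLOCK" is an assumed marker; substitute the
                    // real search string for the block's first line.
                    if (!line.Contains("BEGIN BLOCK"))
                        continue;

                    ProcessHeader(line); // extract data from the first line

                    // Assumption: the next line carries the number of detail
                    // lines in its second whitespace-separated field.
                    string countLine = reader.ReadLine();
                    if (countLine == null)
                        break;
                    int count = int.Parse(countLine.Split(' ')[1]);

                    for (int i = 0; i < count && (line = reader.ReadLine()) != null; i++)
                        ProcessDetail(line);
                }
            }
        }

        static void ProcessHeader(string line) { /* pull fields out here */ }
        static void ProcessDetail(string line) { /* per-line processing here */ }
    }

Because the scan never seeks backwards, this also sidesteps the limits that whole-file or memory-mapped approaches can hit past 2 GB.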
Luc Pattyn
#2

Hi, I think it is not wise to have log files that large. Why not create a series of normal-sized log files instead? Just start a new file each day, each hour, or whatever is appropriate, and include the date/time in the file name to keep them apart. You can keep them all in one folder, and if you need to transfer them as a single entity, a Zip utility will take care of that (as well as reducing the overall size for you). BTW, Notepad is probably not the optimal answer to your question. :)

      Luc Pattyn

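For what it's worth, the rolling-file idea amounts to little more than stamping the date or hour into each file name so the pieces sort and group naturally in one folder; a trivial sketch, with the "app" prefix as a placeholder:

    using System;

    class LogNaming
    {
        static void Main()
        {
            // One file per hour; yyyyMMdd-HH makes the names sort chronologically.
            string name = string.Format("app-{0:yyyyMMdd-HH}.log", DateTime.Now);
            Console.WriteLine(name); // e.g. app-20071105-09.log
        }
    }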
amatbrewer
#3

Thanks for the advice, but that presents some problems of its own, because other tools also have to make use of these logs. Most of the time I will be processing three log files at a time of around 900 MB each. Needless to say, it takes a while... :zzz: You should see how long it takes to FTP them... every Monday. I could break the logs up into smaller files, but I would still end up processing the entire volume as a whole anyway, as well as changing the setup of the other tools. So I would like to avoid this if I can.

        David Wilkes

Luc Pattyn
#4

Well, you could surely improve things:

1. Before file transfer, try compression; again, a ZIP utility is useful, even for a single file. On text files it typically reduces size by a factor of about 3 to 5.
2. If you can modify the app that does the logging, you could leave everything as is, but add something that creates another file containing exactly (or approximately) what you are really interested in.
3. I don't know what the underlying business logic is, but requiring that amount of text to be collected, transferred, and analyzed seems very strange to me. I would say the overall process deserves reconsidering. :)

          Luc Pattyn

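On point 1, a hedged sketch of the compression step using the GZipStream class built into .NET 2.0 (System.IO.Compression), so no third-party Zip library is needed; the file names are placeholders:

    using System.IO;
    using System.IO.Compression;

    class LogCompressor
    {
        static void Main()
        {
            // Copy the log through a gzip stream in 64 KB chunks.
            using (FileStream source = File.OpenRead("monday.log"))
            using (FileStream target = File.Create("monday.log.gz"))
            using (GZipStream gzip = new GZipStream(target, CompressionMode.Compress))
            {
                byte[] buffer = new byte[64 * 1024];
                int read;
                while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                    gzip.Write(buffer, 0, read);
            }
        }
    }

Plain-text logs usually compress well, so this alone could cut the Monday FTP transfers to a fraction of their current size.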
amatbrewer
#5

This is a system that I sometimes think Rube Goldberg designed. We are talking about a cellular phone system running on a UNIX platform (messing around on it is not an option). The available tools and interfaces are archaic at best, and my knowledge of UNIX... well, let's say I know just enough to get into trouble. The volume of data, while unfortunate, is still only a third of what is actually collected, and I've reduced it about as far as I can while still maintaining its integrity and validity. So, does it suck? Sure. But one of the reasons I like this job is that it is always a challenge... and this is today's. Tomorrow? Who knows, maybe I'll have to solve cold fusion.

            David Wilkes

spin vector
#6

I've run large log files for new processes, full throttle at Debug level, and unknown or old code may produce large log files too, so size is not the problem here. What I've done is read the articles on turning a log file into an XML file or a database (after XML). There are good articles on using .NET regex to parse a file and put it into XML. An overview article, a bit light on details, is http://msdn2.microsoft.com/en-us/library/ms972965.aspx. This way you can rationalize the files, normalize them into a database, and then really look at the contents. Good luck.

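A rough sketch of that regex-to-XML idea, again streaming line by line so the 900 MB files are never loaded whole; the pattern (timestamp, level, message) and the element names are invented for illustration:

    using System.IO;
    using System.Text.RegularExpressions;
    using System.Xml;

    class LogToXml
    {
        static void Main()
        {
            // Hypothetical line shape: "<timestamp> <level> <message>".
            Regex pattern = new Regex(@"^(?<time>\S+)\s+(?<level>\w+)\s+(?<msg>.*)$");

            using (StreamReader reader = new StreamReader("input.log"))
            using (XmlTextWriter writer = new XmlTextWriter("output.xml", null))
            {
                writer.Formatting = Formatting.Indented;
                writer.WriteStartElement("log");

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    Match m = pattern.Match(line);
                    if (!m.Success)
                        continue; // skip lines that don't fit the pattern

                    writer.WriteStartElement("entry");
                    writer.WriteAttributeString("time", m.Groups["time"].Value);
                    writer.WriteAttributeString("level", m.Groups["level"].Value);
                    writer.WriteString(m.Groups["msg"].Value);
                    writer.WriteEndElement();
                }

                writer.WriteEndElement(); // </log>
            }
        }
    }

From there the XML can be bulk-loaded into a database and queried properly.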
spin vector
#7

Also, rereading your original post, you are dealing with some grammatical structure here. Read the articles on Yacc, or for .NET, anything that is a Yacc-like parser generator. The article at http://www.codeproject.com/csharp/minossecc.asp seems helpful. Search around; I can't put my finger on it, but there are other non-Java .NET parsing meta-languages to help with the file structure. There's always MKS Lex and Yacc, which I used a long time ago to great effect; it's not too hard to learn (days). But find something free if possible. Cheers.

amatbrewer
#8

                  Finally got a chance to look at the link you provided. This looks like it will work. I never expected I’d be able to simply locate the desired data based upon its pattern, but I was able to write a RegEx using Expresso that does it (at least on a small sample log). Now all I need to do is code it and see if it will work for the big files. Thanks!

                  David Wilkes

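One caution worth sketching for the big files: feed the expression one line (or one block) at a time rather than handing the whole file to Regex.Matches, and consider RegexOptions.Compiled when the same pattern will run millions of times. "yourPattern" below stands in for the Expresso-built expression:

    using System.IO;
    using System.Text.RegularExpressions;

    class BigFileMatcher
    {
        static void Main()
        {
            // Compiled regexes cost more to create but match faster,
            // which pays off over millions of lines.
            Regex rx = new Regex("yourPattern", RegexOptions.Compiled);

            using (StreamReader reader = new StreamReader("huge.log"))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    Match m = rx.Match(line);
                    if (m.Success)
                    {
                        // handle the captured groups here
                    }
                }
            }
        }
    }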