Code Project

How to get text from source code files?

    jenni2008 wrote (#1):

    Hi all: I am a project manager for a large (very) project. I am assigned the task of creating plain text version of web pages by removing the <> tags and sending the result to the legal department for review. Is there an automated way to do this ? The number of documents is very large, in excess of 1000. Thank you. Jenni

      led mike wrote (#2):

      jenni2008 wrote:

      I am a project manager for a large (very) project.

      Then I'm not sure you are allowed to use these forums.

      jenni2008 wrote:

      I am assigned the task of creating plain text version of web pages

      Don't you have a staff of developers that know how to do this? I mean if you don't, why does the company need a project manager?

      jenni2008 wrote:

      Is there an automated way to do this ?

      Yes. I highly recommend using computers and software as a means of automating the task.

      led mike

        Christian Graus wrote (#3):

        You could do this with a regex in C# code (this is not an ASP.NET question) that requests each page, strips the tags, and then writes out the text. One would hope there's an easy way to discover the 1000 pages, perhaps by adding code that finds and follows links?

        Christian Graus Driven to the arms of OSX by Vista.
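
The two ideas in this reply, stripping tags with a regex and discovering pages by following links, can be sketched roughly as follows. This is a minimal illustration in Python rather than the C# suggested above, and the names `strip_tags`, `LinkCollector`, and `find_links` are hypothetical helpers invented for the example. A regex cannot handle every malformed page, so treat the output as a first pass for review rather than a faithful rendering.

```python
import html
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# <script> and <style> bodies are not page text, so drop those elements whole.
_SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG = re.compile(r"<[^>]+>")


def strip_tags(markup: str) -> str:
    """Return the visible text of an HTML string using regexes only."""
    text = _SCRIPT_STYLE.sub(" ", markup)
    text = _COMMENT.sub(" ", text)
    text = _TAG.sub(" ", text)           # every remaining tag becomes a space
    text = html.unescape(text)           # decode entities such as &amp;
    return re.sub(r"\s+", " ", text).strip()


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def find_links(markup: str, base_url: str) -> list:
    """Return the links found in markup, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(markup)
    return collector.links
```

To crawl a site, each page could be fetched with `urllib.request.urlopen`, passed to `strip_tags` for the text and to `find_links` for the next pages to visit, while keeping a set of URLs already seen. Replacing tags with a space keeps adjacent words apart, at the cost of splitting a word that has inline markup in the middle of it.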

          led mike wrote (#4):

          A tie, but you get the first position. Teacher's pet ;P

          led mike

            Christian Graus wrote (#5):

            *grin* I was just reading through the forums and thinking you seem especially bitter this morning. Most of the questions are ridiculous, but still, have you had a bad day, or are you just worn down by the flood of homework questions?

            Christian Graus Driven to the arms of OSX by Vista.

              led mike wrote (#6):

              Christian Graus wrote:

              you seem especially bitter this morning

              Christian Graus wrote:

              but still, have you had a bad day, or

              I'm so misunderstood. I let my creative flair govern my replies, not my mood. :laugh::laugh: Ok, I admit it, I have no creative flair. :-O However, I also have almost no emotion, so I don't think it has anything to do with my mood. I can't really explain; maybe it has mostly to do with how I interpret, or read between the lines of, the loser posts. :) Interpreting text messages is fairly inaccurate due to the lossiness. No expressions, body language, tone.

              led mike

                ptrckmc249 wrote (#7):

                I use the following script. Put it in a file called RemoveTags.txt and execute it from the MS-DOS prompt:

                    C:/biterScripting/biterScripting.exe RemoveTags.txt dir("") files("*.html")

                If you really have 1000 documents, it may take a while. Hope this helps. (If you don't have biterScripting, go to biterScripting.com -> download.)

                Patrick

                    # START OF SCRIPT
                    var str files    # patterns for file names
                    var str dir      # dir where entire project is

                    # Collect a list of files
                    var str fileList
                    find -rn $files $dir > $fileList

                    # Process files one by one
                    while ( $fileList <> "")
                    do
                        # Get the next file
                        var str file
                        lex "1" $fileList > $file

                        # Read the file contents into a variable.
                        var str content
                        cat $file > $content

                        # Remove all <> tags
                        while ( { sen -r "^<&>^" $content } > 0 )
                            sal -r "^<&>^" "" $content > null
                        # All <> are now removed in this one file. $content has the modified content.
                        # sen = string enumerator, sal = string alterer, & = regular expression that matches
                        # any number of any characters. <&> means, heck, find out from the help pages.
                        # If you want to remove empty lines, do it in a loop like above:
                        #     sal "^\n\n^" "\n" $content > null

                        # Get the file name without the ending .html, etc.
                        stex "[^.^l" $file > null
                        # stex means string extractor. l means last instance. [ means, ... heck, find out
                        # from the help pages.

                        # Add .txt extension to file name.
                        set $file = $file + ".txt"

                        # Write the modified content to the .txt file.
                        echo -e "DEBUG: Writing file " $file
                        echo $content > { echo $file }
                    done
                    # end of do after while ( $fileList <> "")

                    # All text versions are now available in corresponding .txt files in the same
                    # directories for the 1000 of your files.
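
For readers without biterScripting, the same batch job (find every matching file under a directory, drop the tags, write a .txt file next to each source) might look roughly like this in Python. It is a sketch under stated assumptions, not a port of the script above: it uses the standard library's `html.parser` instead of a tag-matching pattern, assumes the files are UTF-8, and the names `TextExtractor`, `html_to_text`, and `convert_tree` are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}
    # Tags that should start a new line in the plain-text output.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(markup: str) -> str:
    """Return the text of an HTML document, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def convert_tree(root, pattern="*.html"):
    """Write a .txt file beside every file under root that matches pattern."""
    written = []
    for source in sorted(Path(root).rglob(pattern)):
        target = source.with_suffix(".txt")
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        markup = source.read_text(encoding="utf-8", errors="replace")
        target.write_text(html_to_text(markup) + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_tree("path/to/project")` returns the list of .txt files it wrote, which can double as a checklist for the legal review. Using a parser rather than a pattern means that text inside scripts and stylesheets is skipped and character entities are decoded, two things a plain "remove everything between < and >" pass does not do.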
