How to get text from source code files?
-
Hi all: I am a project manager for a large (very) project. I am assigned the task of creating plain text versions of web pages by removing the <> tags and sending them to the legal dept for review. Is there an automated way to do this? The number of documents is very large, in excess of 1000. Thank you. Jenni
-
jenni2008 wrote:
I am a project manager for a large (very) project.
Then I'm not sure you are allowed to use these forums.
jenni2008 wrote:
I am assigned the task of creating plain text versions of web pages
Don't you have a staff of developers that know how to do this? I mean if you don't, why does the company need a project manager?
jenni2008 wrote:
Is there an automated way to do this?
Yes. I highly recommend using computers and software as a means of automating the task.
led mike
-
You could do this with a regex and some C# code (this is not an ASP.NET question) that requests each page and then writes out the text. One would hope there's an easy way to discover the 1000 pages, perhaps by adding code that finds and follows links?
Christian Graus Driven to the arms of OSX by Vista.
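A rough sketch of the regex approach suggested above, written in Python rather than C# purely for illustration. The names strip_tags and find_links are made up for this example, and the patterns are an approximation: regexes cope with ordinary pages but can be fooled by unusual markup, so spot-check the output before it goes to legal. Fetching the pages over HTTP is left out.

```python
import html
import re

# <script>/<style> blocks are removed whole: their contents are code, not page text.
_BLOCKS = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_COMMENTS = re.compile(r"<!--.*?-->", re.DOTALL)
_TAGS = re.compile(r"<[^>]+>")
# Quoted href values only; pure fragment links ("#top") are ignored.
_HREF = re.compile(r"""href\s*=\s*["']([^"'#]+)""", re.IGNORECASE)


def strip_tags(markup: str) -> str:
    """Return the visible text of an HTML string (regex-based, approximate)."""
    text = _BLOCKS.sub(" ", markup)
    text = _COMMENTS.sub(" ", text)
    text = _TAGS.sub(" ", text)
    text = html.unescape(text)      # turn entities such as &amp; back into characters
    return " ".join(text.split())   # collapse runs of whitespace into single spaces


def find_links(markup: str) -> list[str]:
    """Return the href targets found in the markup, in document order."""
    return _HREF.findall(markup)
```

A crawler would call find_links on each fetched page, queue any targets it has not seen yet, and save the strip_tags output for each page.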
-
*grin* I was just reading through the forums and thinking you seem especially bitter this morning. Most of the questions are ridiculous, but still, have you had a bad day, or are you just worn down by the flood of homework questions?
Christian Graus Driven to the arms of OSX by Vista.
-
Christian Graus wrote:
you seem especially bitter this morning
Christian Graus wrote:
but still, have you had a bad day, or
I'm so misunderstood. I let my creative flair govern my replies, not my mood. :laugh::laugh: OK, I admit it, I have no creative flair. :-O However, I also have almost no emotion, so I don't think it has anything to do with my mood. I can't really explain it; maybe it has mostly to do with how I interpret, or read between the lines of, the loser posts. :) Interpreting text messages is fairly inaccurate due to the lossiness: no expressions, body language, or tone.
led mike
-
I use the following script. Put it in a file called RemoveTags.txt and run it from the MS-DOS prompt:

    C:/biterScripting/biterScripting.exe RemoveTags.txt dir("") files("*.html")

If you really have 1000 documents, it may take a while. Hope this helps. (If you don't have biterScripting, go to biterScripting.com -> download.)
Patrick

# START OF SCRIPT
var str files    # patterns for file names
var str dir      # dir where entire project is

# Collect a list of files
var str fileList
find -rn $files $dir > $fileList

# Process files one by one
while ( $fileList <> "")
do
    # Get the next file
    var str file
    lex "1" $fileList > $file

    # Read the file contents into a variable.
    var str content
    cat $file > $content

    # Remove all <> tags
    while ( { sen -r "^<&>^" $content } > 0 )
        sal -r "^<&>^" "" $content > null
    # All <> are now removed in this one file. $content has the modified content.
    # sen = string enumerator, sal = string alterer, & = regular expression that matches any number of
    # any characters. <&> means, heck, find out from the help pages.

    # If you want to remove empty lines, do it in a loop like the one above: sal "^\n\n^" "\n" $content > null

    # Get the file name without the ending .html, etc.
    stex "[^.^l" $file > null
    # stex means string extractor. l means last instance. [ means, ... heck, find out from the help pages.

    # Add .txt extension to file name.
    set $file = $file + ".txt"

    # Write the modified content to the .txt file.
    echo -e "DEBUG: Writing file " $file
    echo $content > { echo $file }
done
# end of do after while ( $fileList <> "")

# All text versions are now available in corresponding .txt files in the same directories for
# the 1000 of your files.
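For anyone without biterScripting, here is a minimal sketch of the same batch job in Python, using only the standard library. It is not a translation of Patrick's script: it swaps the tag-matching loop for html.parser, which also skips the contents of script and style blocks and decodes entities such as &amp;. The names TextExtractor, html_to_text and convert_tree are made up for this example, and the UTF-8 encoding is an assumption about the files.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(markup):
    """Return the text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.parts)


def convert_tree(root, pattern="*.html"):
    """Write a .txt file next to every matching file under root; return the new paths."""
    written = []
    for source in sorted(Path(root).rglob(pattern)):
        target = source.with_suffix(".txt")  # page.html -> page.txt
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        markup = source.read_text(encoding="utf-8", errors="replace")
        target.write_text(html_to_text(markup) + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Calling convert_tree on the project directory walks it recursively, much like find -rn in the script above, and leaves each .txt file beside its .html source.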