Easy way to convert webpage to txt file?
-
I'm looking at doing some VERY simple webpage parsing. My plan is to turn either the source or the rendered output of a webpage into a text file, and then parse that. However, I don't know how to get either a page's source or its output into a text file. Any hints or better (as in simpler) methods would be much appreciated.
If you have a problem with my spelling, just remember that's not my fault. I (as well as everyone else who learned to spell after 1976) blame it on Robert A. Kolpek for U.S. Patent 4,136,395.
-
I'm not sure I understand. .htm, .html, .dhtml files and so on are plain text files! For example, try this: save this page (or any other, if you prefer) as an HTML file, then browse your hard disk to the saved file, right-click it, and open it with Notepad. What do you see? Binary? Of course not. You can give your parser an .htm file directly. If you really need a .txt file, you can simply change the extension (*.htm -> *.txt) or append .txt to the file name (*.htm -> *.htm.txt), whichever you prefer.
TOXCCT >>> GEII power
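To illustrate the point that a saved page is already plain text, here is a minimal C++ sketch that reads a saved .htm file with std::ifstream and drops the markup. The function names (strip_tags, html_file_to_text) are made up for this example, and the tag stripping is deliberately naive: it assumes every '<' opens a tag and every '>' closes one, so it does not handle comments, scripts, or entities such as &amp;.

```cpp
#include <fstream>
#include <string>

// Naive tag stripper (hypothetical helper): drops everything between '<' and '>'.
// A stray '>' outside a tag is dropped too; real HTML needs a real parser.
std::string strip_tags(const std::string& html) {
    std::string text;
    bool in_tag = false;
    for (char c : html) {
        if (c == '<') {
            in_tag = true;
        } else if (c == '>') {
            in_tag = false;
        } else if (!in_tag) {
            text += c;
        }
    }
    return text;
}

// Read a saved HTML file line by line, exactly as one would read a .txt file,
// then strip the tags. Returns an empty string if the file cannot be opened.
std::string html_file_to_text(const std::string& path) {
    std::ifstream in(path.c_str());
    std::string line;
    std::string html;
    while (std::getline(in, line)) {
        html += line;
        html += '\n';
    }
    return strip_tags(html);
}
```

Calling html_file_to_text("saved_page.htm") on a page saved from the browser returns its text without tags; no rename to .txt is needed, since the extension does not change how the stream reads the file.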
-
1) Open the web page in Internet Explorer.
2) From the menu, select File/Save As...
3) In the "Save Web Page" dialog, set "Save as type" to "Text File (*.txt)".
4) Give the file a name and a location, and click the "Save" button.
"No matter where you go, there you are." - Buckaroo Banzai
-pete
-
That's not exactly what I mean. I want a program to go to a site and then parse the page, all at the press of a button. I want to use fstream to parse the page's text, but I can't figure out how to get at that text from within the program.
-
Have the program save a copy of the page to disk (open the URL with WinINet and read it with InternetReadFile), then use the fstream functions to read and parse it like any other file.
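A rough sketch of that approach, assuming a Windows build linked against wininet.lib: save_page downloads a URL to a file with InternetOpenA, InternetOpenUrlA, and InternetReadFile, and read_file reads it back with ordinary fstream calls. The function names, the user-agent string, and the 4096-byte buffer are choices made for this example, not anything prescribed by the thread, and error handling is kept to a minimum. The download part is wrapped in #ifdef _WIN32 so the file-reading part still compiles elsewhere.

```cpp
#include <fstream>
#include <sstream>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <wininet.h>
#pragma comment(lib, "wininet.lib")  // MSVC-specific; other toolchains link wininet explicitly

// Hypothetical helper: download `url` and write the raw bytes to `path`.
// Returns true if the whole response was written.
bool save_page(const std::string& url, const std::string& path) {
    HINTERNET session = InternetOpenA("page-fetch-sketch",
                                      INTERNET_OPEN_TYPE_PRECONFIG,
                                      NULL, NULL, 0);
    if (!session) return false;

    HINTERNET request = InternetOpenUrlA(session, url.c_str(),
                                         NULL, 0, INTERNET_FLAG_RELOAD, 0);
    if (!request) {
        InternetCloseHandle(session);
        return false;
    }

    std::ofstream out(path.c_str(), std::ios::binary);
    bool ok = out.is_open();
    char buffer[4096];
    DWORD bytes_read = 0;
    // InternetReadFile reports end of data as a successful call that reads 0 bytes.
    while (ok) {
        if (!InternetReadFile(request, buffer, sizeof(buffer), &bytes_read)) {
            ok = false;
        } else if (bytes_read == 0) {
            break;
        } else {
            out.write(buffer, bytes_read);
        }
    }

    InternetCloseHandle(request);
    InternetCloseHandle(session);
    return ok && out.good();
}
#endif  // _WIN32

// Read a whole file into a string with plain fstream calls.
// Returns an empty string if the file is missing or empty.
std::string read_file(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}
```

On Windows, a button handler could call save_page("http://example.com/", "page.html") and, if it returns true, pass the result of read_file("page.html") to whatever parsing code is needed. Writing to disk first is not strictly required (the bytes could be appended to a std::string instead), but it matches the suggestion above and lets the parser stay a plain fstream reader.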
-
This is exactly why I wrote the Web Resource Provider framework. :) Enjoy!
/ravi
My new year's resolution: 2048 x 1536
ravib@ravib.com