Easy way to convert webpage to txt file?

    Selevercin wrote (#1):

    I'm looking at doing some VERY simple webpage parsing. My plan is to turn either the source or the rendered output of the webpage into a text file, and then parse that. However, I don't know how to get either a page's source or its output into a text file. Any hints, or better (as in simpler) methods, would be much appreciated.

    If you have a problem with my spelling, just remember that's not my fault. I (as well as everyone else who learned to spell after 1976) blame it on Robert A. Kolpek for U.S. Patent 4,136,395.

      toxcct wrote (#2):

      I'm not sure I understand. .htm, .html, DHTML files and so on are plain text files! For example, you can try this: save this page (or any other you prefer) as an HTML file, then browse your hard disk to the saved file, right-click it, and open it with Notepad. What do you see? Binary? Of course not. You can hand an .htm file to your parser directly. If you really need a .txt file, you can simply change the extension (*.htm -> *.txt) or append the .txt extension to the file name (*.htm -> *.htm.txt), whichever you prefer.


      TOXCCT >>> GEII power
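
To illustrate the point that a saved .htm file can be read like any other text file, here is a minimal sketch. The helper names `read_text_file` and `strip_tags` are hypothetical, not from the thread, and the tag stripping is deliberately naive: it only drops whatever sits between `<` and `>`, and makes no attempt to handle scripts, comments, or entities such as `&amp;`.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole file into a string.
// Returns an empty string if the file cannot be opened.
std::string read_text_file(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Hypothetical helper: naive tag stripper that removes everything
// between '<' and the next '>'. A '>' outside a tag is kept as text.
std::string strip_tags(const std::string& html) {
    std::string text;
    bool in_tag = false;
    for (std::string::size_type i = 0; i < html.size(); ++i) {
        const char c = html[i];
        if (c == '<') {
            in_tag = true;
        } else if (c == '>' && in_tag) {
            in_tag = false;
        } else if (!in_tag) {
            text += c;
        }
    }
    return text;
}
```

Calling `strip_tags(read_text_file("page.htm"))` on a saved page (the file name is a placeholder) would give a rough approximation of the page's visible text. For anything beyond very simple pages, a real HTML parser is likely the safer choice.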

        palbano wrote (#3):

        1. Open the web page in Internet Explorer.
        2. From the menu, select File/Save As...
        3. In the "Save Web Page" dialog, set "Save as type" to "Text File (*.txt)".
        4. Give the file a name and a location, and click the "Save" button.

        "No matter where you go, there you are." - Buckaroo Banzai

        -pete

          Selevercin wrote (#4):

          That's not exactly what I mean. I want a program to go to a site and then parse the page, and I want to have to press only one button for it to do all that. I'd like to use fstream to parse the page's text, but I can't figure out how to access that text from within the program.

            georgiek50 wrote (#5):

            Have the program save a copy of the page (using InternetReadFile), then use the fstream functions to read and parse it as if it were any other file.
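
A minimal sketch of that approach, assuming the WinINet API on Windows. `InternetOpenA`, `InternetOpenUrlA`, `InternetReadFile`, and `InternetCloseHandle` are real WinINet calls; the helper names `download_to_file` and `read_lines`, the "PageFetcher" user-agent string, and the buffer size are illustrative assumptions. On non-Windows builds the download helper simply returns false.

```cpp
#include <fstream>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <wininet.h>
#pragma comment(lib, "wininet.lib")  // MSVC only; other toolchains link wininet explicitly
#endif

// Hypothetical helper: fetches `url` and writes the raw response body to `path`.
// Returns true only if the whole body was read and written.
bool download_to_file(const std::string& url, const std::string& path) {
#ifdef _WIN32
    HINTERNET session = InternetOpenA("PageFetcher", INTERNET_OPEN_TYPE_PRECONFIG,
                                      NULL, NULL, 0);
    if (!session) return false;

    HINTERNET request = InternetOpenUrlA(session, url.c_str(), NULL, 0,
                                         INTERNET_FLAG_RELOAD, 0);
    bool ok = (request != NULL);
    if (ok) {
        std::ofstream out(path.c_str(), std::ios::binary);
        ok = out.is_open();
        char buffer[4096];
        DWORD bytes_read = 0;
        while (ok) {
            if (!InternetReadFile(request, buffer, sizeof(buffer), &bytes_read)) {
                ok = false;   // read error
            } else if (bytes_read == 0) {
                break;        // end of the response body
            } else {
                out.write(buffer, static_cast<std::streamsize>(bytes_read));
                ok = out.good();
            }
        }
        InternetCloseHandle(request);
    }
    InternetCloseHandle(session);
    return ok;
#else
    (void)url;
    (void)path;
    return false;  // WinINet is Windows-only; this sketch has no fallback elsewhere
#endif
}

// Hypothetical helper: reads the saved page back line by line with ifstream,
// ready for whatever parsing comes next.
std::vector<std::string> read_lines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path.c_str());
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

A button handler could then call `download_to_file(url, "page.txt")` and, if it succeeds, pass the result of `read_lines("page.txt")` to the parser. The file name here is a placeholder.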

              Ravi Bhavnani wrote (#6):

              This is exactly why I wrote the Web Resource Provider framework. :) Enjoy!

              /ravi

              My new year's resolution: 2048 x 1536
              ravib@ravib.com
