What's a good FREE Windows-executable app that can scrape the text out of an HTML file

    swampwiz wrote (#1):

    I basically want to scrape out the text part of a webpage, leaving all the frilly stuff - including login screens - out. And what I don't want is to have an app that just opens up the HTML as a text file like Notepad. I have also tried using OpenOffice Writer, and once it worked, putting the text at the bottom, but every subsequent time it just crashed.
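
For anyone who would rather script this than install an app, here is a minimal sketch in Python using only the standard library's `html.parser`. It is one possible approach, not something proposed in this thread. The names `TextExtractor`, `html_to_text`, and `SKIP_TAGS` are made up for illustration, and the list of tags treated as "frilly stuff" is an assumption that will probably need tuning per site.

```python
from html.parser import HTMLParser

# Tags whose contents are dropped entirely. This set is an assumption:
# forms are skipped to lose login boxes, nav/header/footer to lose page chrome.
SKIP_TAGS = {"script", "style", "noscript", "form", "nav", "header", "footer"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `html_to_text(open("page.html", encoding="utf-8").read())` on a saved page (the file name is hypothetical) returns the text as a string. Each text run lands on its own line, so inline tags such as `<b>` split a sentence across lines; that is a limitation of this sketch.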

      RickZeeland wrote (#2):

      Maybe this one: ParseHub, or: Diggernaut (Turn website content into datasets)

        David ONeil wrote (#3):

        I've written VBA in Access to do this before. Keep in mind that different sites use different tags; that is another small pain in the ass you have to handle.

          PIEBALDconsult wrote (#4):

          What color should it be?

            Ravi Bhavnani wrote (#5):

            WebResourceProvider, perhaps? /ravi

              Jorgen Andersson wrote (#6):

              Sounds like you want Lynx (see "Lynx (web browser)" on Wikipedia). It's a text-based web browser. Should work fine as long as you don't need automation.

                swampwiz wrote (#7, replying to RickZeeland):

                They both require signing up for an account, which I will not do.

                  swampwiz wrote (#8):

                  UPDATE: I found something just as good - the Firefox add-on "Textise it".

                    megaadam wrote (#9):

                    Many browsers have a Reader View button: click that.

                      englebart wrote (#10):

                      Use send keys to automate it: Select All (Ctrl+A), Copy (Ctrl+C), paste into Notepad, save.

                        swampwiz wrote (#11, replying to englebart):

                        This works pretty well too!

                          englebart wrote (#12, replying to swampwiz):

                          You could probably skip Notepad and write a simple console app or PowerShell script to extract the plain text from the clipboard into a file.
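
A rough sketch of that clipboard-to-file idea, written in Python rather than PowerShell and using only the standard library. This is an assumption about how such a script might look, not code anyone in the thread posted. The function names `read_clipboard_text` and `save_text` and the output file name are hypothetical. Reading the clipboard goes through `tkinter`, so it needs a desktop session and raises `tkinter.TclError` if the clipboard holds no text.

```python
from pathlib import Path


def read_clipboard_text():
    """Return the clipboard's plain text via tkinter (needs a desktop session)."""
    import tkinter  # imported lazily so save_text works without a GUI

    root = tkinter.Tk()
    root.withdraw()  # do not show an empty window
    try:
        return root.clipboard_get()
    finally:
        root.destroy()


def save_text(text, path):
    """Write text to path as UTF-8 and return the number of characters."""
    Path(path).write_text(text, encoding="utf-8")
    return len(text)
```

After Ctrl+A and Ctrl+C in the browser, `save_text(read_clipboard_text(), "page.txt")` would write the copied text to `page.txt`.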
