Get dynamic web content

Code Project, C / C++ / MFC forum (7 posts, 4 posters)
    Mathefreak wrote (#1):

    I've written a little web crawler in VC++ that grabs financial indices and quotes from different websites and displays them. If the sources are plain HTML, everything is fine. Now I've got a website that shows the quotes dynamically (http://www.forexpf.ru/quote_show.php).

    IMHO there are two ways to extract the information:
    1. Grab the page as an image, run OCR on it and extract the info.
    2. Load the page into a browser control so that it builds the content, copy the content text (to the clipboard) and extract the information.

    #1 works in general, but the OCR isn't accurate enough.

    #2: Are there any examples that show how to handle the clipboard? On the other hand, using the clipboard wouldn't be my first choice: the grab process repeats automatically in the background, and using the clipboard would interfere with other running applications.

    Are there any other ideas to solve the problem? TIA, M.

      Paresh Chitte wrote (#2):

      Mathefreak wrote:

      2. load the page into a browser control to build the content, copy the content text (into clipboard) and extract information

      Try using IWebBrowser2, IHTMLDocument, IHTMLElement, and related interfaces. Regards, Paresh.

        David Crow wrote (#3):

        Mathefreak wrote:

        If the sources are plain html, everything is fine. Now I've got a website which shows the quotes dynamically (http://www.forexpf.ru/quote_show.php).

        But the tables are still HTML. Unless I'm misunderstanding, isn't row #3 of the upper-left table always "NASD Comp"? Or are you saying that the first column in each table continually changes?


        "A good athlete is the result of a good and worthy opponent." - David Crow

        "To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne

          RaviBee wrote (#4):

          By "dynamically", I assume you mean you can't rely on the order of the information? If so, you could scrape name/value tuples (e.g. "NASD100=1888.08") instead of relying on the position of specific entries in the table. Btw, I wrote this in order to build this. /ravi
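The tuple approach Ravi describes could look roughly like the portable C++ sketch below. ScrapeTuples is a hypothetical helper, and the sketch assumes the scraped text really contains NAME=VALUE pairs like "NASD100=1888.08", with labels made of letters, digits and underscores; whether the quote page exposes such pairs is not established in the thread.

```cpp
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper: collect every NAME=VALUE pair (e.g. "NASD100=1888.08")
// in `text`, keyed by NAME, so a quote is looked up by its label rather than
// by where it happens to appear on the page.
std::map<std::string, double> ScrapeTuples(const std::string& text)
{
    std::map<std::string, double> quotes;
    std::size_t eq = 0;
    while ((eq = text.find('=', eq)) != std::string::npos) {
        // Walk left from '=' over label characters (letters, digits, '_').
        std::size_t start = eq;
        while (start > 0) {
            unsigned char c = static_cast<unsigned char>(text[start - 1]);
            if (!std::isalnum(c) && c != '_')
                break;
            --start;
        }
        const std::string name = text.substr(start, eq - start);

        // Parse the number right after '='; strtod reports where it stopped.
        const char* begin = text.c_str() + eq + 1;
        char* end = nullptr;
        const double value = std::strtod(begin, &end);
        if (!name.empty() && end != begin)
            quotes[name] = value;               // a later duplicate overwrites an earlier one

        ++eq;                                   // continue searching after this '='
    }
    return quotes;
}
```

Looking a quote up by name (quotes["NASD100"]) keeps working when rows are reordered, which is the point of scraping tuples instead of positions. Pairs whose value is not numeric are skipped.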

          This is your brain on Celcius Home | Music | Articles | Freeware | Trips ravib(at)ravib(dot)com

            Mathefreak wrote (#5):

            The only things that change in the resulting web page are the quotes. My aim is to get the quote for the DAX (7th row in the upper-left table). Is there any example of using the IWebBrowser2 interface to get this information? TIA, M.

              RaviBee wrote (#6):

              Mathefreak wrote:

              My aim is to get the quote for DAX (7th row in upper left table).

              That's plain HTML and trivial to scrape. There's no need to use IWebBrowser2 to do that. /ravi
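For the plain-HTML case, a position-based lookup of the kind Ravi calls trivial might be sketched like this, working on the raw page source after it has been downloaded (the download itself is not shown). TableCell and CellText are hypothetical helpers; the sketch assumes simple, well-formed markup (each cell closed with </td>, no nested tables, data in <td> rather than <th>) and takes the first <table> in the document to be the upper-left one, which may not match the real page.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-cased copy of the page, used only to find tags case-insensitively.
static std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Remove any tags inside a cell (e.g. <b>...</b>) and trim surrounding whitespace.
static std::string CellText(const std::string& html)
{
    std::string out;
    bool inTag = false;
    for (char c : html) {
        if (c == '<')
            inTag = true;
        else if (c == '>')
            inTag = false;
        else if (!inTag)
            out += c;
    }
    const char* ws = " \t\r\n";
    const std::size_t b = out.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    return out.substr(b, out.find_last_not_of(ws) - b + 1);
}

// Hypothetical helper: text of cell `col` in row `row` (both 1-based) of the
// first <table> in `html`, or "" if that cell is not found.
std::string TableCell(const std::string& html, int row, int col)
{
    if (row < 1 || col < 1)
        return "";
    const std::string lower = ToLower(html);   // same offsets as `html`

    std::size_t pos = lower.find("<table");
    if (pos == std::string::npos)
        return "";
    std::size_t tableEnd = lower.find("</table", pos);
    if (tableEnd == std::string::npos)
        tableEnd = lower.size();

    for (int r = 0; r < row; ++r) {            // advance to the row-th <tr>
        pos = lower.find("<tr", pos + 1);
        if (pos == std::string::npos || pos > tableEnd)
            return "";
    }
    std::size_t rowEnd = lower.find("</tr", pos);
    if (rowEnd == std::string::npos || rowEnd > tableEnd)
        rowEnd = tableEnd;

    for (int c = 0; c < col; ++c) {            // advance to the col-th <td>
        pos = lower.find("<td", pos + 1);
        if (pos == std::string::npos || pos > rowEnd)
            return "";
    }
    const std::size_t open = lower.find('>', pos);     // end of the <td ...> tag
    const std::size_t close = lower.find("</td", pos);
    if (open == std::string::npos || close == std::string::npos || close > rowEnd)
        return "";
    return CellText(html.substr(open + 1, close - open - 1));
}
```

Under those assumptions, TableCell(html, 7, 2) would return the second cell of the seventh row; which column actually holds the DAX quote would have to be checked against the page. If the markup is messier than this, a real HTML parser is the more robust choice.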

                Mathefreak wrote (#7):

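                Hi Ravi, unfortunately it's not only plain HTML: there are some JavaScript functions embedded in the page that fetch the current quotes. Nevertheless, after searching around the net a bit, I'm proud to present the solution that works for me :-D

                Sample application:
                - a simple MFC dialog
                - one WebBrowser control (m_WebBrowserCtrl)
                - the website is loaded and refreshed by a button click
                - clicking a button copies the content of the site (the rendered plain text, not the HTML source) into a CString variable so the data can be parsed

                #include <mshtml.h>   // IHTMLDocument2, IHTMLElement

                void CWebbrowser_TestDlg::OnCopy()
                {
                    // Ask the WebBrowser control for the currently loaded document.
                    LPDISPATCH lpDispatch = m_WebBrowserCtrl.GetDocument();
                    if (lpDispatch == NULL)
                        return;                      // no page loaded yet

                    IHTMLDocument2* pHTMLDocument2 = NULL;
                    HRESULT hr = lpDispatch->QueryInterface(IID_IHTMLDocument2,
                                                            (LPVOID*)&pHTMLDocument2);
                    lpDispatch->Release();
                    if (FAILED(hr) || pHTMLDocument2 == NULL)
                        return;

                    // The <body> element; its outerText is the text as rendered,
                    // i.e. including what the page's scripts have filled in.
                    IHTMLElement* pBody = NULL;
                    hr = pHTMLDocument2->get_body(&pBody);
                    pHTMLDocument2->Release();
                    if (FAILED(hr) || pBody == NULL)
                        return;

                    BSTR bstrText = NULL;
                    hr = pBody->get_outerText(&bstrText);
                    pBody->Release();
                    if (FAILED(hr) || bstrText == NULL)
                        return;

                    CString sText(bstrText);         // copy into a CString for parsing
                    SysFreeString(bstrText);         // the caller owns the returned BSTR
                    MessageBox(sText);
                }

                Comments are welcome. The next step is to use the code in my application, but that seems to be easy. Greets, M.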
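Mathefreak's next step, parsing the text copied into sText, could be done with a simple line-based search. This sketch assumes (unverified against the site) that the body text puts each index on its own line, label first, followed by the quote, separated by spaces or tabs; FindQuote is a hypothetical helper and only matches single-word labels such as "DAX". In the MFC dialog the CString would first be converted to std::string, e.g. with ATL's CT2A (CT2A ascii(sText); std::string text(ascii);).

```cpp
#include <exception>
#include <sstream>
#include <string>

// Hypothetical helper: find a line of `pageText` whose first word is `label`
// (e.g. "DAX") and store the first fully numeric token after it in `quote`.
// Returns false if no matching line contains a number.
bool FindQuote(const std::string& pageText, const std::string& label, double& quote)
{
    std::istringstream lines(pageText);
    std::string line;
    while (std::getline(lines, line)) {          // a trailing '\r' from "\r\n" is skipped by >>
        std::istringstream words(line);
        std::string first;
        if (!(words >> first) || first != label)
            continue;                            // not the row we want

        std::string word;
        while (words >> word) {
            try {
                std::size_t used = 0;
                const double value = std::stod(word, &used);
                if (used == word.size()) {       // the whole token was a number
                    quote = value;
                    return true;
                }
            } catch (const std::exception&) {
                // not a number; try the next token
            }
        }
    }
    return false;
}
```

Because FindQuote returns false when the label or its number is missing, the background grab loop can keep the previous quote instead of displaying garbage after a failed refresh.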
