Get dynamic web content

Mathefreak

I've written a small web crawler in VC++ that grabs financial indices and quotes from various websites and displays them. As long as the sources are plain HTML, everything is fine. Now I have a website that shows the quotes dynamically (http://www.forexpf.ru/quote_show.php). IMHO there are two ways to extract the information:

1. Grab the page as an image, run OCR on it, and extract the info.
2. Load the page into a browser control to build the content, copy the content text (to the clipboard), and extract the information.

#1 works in general, but the OCR isn't accurate enough. For #2: are there any examples that show how to handle the clipboard? On the other hand, the clipboard wouldn't be my first choice, because the grab process repeats automatically in the background, and using the clipboard would affect other running applications.

Are there any other ideas for solving the problem?

TIA
M.

Paresh Chitte

Mathefreak wrote:

2. load the page into a browser control to build the content, copy the content text (into clipboard) and extract information

Try using IWebBrowser2, IHTMLDocument, IHTMLElement, and related interfaces. Regards, Paresh.

David Crow

Mathefreak wrote:

If the sources are plain html, everything is fine. Now I've got a website which shows the quotes dynamically (http://www.forexpf.ru/quote_show.php).

But the tables are still HTML. Unless I am not understanding, isn't row #3 of the upper-left table always "NASD Comp?" Or are you saying that the first column in each table continually changes?

"A good athlete is the result of a good and worthy opponent." - David Crow

"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne

RaviBee

By "dynamically", I assume you mean you can't rely on the order of the information? If so, you could scrape tuples (e.g. "NASD100=1888.08") instead of assuming the location of specific entries in the table. Btw, I wrote this[^] in order to build this[^].

/ravi
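To illustrate the tuple idea, here is a minimal sketch in standard C++. The function name ScrapeTuples and the assumed tuple shape (an upper-case symbol, '=', then a decimal number) are illustrative guesses, not something taken from the actual page, so the pattern would need adjusting to the real text.

```cpp
#include <map>
#include <regex>
#include <string>

// Collects every NAME=VALUE pair found in `text`, e.g. "NASD100=1888.08".
// Assumption: a symbol is an upper-case letter followed by upper-case
// letters or digits, and the value is a plain decimal number.
std::map<std::string, double> ScrapeTuples(const std::string& text)
{
    static const std::regex pattern(R"(([A-Z][A-Z0-9]*)=([0-9]+(?:\.[0-9]+)?))");

    std::map<std::string, double> quotes;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it)
    {
        // Group 1 is the symbol, group 2 the numeric value.
        quotes[(*it)[1].str()] = std::stod((*it)[2].str());
    }
    return quotes;
}
```

Because the result is keyed by symbol, a lookup such as quotes.find("DAX") does not depend on where the entry appears in the table.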

This is your brain on Celcius Home | Music | Articles | Freeware | Trips ravib(at)ravib(dot)com

Mathefreak

The only thing that changes in the resulting web page is the quotes. My aim is to get the quote for the DAX (7th row in the upper-left table). Are there any examples of using the IWebBrowser2 interface to get this information?

TIA
M.

RaviBee

Mathefreak wrote:

My aim is to get the quote for DAX (7th row in upper left table).

That's plain HTML and trivial to scrape. There's no need to use IWebBrowser2 to do that. /ravi
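For a table that really is static HTML, such a scrape might look like the sketch below. It assumes hypothetical markup in which the label and its quote sit in adjacent <td> cells; the function name CellAfterLabel is made up for this example, and the real page's markup may differ.

```cpp
#include <regex>
#include <string>

// Returns the text of the <td> cell that directly follows the cell
// containing `label`, or an empty string if no such pair is found.
// Assumption: `label` consists of letters and digits only, so it can be
// placed into the regular expression without escaping.
std::string CellAfterLabel(const std::string& html, const std::string& label)
{
    const std::regex pattern(
        "<td[^>]*>\\s*" + label + "\\s*</td>\\s*<td[^>]*>\\s*([^<]*?)\\s*</td>",
        std::regex::ECMAScript | std::regex::icase);

    std::smatch match;
    if (std::regex_search(html, match, pattern))
        return match[1].str();   // cell text with surrounding whitespace trimmed
    return std::string();
}
```

Calling CellAfterLabel(html, "DAX") on the downloaded source would return the cell text as a string, which could then be converted with std::stod. This only works if the quote is present in the HTML source itself rather than filled in by script after loading.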

Mathefreak

Hi Ravi,

it's not only plain HTML, unfortunately. There are some JavaScript functions embedded to fetch the current quotes.

Nevertheless, after searching around the net a bit, I proudly present the solution that works for me :-D

Sample application:
- simple MFC dialog
- one WebBrowser control (m_WebBrowserCtrl)
- the website is loaded and refreshed by a button click
- clicking another button copies the content of the site (plain text, not the HTML source) into a CString variable so the data can be parsed

    // Requires #include <mshtml.h> for IHTMLDocument2 and IHTMLElement.
    void CWebbrowser_TestDlg::OnCopy()
    {
        // GetDocument() returns the document's IDispatch, or NULL if no page is loaded.
        LPDISPATCH lpDispatch = m_WebBrowserCtrl.GetDocument();
        if (lpDispatch == NULL)
            return;

        IHTMLDocument2* pHTMLDocument2 = NULL;
        HRESULT hr = lpDispatch->QueryInterface(IID_IHTMLDocument2, (LPVOID*)&pHTMLDocument2);
        lpDispatch->Release();
        if (FAILED(hr) || pHTMLDocument2 == NULL)
            return;

        // The body's outerText is the rendered text, including script-generated content.
        CString sText;
        IHTMLElement* pBody = NULL;
        hr = pHTMLDocument2->get_body(&pBody);
        if (SUCCEEDED(hr) && pBody != NULL)
        {
            BSTR bstrText = NULL;
            if (SUCCEEDED(pBody->get_outerText(&bstrText)) && bstrText != NULL)
            {
                sText = bstrText;
                ::SysFreeString(bstrText);   // the caller owns the returned BSTR
            }
            pBody->Release();
        }
        pHTMLDocument2->Release();

        MessageBox(sText);
    }

Comments are welcome. The next step is to use the code in my application, but that seems to be easy.

Greets
M.
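The parsing step mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the name FindQuote is hypothetical, the text is taken as a std::string (the CString would have to be converted first), and the label is assumed to appear as its own whitespace-separated token directly followed by a number in plain decimal notation without thousands separators.

```cpp
#include <sstream>
#include <string>

// Scans whitespace-separated tokens for `label` and reads the number that
// follows it. Returns true and stores the number in `value` on success;
// `value` is left untouched if nothing is found.
bool FindQuote(const std::string& text, const std::string& label, double& value)
{
    std::istringstream in(text);
    std::string token;
    while (in >> token)
    {
        if (token == label)
        {
            double parsed = 0.0;
            if (in >> parsed)
            {
                value = parsed;
                return true;
            }
            in.clear();   // label was not followed by a number; keep scanning
        }
    }
    return false;
}
```

With the page text in a std::string, FindQuote(text, "DAX", quote) would fill quote if the layout matches these assumptions; if the real text separates label and value differently, the token logic would need adapting.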