get_anchors doesn't return all anchors
-
Hello, I am trying to get all links on a web page. Unfortunately, "get_anchors" doesn't seem to retrieve all anchor elements. I have pages full of links and all I get are a few or none. I'm using Visual C++ 2003. I have an MFC dialog application with an embedded WebBrowser control. Here's my code (btw, the GetDocument function calls IWebBrowser::get_Document() and it works fine with things other than getting anchors):
    HRESULT hr;
    IHTMLDocument2 * pHtmlDoc = GetDocument();
    CStringArray sURLArray;
    if (pHtmlDoc != NULL)
    {
        IHTMLElementCollection * pColl = NULL;
        hr = pHtmlDoc->get_anchors(&pColl);
        if (SUCCEEDED(hr))
        {
            LONG nElem = 0;
            hr = pColl->get_length(&nElem);
            if (SUCCEEDED(hr))
            {
                for (long i = 0; i < nElem; i++)
                {
                    _variant_t vIndex(i);
                    IDispatch * pDisp2 = NULL;
                    hr = pColl->item(vIndex, vIndex, &pDisp2);
                    if (SUCCEEDED(hr))
                    {
                        IHTMLAnchorElement * pAnchElem = NULL;
                        hr = pDisp2->QueryInterface(IID_IHTMLAnchorElement, (void**) &pAnchElem);
                        if (SUCCEEDED(hr))
                        {
                            BSTR bstrHref = NULL;
                            if (SUCCEEDED(pAnchElem->get_href(&bstrHref)))
                            {
                                CString strLink(bstrHref);
                                if (!strLink.IsEmpty())
                                {
                                    sURLArray.Add(strLink);
                                    // Show the link just stored. Indexing sURLArray by i
                                    // can run past the end when an empty href was skipped.
                                    MessageBox(strLink);
                                }
                                SysFreeString(bstrHref);
                            }
                            pAnchElem->Release();
                        }
                        pDisp2->Release();
                    }
                }
            }
            pColl->Release();
        }
        pHtmlDoc->Release();
    }
Taking a look at the code now, I think that I should have used if(FAILED(...)) return; instead of if(SUCCEEDED(...)) .. it would have been more readable without all these brackets. Any help is appreciated. Thanks.
-
So the value of nElem is simply less than it should be?
Mohammed Tarik wrote:
I think that I should have used if(FAILED(...)) return; instead of if(SUCCEEDED(...)) .. it would have been more readable without all these brackets.
Since nElem is the culprit, everything below it is superfluous.
"Love people and use things, not love things and use people." - Unknown
"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne
-
Yes, and sometimes it's zero.
-
What URL are you using? I just tried it with a few and get_length() returned the correct value.
-
I am testing with "flightcenter.co.uk" and "google.com". The first one returns some URLs, while the second doesn't return any!
-
Mohammed Tarik wrote:
The first one returns some URLs while the second doesn't return any !!
Do the anchors have a name or id attribute?
-
No, the URLs returned were strange and I couldn't even find their corresponding anchors. They looked like "javascript:submitCJ10.....http://somesite". As you can see from hr = pColl->item(vIndex, vIndex, &pDisp2); I am not trying to put any constraints on the returned URLs.
-
Mohammed Tarik wrote:
No...
get_anchors() only returns those that do.
-
So, in order to return all the links on a page regardless of name or id attributes, I have to write my own code? Do you know of any other function that can do this? Thanks a lot for your help
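To make the distinction concrete, here is a toy model in portable C++. It is only a sketch and does not call MSHTML: Element, modelAnchors, modelLinks, samplePage and checkSample are hypothetical names invented for illustration. The anchors rule follows the reply above (A elements with a name or id). The links rule (A elements with an href, plus AREA elements) is my reading of the MSHTML documentation and should be treated as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an HTML element; this is not an MSHTML type.
struct Element {
    std::string tag;   // upper-case tag name, e.g. "A" or "AREA"
    std::string name;  // name attribute, empty if absent
    std::string id;    // id attribute, empty if absent
    std::string href;  // href attribute, empty if absent
};

// Models the anchors rule from the thread: A elements with a name or id.
std::vector<Element> modelAnchors(const std::vector<Element>& doc) {
    std::vector<Element> out;
    for (std::size_t i = 0; i < doc.size(); ++i) {
        const Element& e = doc[i];
        if (e.tag == "A" && (!e.name.empty() || !e.id.empty()))
            out.push_back(e);
    }
    return out;
}

// Models the links rule (assumed): A elements with an href, plus AREA elements.
std::vector<Element> modelLinks(const std::vector<Element>& doc) {
    std::vector<Element> out;
    for (std::size_t i = 0; i < doc.size(); ++i) {
        const Element& e = doc[i];
        if ((e.tag == "A" && !e.href.empty()) || e.tag == "AREA")
            out.push_back(e);
    }
    return out;
}

// Builds a three-element sample page: a plain hyperlink, a named
// anchor without an href, and an image-map area.
std::vector<Element> samplePage() {
    Element plainLink   = {"A", "", "", "http://example.com/"};
    Element namedAnchor = {"A", "top", "", ""};
    Element area        = {"AREA", "", "", "http://example.com/map"};
    std::vector<Element> doc;
    doc.push_back(plainLink);
    doc.push_back(namedAnchor);
    doc.push_back(area);
    return doc;
}

// Usage check: the plain hyperlink is missed by the anchors rule
// but picked up by the links rule.
void checkSample() {
    const std::vector<Element> doc = samplePage();
    assert(modelAnchors(doc).size() == 1);
    assert(modelLinks(doc).size() == 2);
}
```

Under this model, a page whose hyperlinks carry no name or id yields an empty anchors collection, which would match the zero count seen on some pages.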
-
I found that get_links can do it. Thanks again.
-
Doesn't that return LINK and AREA elements?
-
Well, at the beginning of my first post, I mentioned that I need to get all links on a web page. I chose get_anchors() because I found a sentence mentioning it in one of the CP articles, where the author said that it is used to get hyperlinks. I guess that get_links() would do a better job for me after some processing. Sorry for the inconvenience. :-O
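As one possible shape for the "some processing" step, here is a small sketch in portable C++ that filters a list of href strings, keeping only http and https URLs and dropping empty values and pseudo-URLs such as the "javascript:" ones mentioned earlier. The names startsWithNoCase, keepWebLinks and checkFilter are hypothetical, and which schemes to keep is an assumption; the href strings would come from whatever collection is being walked.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if url begins with prefix, ignoring ASCII case.
bool startsWithNoCase(const std::string& url, const std::string& prefix) {
    if (url.size() < prefix.size())
        return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(url[i])) !=
            std::tolower(static_cast<unsigned char>(prefix[i])))
            return false;
    }
    return true;
}

// Keeps only http/https URLs; empty strings, javascript: and mailto:
// values, and anything else are dropped. Input order is preserved.
std::vector<std::string> keepWebLinks(const std::vector<std::string>& hrefs) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < hrefs.size(); ++i) {
        const std::string& h = hrefs[i];
        if (startsWithNoCase(h, "http://") || startsWithNoCase(h, "https://"))
            out.push_back(h);
    }
    return out;
}

// Usage check with a made-up list of href values.
void checkFilter() {
    std::vector<std::string> hrefs;
    hrefs.push_back("javascript:doSomething()");
    hrefs.push_back("http://example.com/a");
    hrefs.push_back("");
    const std::vector<std::string> kept = keepWebLinks(hrefs);
    assert(kept.size() == 1);
    assert(kept[0] == "http://example.com/a");
}
```

In the MFC code from the first post, the same test could be applied to strLink before calling sURLArray.Add.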