Code Project
get_anchors doesn't return all anchors

    Mohammad Tarik
    wrote on last edited by
    #1

    Hello, I am trying to get all the links on a web page. Unfortunately, get_anchors doesn't seem to retrieve all anchor elements: on pages full of links I get only a few, or none. I'm using Visual C++ 2003 with an MFC dialog application that embeds the WebBrowser control. Here's my code. (By the way, GetDocument calls IWebBrowser::get_Document(), and it works fine for everything other than getting anchors.)

    HRESULT hr;
    IHTMLDocument2 * pHtmlDoc = GetDocument();
    CStringArray sURLArray;

    if (pHtmlDoc != NULL)
    {
        IHTMLElementCollection * pColl = NULL;
        hr = pHtmlDoc->get_anchors(&pColl);

        if(SUCCEEDED(hr))
        {
            LONG nElem = 0;
            hr = pColl->get_length(&nElem);

            if(SUCCEEDED(hr))
            {
                for(long i = 0; i < nElem; i++)
                {
                    _variant_t vIndex(i);

                    IDispatch * pDisp2 = NULL;
                    hr = pColl->item(vIndex, vIndex, &pDisp2);
                    // item() can succeed and still hand back a NULL pointer.
                    if(SUCCEEDED(hr) && pDisp2 != NULL)
                    {
                        IHTMLAnchorElement * pAnchElem = NULL;
                        hr = pDisp2->QueryInterface(IID_IHTMLAnchorElement, (void**) &pAnchElem);
                        if(SUCCEEDED(hr))
                        {
                            BSTR bstrHref = NULL;
                            if(SUCCEEDED(pAnchElem->get_href(&bstrHref)))
                            {
                                CString strLink(bstrHref);
                                if(!strLink.IsEmpty())
                                {
                                    sURLArray.Add(strLink);
                                    // Show the link just added. Indexing sURLArray by i is
                                    // unsafe: skipped elements leave the array shorter than i.
                                    MessageBox(strLink);
                                }
                                SysFreeString(bstrHref);
                            }
                            pAnchElem->Release();
                        }
                        pDisp2->Release();
                    }
                }
            }
            pColl->Release();
        }
        pHtmlDoc->Release();
    }
    

    Looking at the code now, I think I should have used if(FAILED(...)) return; instead of if(SUCCEEDED(...)); it would have been more readable without all these brackets. Any help is appreciated. Thanks.
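The early-return style mentioned here can be sketched without any COM headers. What follows is only a portable illustration of the control-flow shape, not MSHTML code: `Status`, `failed`, `FakeCollection`, `getCollection`, `getLength`, and `collectUrls` are all hypothetical stand-ins for `HRESULT`, `FAILED`, `IHTMLElementCollection`, and the calls in the post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for HRESULT / FAILED so the sketch compiles anywhere.
typedef long Status;
const Status kOk = 0;
const Status kFail = -1;
inline bool failed(Status s) { return s < 0; }

// Hypothetical collection standing in for IHTMLElementCollection.
struct FakeCollection {
    std::vector<std::string> hrefs;
};

// Hypothetical getters that mimic the "status code + out parameter" shape.
Status getCollection(const FakeCollection* src, const FakeCollection** out) {
    if (src == 0) return kFail;
    *out = src;
    return kOk;
}

Status getLength(const FakeCollection* coll, long* n) {
    if (coll == 0) return kFail;
    *n = static_cast<long>(coll->hrefs.size());
    return kOk;
}

// Guard-clause version: each failure exits immediately, so the
// success path stays at a single indentation level.
Status collectUrls(const FakeCollection* src, std::vector<std::string>& urls) {
    const FakeCollection* coll = 0;
    Status s = getCollection(src, &coll);
    if (failed(s)) return s;

    long n = 0;
    s = getLength(coll, &n);
    if (failed(s)) return s;

    for (long i = 0; i < n; ++i) {
        const std::string& href = coll->hrefs[i];
        if (href.empty()) continue;  // skip empty hrefs, as the posted code does
        urls.push_back(href);
    }
    return kOk;
}
```

One caveat for real COM code: an early return must not skip a pending `Release()`. A smart pointer such as ATL's `CComPtr` releases the interface when it goes out of scope, which is what makes this style safe there.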

      David Crow
      wrote on last edited by
      #2

      So the value of nElem is simply less than it should be?

      Mohammad Tarik wrote:

      I think that I should have used if(FAILED(...)) return; instead of if(SUCCEEDED(...)) .. it would have been more readable without all these brackets.

      Since nElem is the culprit, everything below it is superfluous.

      "Love people and use things, not love things and use people." - Unknown

      "To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne

        Mohammad Tarik
        wrote on last edited by
        #3

        Yes, and sometimes it's zero.

          David Crow
          wrote on last edited by
          #4

          What URL are you using? I just tried it with a few and get_length() returned the correct value.

            Mohammad Tarik
            wrote on last edited by
            #5

            I am testing with "flightcenter.co.uk" and "google.com". The first returns some URLs, while the second doesn't return any!

              David Crow
              wrote on last edited by
              #6

              Mohammad Tarik wrote:

              The first one returns some URLs while the second doesn't return any !!

              Do the anchors have a name or id attribute?

                Mohammad Tarik
                wrote on last edited by
                #7

                No. The URLs returned were strange, and I couldn't even find their corresponding anchors. They looked like "javascript:submitCJ10.....http://somesite". As you can see from hr = pColl->item(vIndex, vIndex, &pDisp2); I am not trying to put any constraints on the returned URLs.

                  David Crow
                  wrote on last edited by
                  #8

                  Mohammad Tarik wrote:

                  No...

                  get_anchors() only returns those that do.
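To make the distinction concrete: as I understand the MSHTML documentation, the `anchors` collection holds `A` elements that have a `name` and/or `id` attribute, while the `links` collection holds `A` elements that specify `href`, plus all `AREA` elements. The sketch below is a small portable model of those two rules, not MSHTML code; the `Element` struct and the helper functions are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified element: a tag name and three attributes.
struct Element {
    std::string tag;   // upper-case tag name, e.g. "A" or "AREA"
    std::string name;
    std::string id;
    std::string href;
};

// Models the anchors collection: A elements with a name and/or id.
bool inAnchors(const Element& e) {
    return e.tag == "A" && (!e.name.empty() || !e.id.empty());
}

// Models the links collection: A elements with an href, plus all AREA elements.
bool inLinks(const Element& e) {
    if (e.tag == "AREA") return true;
    return e.tag == "A" && !e.href.empty();
}

// Number of elements on the page that would land in each collection.
long countAnchors(const std::vector<Element>& page) {
    return static_cast<long>(std::count_if(page.begin(), page.end(), inAnchors));
}

long countLinks(const std::vector<Element>& page) {
    return static_cast<long>(std::count_if(page.begin(), page.end(), inLinks));
}
```

Under this model, a page whose hyperlinks are all plain `A` elements with an `href` but no `name` or `id` contributes nothing to `anchors`, which matches a `get_length()` of zero.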

                    Mohammad Tarik
                    wrote on last edited by
                    #9

                    So, in order to return all the links on a page regardless of name or id attributes, I have to write my own code? Do you know of any other function that can do this? Thanks a lot for your help

                      Mohammad Tarik
                      wrote on last edited by
                      #10

                      I found that get_links can do it. Thanks again.

                        David Crow
                        wrote on last edited by
                        #11

                        Doesn't that return LINK and AREA elements?

                          Mohammad Tarik
                          wrote on last edited by
                          #12

                          Well, at the beginning of my first post I mentioned that I need to get all the links on a web page. I chose get_anchors() because one of the CP articles mentioned it, and the author said it is used to get hyperlinks. I guess get_links() would do a better job for me after some processing. Sorry for the inconvenience. :-O
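The "some processing" step is not spelled out in the thread. One plausible reading, given the "javascript:" pseudo-URLs reported earlier, is to keep only http/https URLs and drop duplicates. Here is a minimal sketch under that assumption; `filterLinks` and `startsWithNoCase` are hypothetical helpers that work on plain strings already read via `get_href`.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Case-insensitive prefix test (URL schemes are case-insensitive).
bool startsWithNoCase(const std::string& s, const std::string& prefix) {
    if (s.size() < prefix.size()) return false;
    for (std::string::size_type i = 0; i < prefix.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(s[i])) !=
            std::tolower(static_cast<unsigned char>(prefix[i])))
            return false;
    }
    return true;
}

// Keeps http/https URLs in first-seen order, dropping empty strings,
// other schemes (javascript:, mailto:, ...) and exact duplicates.
std::vector<std::string> filterLinks(const std::vector<std::string>& hrefs) {
    std::vector<std::string> out;
    std::set<std::string> seen;
    for (std::vector<std::string>::size_type i = 0; i < hrefs.size(); ++i) {
        const std::string& h = hrefs[i];
        if (!startsWithNoCase(h, "http://") && !startsWithNoCase(h, "https://"))
            continue;
        if (seen.insert(h).second)  // true only the first time h is seen
            out.push_back(h);
    }
    return out;
}
```

Duplicates are matched by exact string comparison only; whether two differently written URLs count as the same link is a separate decision this sketch does not attempt.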
