Code Project

get_anchors doesn't return all anchors

    Mohammad Tarik wrote (#1):

    Hello, I am trying to get all the links on a web page. Unfortunately, get_anchors doesn't seem to retrieve all anchor elements: I have pages full of links, and all I get back is a few or none. I'm using Visual C++ 2003, in an MFC dialog application with an embedded WebBrowser control. Here's my code. (By the way, the GetDocument function calls IWebBrowser::get_Document(), and it works fine for things other than getting anchors.)

    HRESULT hr;
    IHTMLDocument2 * pHtmlDoc = GetDocument();
    CStringArray sURLArray;

    if (pHtmlDoc != NULL)
    {
        IHTMLElementCollection * pColl = NULL;
        hr = pHtmlDoc->get_anchors(&pColl);

        if (SUCCEEDED(hr))
        {
            LONG nElem = 0;
            hr = pColl->get_length(&nElem);

            if (SUCCEEDED(hr))
            {
                for (long i = 0; i < nElem; i++)
                {
                    _variant_t vIndex(i);

                    IDispatch * pDisp2 = NULL;
                    hr = pColl->item(vIndex, vIndex, &pDisp2);
                    if (SUCCEEDED(hr) && pDisp2 != NULL)
                    {
                        IHTMLAnchorElement * pAnchElem = NULL;
                        hr = pDisp2->QueryInterface(IID_IHTMLAnchorElement, (void**) &pAnchElem);
                        if (SUCCEEDED(hr))
                        {
                            BSTR bstrHref = NULL;
                            if (SUCCEEDED(pAnchElem->get_href(&bstrHref)))
                            {
                                CString strLink(bstrHref);
                                if (!strLink.IsEmpty())
                                {
                                    sURLArray.Add(strLink);
                                    // Show the link just added. Indexing sURLArray by i
                                    // goes out of range once any element is skipped.
                                    MessageBox(strLink);
                                }
                                SysFreeString(bstrHref);
                            }
                            pAnchElem->Release();
                        }
                        pDisp2->Release();
                    }
                }
            }
            pColl->Release();
        }
        pHtmlDoc->Release();
    }
    

    Looking at the code now, I think that I should have used if(FAILED(...)) return; instead of if(SUCCEEDED(...)) .. it would have been more readable without all these brackets. Any help is appreciated. Thanks.
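The early-return style mentioned here needs some care with COM, because every return path must still release the interfaces acquired so far. The following is a minimal, portable sketch of that idea, not MSHTML code: `FakeObject`, `ScopedRelease`, and `processDocument` are hypothetical names, and `stepSucceeds` stands in for a `FAILED(hr)` check. In real ATL code, `CComPtr` plays the role of the guard.

```cpp
// Hypothetical stand-in for a COM object; only Release() is modelled.
struct FakeObject {
    int releases = 0;
    void Release() { ++releases; }
};

// Calls Release() when the guard goes out of scope, on every return path.
class ScopedRelease {
public:
    explicit ScopedRelease(FakeObject* p) : p_(p) {}
    ~ScopedRelease() { if (p_) p_->Release(); }
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;
private:
    FakeObject* p_;
};

// Guard-clause style: bail out early on failure instead of nesting.
// `stepSucceeds` is a hypothetical flag standing in for a FAILED(hr) check.
bool processDocument(FakeObject* doc, bool stepSucceeds) {
    if (doc == nullptr) return false;   // nothing acquired, nothing to release
    ScopedRelease guard(doc);           // released on every path below
    if (!stepSucceeds) return false;    // early return without a leak
    return true;
}
```

With a guard like this, each failed step can return immediately and the nesting in the original loop flattens out.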

    David Crow wrote (#2):

    So the value of nElem is simply less than it should be?

    Mohammad Tarik wrote:

    I think that I should have used if(FAILED(...)) return; instead of if(SUCCEEDED(...)) .. it would have been more readable without all these brackets.

    Since nElem is the culprit, everything below it is superfluous.

    "Love people and use things, not love things and use people." - Unknown

    "To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne

      Mohammad Tarik wrote (#3):

      Yes, and sometimes it's zero.

        David Crow wrote (#4):

        What URL are you using? I just tried it with a few and get_length() returned the correct value.

          Mohammad Tarik wrote (#5):

          I am testing with "flightcenter.co.uk" and "google.com". The first one returns some URLs while the second doesn't return any !!

            David Crow wrote (#6):

            Mohammad Tarik wrote:

            The first one returns some URLs while the second doesn't return any !!

            Do the anchors have a name or id attribute?

              Mohammad Tarik wrote (#7):

              No, the URLs returned were strange and I couldn't even find their corresponding anchors. They looked like "javascript:submitCJ10.....http://somesite". As you can see from hr = pColl->item(vIndex, vIndex, &pDisp2); I am not trying to put any constraints on the returned URLs.

                David Crow wrote (#8):

                Mohammad Tarik wrote:

                No...

                get_anchors() only returns those that do.

                  Mohammad Tarik wrote (#9):

                  So, in order to return all the links on a page regardless of name or id attributes, I have to write my own code? Do you know of any other function that can do this? Thanks a lot for your help.

                    Mohammad Tarik wrote (#10):

                    I found that get_links can do it. Thanks again.
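To make the difference between the two collections concrete, here is a minimal, portable model of them, not MSHTML code. It assumes the behaviour as I read it in the MSHTML documentation: `anchors` holds `A` elements that have a `name` or `id`, while `links` holds `A` elements that specify an `href` plus the `AREA` elements in the document. The `Element` struct and both function names are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an HTML element.
struct Element {
    std::string tag;   // upper-case tag name, e.g. "A" or "AREA"
    std::string name;  // empty when the attribute is absent
    std::string id;    // empty when the attribute is absent
    std::string href;  // empty when the attribute is absent
};

// Models the anchors collection: A elements that carry a name or an id.
std::vector<Element> modelAnchors(const std::vector<Element>& doc) {
    std::vector<Element> out;
    for (const Element& e : doc)
        if (e.tag == "A" && (!e.name.empty() || !e.id.empty()))
            out.push_back(e);
    return out;
}

// Models the links collection: A elements with an href, plus AREA elements.
std::vector<Element> modelLinks(const std::vector<Element>& doc) {
    std::vector<Element> out;
    for (const Element& e : doc)
        if ((e.tag == "A" && !e.href.empty()) || e.tag == "AREA")
            out.push_back(e);
    return out;
}
```

Under this model, a page whose hyperlinks have an `href` but no `name` or `id` yields an empty anchors collection, which would match the zero count reported earlier in the thread.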

                      David Crow wrote (#11):

                      Doesn't that return LINK and AREA elements?

                        Mohammad Tarik wrote (#12):

                        Well, at the beginning of my first post, I mentioned that I need to get all the links on a web page. I chose get_anchors() because I found a sentence in one of the CP articles where the author said that it is used to get hyperlinks. I guess that get_links() would do a better job for me after some processing. Sorry for the inconvenience. :-O
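The "some processing" could look something like the sketch below. It assumes the href strings have already been collected (for example from the links collection) and then drops empty values, skips `javascript:` pseudo-URLs like the one seen earlier in the thread, and removes duplicates while keeping the original order. It uses portable C++11 with `std::string` rather than `CString`, and the function names are hypothetical.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Case-insensitive check that `s` begins with `prefix`.
// `prefix` is expected to be given in lower case.
bool startsWithNoCase(const std::string& s, const std::string& prefix) {
    if (s.size() < prefix.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(s[i])) != prefix[i])
            return false;
    return true;
}

// Keeps the first occurrence of each non-empty href that is not a
// javascript: pseudo-URL, preserving the input order.
std::vector<std::string> filterLinks(const std::vector<std::string>& hrefs) {
    std::vector<std::string> out;
    std::set<std::string> seen;
    for (const std::string& h : hrefs) {
        if (h.empty()) continue;
        if (startsWithNoCase(h, "javascript:")) continue;
        if (seen.insert(h).second) out.push_back(h);
    }
    return out;
}
```

Whether `javascript:` entries should really be discarded depends on the application; the filter here is only one plausible choice.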
