Code Project

searching web - automated

Forum: C / C++ / MFC
Tags: database, com, algorithms, question
    Force Code wrote (post #1):

    I'm trying to search the web in a loop, sending the search engine a new query on each iteration (thousands of terms for this particular application). Frankly, I don't know whether that's even allowed, since Google kicks you off if you try it. The code below uses the COM IWebBrowser2 interface to drive an already open Internet Explorer window. I don't know whether that approach is considered antiquated, but right now I'm getting only about one page per second. Is there a faster way than the following?

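    SHDocVw::IWebBrowser2Ptr brwsr = NULL;
    // ...
    brwsr = SHDocVw::IWebBrowser2Ptr(spDisp);
    // ...

    // Build the fixed prefix of the query URL once; each search term is
    // then copied in at szSrch. Note: the term is not URL-encoded, and
    // strcpy does no bounds checking against the size of szURL.
    strcpy(srch_engn, "http://www.mysearch.com/search/GGmain.jhtml?searchfor");
    sprintf(szURL, "%s=%s+%%22", srch_engn, q_misc);
    char* szSrch = szURL + strlen(szURL);

    while (1) {
        strcpy(szSrch, getNextSrch(...));

        BSTR bstr = ConvertStringToBSTR(szURL);
        brwsr->Navigate(bstr);
        SysFreeString(bstr);  // ConvertStringToBSTR allocates; the caller must free it

        // Poll until the browser is no longer busy or the document has at
        // least loaded. This sleeps before the first check and has no timeout.
        VARIANT_BOOL busy;
        READYSTATE rs;
        while (1) {
            Sleep(200);
            brwsr->get_Busy(&busy);
            if (!busy) break;
            brwsr->get_ReadyState(&rs);
            if (rs == READYSTATE_LOADED) break;
            if (rs == READYSTATE_COMPLETE) break;
        }

        brwsr->Stop();

        // ...
    }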
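
The inner wait loop above sleeps 200 ms before its first check and has no upper bound, so every page costs at least 200 ms of waiting, and a page that never finishes loading hangs the loop. A minimal, portable sketch of a polling helper with a timeout and a configurable interval; `wait_until` is a hypothetical name, and the browser-specific checks are left to the caller:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Poll done() until it returns true or `timeout` elapses, sleeping
// `interval` between checks. It checks once before the first sleep, so
// an already-finished page costs no wait. Returns false on timeout.
bool wait_until(const std::function<bool()>& done,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds interval) {
    assert(interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (done()) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(interval);
    }
}

// Hypothetical use in the posted loop (Windows/COM only, not compiled here):
//   bool loaded = wait_until([&] {
//       VARIANT_BOOL busy = VARIANT_TRUE;
//       brwsr->get_Busy(&busy);
//       return busy == VARIANT_FALSE;
//   }, std::chrono::seconds(10), std::chrono::milliseconds(20));
//   if (!loaded) { /* skip this term instead of hanging */ }
```

The 10 s timeout and 20 ms interval in the comment are arbitrary illustrative values. Event-driven notification (the `DocumentComplete` event of `DWebBrowserEvents2`) may avoid polling altogether, at the cost of setting up an event sink and a message loop.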

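In the loop, each raw search term is copied straight into `szURL`, so a term containing spaces, quotes, `&`, `#`, or non-ASCII bytes can produce a malformed query or be split into extra parameters. A minimal sketch of percent-encoding a term first, assuming the engine accepts standard form-style query values (`+` for a space, `%XX` for every other reserved byte); `url_encode_query_value` is a hypothetical name:

```cpp
#include <cassert>
#include <string>

// Percent-encode a search term for use as a query-string value:
// RFC 3986 unreserved characters pass through, a space becomes '+',
// and every other byte (including UTF-8 bytes) becomes %XX.
std::string url_encode_query_value(const std::string& term) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(term.size() * 3);
    for (unsigned char c : term) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                          c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

For example, `url_encode_query_value("\"fast food\" & drinks")` yields `%22fast+food%22+%26+drinks`. In the posted loop, the encoded string would be copied in at `szSrch` in place of the raw term, after checking that it still fits in `szURL`.
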
      Force Code wrote (post #2):

      After the Stop() call above, this is how I process the returned page (I suspect this might be the bottleneck):

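      rsltDoc = IHTMLDocument2Ptr(brwsr->GetDocument());
      if (rsltDoc) {
          HRESULT hr = rsltDoc->get_body(&body);
          if (SUCCEEDED(hr) && body) {
              BSTR bstr_b = NULL;
              hr = body->get_innerText(&bstr_b);
              // get_innerText can succeed yet return a NULL BSTR for an empty body.
              if (SUCCEEDED(hr) && bstr_b) {
                  // ConvertBSTRToString allocates with new[]; free it with delete[].
                  char* szBody = ConvertBSTRToString(bstr_b);
                  // A page that does not contain the "not found" text counts as a hit.
                  if (szBody && !strstr(szBody, szNotFound)) {
                      pf_fmt(szSrch);
                      fprintf(f, "%s\n\n", szSrch);
                      printf("%s\n\n", szSrch);
                      bMatch = TRUE;
                  }
                  delete [] szBody;
                  SysFreeString(bstr_b);
              }
          }
      }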
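
Rather than guessing which step is the bottleneck, the time spent in each stage can be measured. A minimal, portable sketch using `std::chrono::steady_clock`; `StageTimer` and `ScopedTime` are hypothetical helper names, not part of the posted code:

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Accumulates wall-clock time spent in one stage (e.g. "load" or
// "parse") across many loop iterations.
struct StageTimer {
    const char* name;
    std::chrono::steady_clock::duration total{};
    long count = 0;

    explicit StageTimer(const char* n) : name(n) {}

    void report() const {
        double ms = std::chrono::duration<double, std::milli>(total).count();
        std::printf("%s: %ld calls, %.1f ms total, %.1f ms avg\n",
                    name, count, ms, count ? ms / count : 0.0);
    }
};

// RAII guard: adds the time from construction to destruction to a StageTimer.
class ScopedTime {
public:
    explicit ScopedTime(StageTimer& t)
        : t_(t), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTime() {
        t_.total += std::chrono::steady_clock::now() - start_;
        ++t_.count;
    }
    ScopedTime(const ScopedTime&) = delete;
    ScopedTime& operator=(const ScopedTime&) = delete;

private:
    StageTimer& t_;
    std::chrono::steady_clock::time_point start_;
};
```

For example, wrapping the Navigate-and-wait section in `{ ScopedTime s(loadTimer); ... }` and the body check in `{ ScopedTime s(parseTimer); ... }`, then calling `report()` on each timer after the loop, shows which stage accounts for most of the roughly one second per page.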
