Need To Create a crawler/spider in vc++

Forum: C / C++ / MFC
Tags: c++, visual-studio
19 Posts, 7 Posters
  • CPallini wrote:

    Wow, starting the working day with a smile is very good, my five. :)

    If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler. -- Alfonso the Wise, 13th Century King of Castile.
    This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong. -- Iain Clarke
    [My articles]

    Ash_VCPP wrote (#6):

    Do you have any idea about crawlers? If yes, then please show me how to get started. It's urgent... :-O

    Thanks A Ton Ash_VCPP


      CPallini wrote (#7):

      Ash_VCPP wrote:

      Do you have any idea about crawler

      Yes.

      Ash_VCPP wrote:

      then please provide me the way to start working its urgent......

      Sorry, *urgent* questions automatically fall to the bottom of the stack (just a bit above *very urgent* questions). :)


      What is that you have on your head, Signor di Ceprano?

      • Ash_VCPP wrote:

        Hi all, I have an urgent requirement to create a crawler that can fetch data from a URL. The IDE should be VC++.

        Iain Clarke Warrior Programmer wrote (#8):

        As you may have seen from the responses you got, it's not a very good question.

        1/ You haven't actually asked a question - you've just told us you have work to do. While we are, of course, very happy for you, there's not much to answer.
        2/ You've got quite a big challenge, especially if you're starting from scratch.
        3/ You can break it down into several challenges: handling delays, timeouts, getting HTTP pages, parsing them into links, etc.

        I've attached below some code I wrote years ago, which grabs a certain page from a specific URL every hour or so - an early RSS reader, essentially. It may help you with your search terms. There are other articles on CodeProject about grabbing information from web pages; John Simmons wrote one recently that scrapes information from a CodeProject page.

        Good luck with your task!

        Iain.

        DWORD WINAPI UpdatePageThread ( LPVOID lpParameter )
        {
            HWND hWnd = (HWND)lpParameter;

            DWORD dw, dwDelay = 100;
            HINTERNET hInternet, hIConnect, hIRequest;
            BOOL  bSuccess;
            DWORD dwStatus, dwSize, dwIndex;

            // String literals are const, so the accept-type list is declared as LPCSTR.
            LPCSTR AcceptTypes [] = { "text/*", NULL };

            // Set up the query.
            hInternet = NULL;
            hIConnect = NULL;
            hIRequest = NULL;
            hInternet = ::InternetOpen ("OC UK Notify", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);

            if (hInternet)
                hIConnect = ::InternetConnect (hInternet, "www.overclock-uk.net", INTERNET_DEFAULT_HTTP_PORT, "user", "pass", INTERNET_SERVICE_HTTP, 0, 1);
            if (hIConnect)
            {
                hIRequest = ::HttpOpenRequest (hIConnect, NULL, "update.ocuk", NULL, NULL, AcceptTypes,
                    INTERNET_FLAG_NO_CACHE_WRITE | INTERNET_FLAG_NO_COOKIES | INTERNET_FLAG_NO_UI | INTERNET_FLAG_RELOAD | INTERNET_FLAG_NO_AUTH,
                    1);
            }

            if (!hIRequest) // Raise an error?
            {
                // Release whatever was opened before giving up.
                if (hIConnect) ::InternetCloseHandle (hIConnect);
                if (hInternet) ::InternetCloseHandle (hInternet);
                return 1;
            }

            char buf [4096];
            std::string Page;

            while (1)
            {
                // Sleep for dwDelay ms, but wake at once if the stop event is signalled.
                dw = WaitForSingleObject (g_hEventStop, dwDelay);
                if (dw != WAIT_TIMEOUT)
                    break;

                // dwDelay = 30000; // Wait 30 seconds before we try again.
                dwDelay = 90 * 60000; // 3/2 hours.

                bSuccess = ::HttpSendRequest (hIRequest, NULL, 0, NULL, 0);
                if (!bSuccess)
                    continue; // Try again in a while.

                // Ask for the HTTP status code as a number rather than a string.
                dwSize = sizeof (DWORD);
                dwIndex = 0;
                bSuccess = ::HttpQueryInfo (hIRequest, HTTP_QUERY_STATUS_CODE | HTTP_QUERY_FLAG_NUMBER, &dwStatus, &dwSize, &dwIndex);
                if (!bSuccess)
                    continue;
                dwStatus /= 100; // Just get the 2XX part.
                if (dwStatus != 2)
                    continue;

                Page.erase ();

                while (1)
                {
                    memset (buf, 0, sizeof (buf));
                    bSuccess = ::InternetReadFile (hIRequest, buf, sizeof (buf), &dw
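
Of the sub-tasks Iain lists, "parsing them into links" is the one his snippet does not touch. Below is a minimal sketch of that step, assuming the page has already been read into a std::string. The function name ExtractLinks is hypothetical, and the scan is deliberately naive: it only picks up quoted href values and does not resolve relative URLs. A real crawler would more likely use a proper HTML parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the values of quoted href attributes found
// in html, in document order. Naive by design: it also matches attributes
// such as data-href, and it skips unquoted values.
std::vector<std::string> ExtractLinks(const std::string& html)
{
    std::vector<std::string> links;
    const std::string key = "href=";

    // Search a lower-cased copy so that HREF= and Href= are found too.
    // ASCII lower-casing keeps every character at the same index.
    std::string lower(html);
    for (char& c : lower)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    std::size_t pos = 0;
    while ((pos = lower.find(key, pos)) != std::string::npos)
    {
        pos += key.size();
        if (pos >= html.size())
            break;

        const char quote = html[pos];
        if (quote != '"' && quote != '\'')
            continue; // Unquoted value: skip it and keep scanning.

        const std::size_t start = pos + 1;
        const std::size_t end = html.find(quote, start);
        if (end == std::string::npos)
            break; // Unterminated attribute: stop.

        assert(end >= start); // find() never returns a position before start.
        links.push_back(html.substr(start, end - start));
        pos = end + 1;
    }
    return links;
}
```

Calling ExtractLinks on each downloaded page and queueing the results that have not been visited yet gives the basic crawl loop.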
        
          
          Ash_VCPP wrote (#9):

          Hi Iain, thanks for providing this information and code. I will try it this way, and if I run into any difficulties I will let you know. Once again, thanks for the reply.


            Ash_VCPP wrote (#10):

            Then can you please provide me with any code, guidelines, or a URL where I can find something useful?


              Iain Clarke Warrior Programmer wrote (#11):

              The website / page this code pointed to has long since gone, by the way! And take the error checking with heavy skepticism... Iain.

              Codeproject MVP for C++, I can't believe it's for my lounge posts...


                Sandeep Saini SRE wrote (#12):

                Hi Ash, do you still need the code? If so, please let me know.


                  Ash_VCPP wrote (#13):

                  Hi Sandeep, along with the code I also need to do some planning, as I have to start the project from scratch. So please give me some ideas along with the code about which approach would be better.


                    David Crow wrote (#14):

                    Ash_VCPP wrote:

                    I have an urgent requirement to create a crawler...

                    Care to define this?

                    "Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown

                    "Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons


                      Ash_VCPP wrote (#15):

                      I got your point to some extent, but I would be pleased if you could explain it more.


                        David Crow wrote (#16):

                        Ash_VCPP wrote:

                        ...i would be pleased if you can explain it more...

                        I believe that was the question I posed to you. The term "crawler" can take on several different meanings. What is yours?

                          Ash_VCPP wrote (#17):

                          Basically, I need an exe that can fetch data from any URL and dump it into a database.


                            David Crow wrote (#18):

                            Ash_VCPP wrote:

                            ...fetch data from any url...

                            Such as URLDownloadToFile()?

                              Ash_VCPP wrote (#19):

                              I am not sure that it will work, because I remember that a few months ago I used it to download an XML file and icons from a server.
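
For reference, a call to URLDownloadToFile() along the lines David suggests might look like the sketch below. URLDownloadToFileA is the ANSI form of the urlmon function; DownloadUrl and FileNameFromUrl are hypothetical helpers introduced only for illustration, and the non-Windows branch simply reports failure. The function saves whatever the server returns for the URL, so it should work for HTML pages as well as for XML files or icons. Writing the saved content to a database would be a separate step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <urlmon.h>
#ifdef _MSC_VER
#pragma comment(lib, "urlmon.lib") // Link against urlmon when building with VC++.
#endif
#endif

// Hypothetical helper: picks a local file name from the last path segment
// of url, ignoring any query string or fragment. Falls back to
// "index.html" when the URL has no file part.
std::string FileNameFromUrl(const std::string& url)
{
    const std::string path = url.substr(0, url.find_first_of("?#"));

    const std::size_t schemeEnd = path.find("://");
    const std::size_t hostStart =
        (schemeEnd == std::string::npos) ? 0 : schemeEnd + 3;

    const std::size_t slash = path.rfind('/');
    if (slash == std::string::npos || slash < hostStart)
        return "index.html"; // Host only, no path.

    const std::string name = path.substr(slash + 1);
    return name.empty() ? "index.html" : name;
}

// Hypothetical wrapper: downloads url to localPath and returns true on
// success. Only implemented on Windows; elsewhere it returns false.
bool DownloadUrl(const std::string& url, const std::string& localPath)
{
    assert(!url.empty() && !localPath.empty());
#ifdef _WIN32
    const HRESULT hr =
        ::URLDownloadToFileA(NULL, url.c_str(), localPath.c_str(), 0, NULL);
    return SUCCEEDED(hr);
#else
    return false; // URLDownloadToFile is a Windows-only API.
#endif
}
```

A caller would typically write DownloadUrl(url, FileNameFromUrl(url)) and then read the saved file back for parsing.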
