Code Project

Need To Create a crawler/spider in vc++

  Chandrasekharan P wrote:

    Very good. Now, what is the problem?

    Ash_VCPP wrote (#5):

    I need to decide many things before starting the project, because I am the only one responsible for it. So please tell me the initial guidelines to start with, like what I should use (a Win32 exe, a Win32 DLL, COM, etc.) and which inter-process communication mechanism I should use.....

    Thanks A Ton Ash_VCPP
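One common answer to the exe/DLL/COM question, for a crawler running on one machine, is a single exe that does the network work on a worker thread, so no inter-process communication is needed at all: the thread sleeps between rounds and exits promptly when told to stop. Iain's Win32 code later in this thread follows that pattern with `WaitForSingleObject(g_hEventStop, dwDelay)`. The sketch below is a portable standard-C++ analogue of that loop, assuming C++14 or later; it is not code from the thread, and the class name `PeriodicWorker` and its parameters are hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Portable stand-in for a Win32 worker that waits on a stop event with a
// timeout, as WaitForSingleObject(g_hEventStop, dwDelay) does.
// 'work' runs once after 'firstDelay', then once every 'period', until Stop().
class PeriodicWorker {
public:
    PeriodicWorker(std::function<void()> work,
                   std::chrono::milliseconds firstDelay,
                   std::chrono::milliseconds period)
        : work_(std::move(work)),
          thread_([this, firstDelay, period] { Run(firstDelay, period); }) {}

    ~PeriodicWorker() { Stop(); }

    PeriodicWorker(const PeriodicWorker&) = delete;
    PeriodicWorker& operator=(const PeriodicWorker&) = delete;

    // Ask the worker to exit and wait until it has (the "set the event" step).
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopRequested_ = true;
        }
        cv_.notify_all();
        if (thread_.joinable())
            thread_.join();
    }

private:
    void Run(std::chrono::milliseconds delay, std::chrono::milliseconds period) {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // true: stop was requested; false: the delay simply timed out.
            if (cv_.wait_for(lock, delay, [this] { return stopRequested_; }))
                return;
            lock.unlock();
            work_();          // one round of work (e.g. fetch a page), lock not held
            lock.lock();
            delay = period;   // every later round waits the full period
        }
    }

    std::function<void()> work_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopRequested_ = false;
    std::thread thread_;      // declared last: starts only after the members above exist
};
```

To mirror Iain's timings (first attempt after 100 ms, then every 90 minutes), one might write `PeriodicWorker worker(FetchOnce, std::chrono::milliseconds(100), std::chrono::minutes(90));`, where `FetchOnce` is a hypothetical function that downloads one page; calling `worker.Stop()` plays the role of setting `g_hEventStop`.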

    CPallini wrote:

      Wow, starting the working day with a smile is very good, my five. :)

      If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler. -- Alfonso the Wise, 13th Century King of Castile.
      This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong. -- Iain Clarke
      [My articles]

      Ash_VCPP wrote (#6):

      Do you have any idea about crawlers? If yes, then please show me how to start working; it's urgent...... :-O

      Thanks A Ton Ash_VCPP

        CPallini wrote (#7):

        Ash_VCPP wrote:

        Do you have any idea about crawler

        Yes.

        Ash_VCPP wrote:

        then please provide me the way to start working its urgent......

        Sorry, *urgent* questions automatically fall to the bottom of the stack (just a bit above *very urgent* questions). :)

        What have you got on your head, Signor di Ceprano?

        Ash_VCPP wrote:

          Hi All, I have an urgent requirement to create a crawler with which I can fetch data from a URL; the IDE should be VC++.

          Thanks A Ton Ash_VCPP

          Iain Clarke (Warrior Programmer) wrote (#8):

          As you may have seen from the responses you've had, it's not a very good question.

          1/ You haven't actually asked a question; you've just told us you have work to do. While we are, of course, very happy for you, there's not much to answer.
          2/ You've got quite a big challenge, especially if you're starting from scratch.
          3/ You can break it down into several challenges: handling delays and timeouts, getting HTTP pages, parsing them into links, etc.

          I've attached below some code I wrote years ago that grabs a certain page from a specific URL every hour or so (an early RSS reader, essentially). It may help you with your search terms. There are other articles on CodeProject about grabbing information from web pages; John Simmons wrote one recently scraping information from a CodeProject page.

          Good luck with your task!
          Iain.

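          #include <windows.h>
          #include <wininet.h>
          #include <string>
          #pragma comment(lib, "wininet.lib")   // MSVC: link against WinINet

          extern HANDLE g_hEventStop;   // created elsewhere; set it to stop this thread

          DWORD WINAPI UpdatePageThread ( LPVOID lpParameter )
          {
              HWND hWnd = (HWND)lpParameter;   // not used in the part of the post shown here

              DWORD     dw, dwDelay = 100;
              HINTERNET hInternet = NULL, hIConnect = NULL, hIRequest = NULL;
              BOOL      bSuccess;
              DWORD     dwStatus, dwSize, dwIndex;

              LPCSTR    AcceptTypes [] = { "text/*", NULL };

              // Set up the query. The explicit "A" (ANSI) functions match the char
              // strings used here, so this also builds in a UNICODE project.
              hInternet = ::InternetOpenA ("OC UK Notify", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);

              if (hInternet)
                  hIConnect = ::InternetConnectA (hInternet, "www.overclock-uk.net", INTERNET_DEFAULT_HTTP_PORT, "user", "pass", INTERNET_SERVICE_HTTP, 0, 1);
              if (hIConnect)
              {
                  hIRequest = ::HttpOpenRequestA (hIConnect, NULL, "update.ocuk", NULL, NULL, AcceptTypes,
                      INTERNET_FLAG_NO_CACHE_WRITE | INTERNET_FLAG_NO_COOKIES | INTERNET_FLAG_NO_UI | INTERNET_FLAG_RELOAD | INTERNET_FLAG_NO_AUTH,
                      1);
              }

              if (!hIRequest) // Raise an error?
              {
                  // Close whatever did open, so a failed setup doesn't leak handles.
                  if (hIConnect) ::InternetCloseHandle (hIConnect);
                  if (hInternet) ::InternetCloseHandle (hInternet);
                  return 1;
              }

              char        buf [4096];
              std::string Page;

              while (1)
              {
                  dw = WaitForSingleObject (g_hEventStop, dwDelay);
                  if (dw != WAIT_TIMEOUT)
                      break;      // stop event set (or the wait failed): leave the loop

                  // dwDelay = 30000; // Wait 30 seconds before we try again.
                  dwDelay = 90 * 60000; // 90 minutes (3/2 hours).

                  bSuccess = ::HttpSendRequestA (hIRequest, NULL, 0, NULL, 0);
                  if (!bSuccess)
                      continue; // Try again in a while.

                  dwSize = sizeof (DWORD);
                  dwIndex = 0;
                  bSuccess = ::HttpQueryInfoA (hIRequest, HTTP_QUERY_STATUS_CODE | HTTP_QUERY_FLAG_NUMBER, &dwStatus, &dwSize, &dwIndex);
                  if (!bSuccess)
                      continue;
                  dwStatus /= 100; // Just get the 2XX part.
                  if (dwStatus != 2)
                      continue;

                  Page.erase ();

                  while (1)
                  {
                      // The original post is cut off inside the next call; the rest of
                      // this loop is a minimal completion: read until InternetReadFile
                      // reports 0 bytes (end of body) and append exactly the bytes read.
                      bSuccess = ::InternetReadFile (hIRequest, buf, sizeof (buf), &dw);
                      if (!bSuccess || dw == 0)
                          break;
                      Page.append (buf, dw);
                  }

                  // What the original then does with Page (and hWnd) is not shown.
              }

              ::InternetCloseHandle (hIRequest);
              ::InternetCloseHandle (hIConnect);
              ::InternetCloseHandle (hInternet);
              return 0;
          }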
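Iain's code stops once the response body is in `Page`. The next step he lists, parsing pages into links, could look roughly like the sketch below. It is an approximation under stated assumptions, not part of his post: it uses `std::regex` (C++11) to pull quoted `href` values out of the HTML, so it misses unquoted attributes, also matches inside comments and scripts, and does not resolve relative links such as `/a.html` against the page's URL. A real HTML parser is the more robust choice. The function name `ExtractHrefs` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Naive link extraction: returns every quoted href value in 'html', in
// document order. A regex is not an HTML parser: it misses unquoted values,
// also matches inside comments and scripts, and leaves relative links as-is.
std::vector<std::string> ExtractHrefs(const std::string& html)
{
    // Group 1 captures the opening quote; \1 requires the same quote to close.
    // Group 2 is the link text between the quotes (shortest match).
    static const std::regex hrefPattern(R"(\bhref\s*=\s*(["'])(.*?)\1)",
                                        std::regex::icase);

    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), hrefPattern), end;
         it != end; ++it)
    {
        links.push_back((*it)[2].str());
    }
    return links;
}
```

Given `Page` from the read loop above, `ExtractHrefs(Page)` would return the candidate URLs to visit next.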
          
            Ash_VCPP wrote (#9):

            Hi Iain, thanks for providing this important information and code. I will try it this way, and if I run into any difficulties I will let you know... Once again, thanks for the reply.

            Thanks A Ton Ash_VCPP

              Ash_VCPP wrote (#10):

              Then can you please provide me with any code, guidelines, or a URL where I can find something useful.......

              Thanks A Ton Ash_VCPP

                Iain Clarke (Warrior Programmer) wrote (#11):

                The website / page this code pointed to has long since gone, by the way! And take the error checking with heavy skepticism... Iain.

                Codeproject MVP for C++, I can't believe it's for my lounge posts...
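On Iain's warning about the error checking: one way to make cleanup hold on every early `return` or `continue` is to give each handle an owner object that closes it automatically (RAII). The template below is a minimal, hypothetical sketch in standard C++14; `UniqueHandle` and the closer type are illustrative names, not a WinINet API. For WinINet you would supply a closer that calls `InternetCloseHandle`, shown only in a comment because it needs `<wininet.h>`. Since `HINTERNET` is a `void*` typedef, `std::unique_ptr<void, Closer>` with a custom deleter would achieve the same thing.

```cpp
#include <utility>

// Minimal RAII owner for a C-style handle (sketch). Closer is a function-object
// type whose operator() releases the handle; Handle{} is treated as "empty".
// For WinINet one could use a hypothetical closer such as
//   struct InternetCloser { void operator()(HINTERNET h) const { ::InternetCloseHandle(h); } };
// with UniqueHandle<HINTERNET, InternetCloser>.
template <typename Handle, typename Closer>
class UniqueHandle {
public:
    UniqueHandle() = default;
    explicit UniqueHandle(Handle h) noexcept : h_(h) {}
    ~UniqueHandle() { reset(); }

    // Non-copyable: exactly one owner closes the handle.
    UniqueHandle(const UniqueHandle&) = delete;
    UniqueHandle& operator=(const UniqueHandle&) = delete;

    // Movable: ownership transfers and the source becomes empty.
    UniqueHandle(UniqueHandle&& other) noexcept
        : h_(std::exchange(other.h_, Handle{})) {}
    UniqueHandle& operator=(UniqueHandle&& other) noexcept {
        if (this != &other)
            reset(std::exchange(other.h_, Handle{}));
        return *this;
    }

    Handle get() const noexcept { return h_; }
    explicit operator bool() const noexcept { return h_ != Handle{}; }

    // Close the owned handle (if any) and take ownership of 'h' instead.
    void reset(Handle h = Handle{}) {
        Handle old = std::exchange(h_, h);
        if (old != Handle{})
            Closer{}(old);
    }

private:
    Handle h_{};
};
```

With one owner per handle, the early `return 1;` in the setup code and the normal exit path would both close exactly the handles that were opened, with no hand-written cleanup.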

                  Sandeep Saini SRE wrote (#12):

                  Hi Ash, do you still need the code? If so, please let me know.

                    Ash_VCPP wrote (#13):

                    Hi Sandeep, actually, along with the code I also need to do some planning, as I have to start the project from scratch..... So please also give me an idea, along with the code, of which approach would be better.......

                    Thanks A Ton Ash_VCPP

                      David Crow wrote (#14):

                      Ash_VCPP wrote:

                      I have an urgent requirement to create a crawler...

                      Care to define this?

                      "Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown

                      "Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons

                        Ash_VCPP wrote (#15):

                        I got your point to some extent, but I would be pleased if you could explain it more...

                        Thanks A Ton Ash_VCPP

                          David Crow wrote (#16):

                          Ash_VCPP wrote:

                          ...i would be pleased if you can explain it more...

                          I believe that was the question I posed to you. The term "crawler" can take on several different meanings. What is yours?

                            Ash_VCPP wrote (#17):

                            Basically, I need an exe that can fetch data from any URL and dump it into a database.....

                            Thanks A Ton Ash_VCPP
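If "crawler" here means following links outward from a starting URL (the thread leaves this open), the control flow is usually a breadth-first crawl: keep a queue (the frontier) of URLs to visit and a set of URLs already seen so no page is fetched twice, and cap the total so the crawl terminates. The sketch below, assuming C++17 for `std::optional`, shows only that control flow. `Crawl`, `CrawledPage`, and the three hooks are hypothetical: `fetch` could wrap WinINet or `URLDownloadToFile()`, and `store` could insert the page into a database. It ignores politeness delays, robots.txt, and URL normalization.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// One downloaded page, as handed to the storage hook.
struct CrawledPage {
    std::string url;
    std::string body;
};

// Breadth-first crawl (sketch). All three hooks are hypothetical and supplied
// by the caller:
//   fetch        - download one URL; std::nullopt on failure
//   extractLinks - absolute URLs found in a page body
//   store        - persist a page (e.g. insert it into a database)
// Stops when the frontier is empty or 'maxPages' pages have been fetched.
void Crawl(const std::string& seed, std::size_t maxPages,
           const std::function<std::optional<std::string>(const std::string&)>& fetch,
           const std::function<std::vector<std::string>(const std::string&)>& extractLinks,
           const std::function<void(const CrawledPage&)>& store)
{
    std::deque<std::string> frontier{seed};       // URLs waiting to be fetched
    std::unordered_set<std::string> seen{seed};   // every URL ever queued
    std::size_t fetched = 0;

    while (!frontier.empty() && fetched < maxPages) {
        std::string url = std::move(frontier.front());
        frontier.pop_front();

        std::optional<std::string> body = fetch(url);
        if (!body)
            continue;                             // download failed: skip this URL
        ++fetched;

        for (std::string& link : extractLinks(*body)) {
            if (seen.insert(link).second)         // first time this URL is seen
                frontier.push_back(std::move(link));
        }
        store(CrawledPage{url, std::move(*body)});
    }
}
```

Keeping `fetch` and `store` as parameters lets the download code (WinINet, `URLDownloadToFile()`) and the database code be developed and tested separately from the crawl logic.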

                              David Crow wrote (#18):

                              Ash_VCPP wrote:

                              ...fetch data from any url...

                              Such as URLDownloadToFile()?

                                Ash_VCPP wrote (#19):

                                I am not sure that it will work... because I remember using it a few months ago to download an XML file and some icons from a server.....

                                Thanks A Ton Ash_VCPP
