Code Project
Extracting data from web pages using MFC

C / C++ / MFC

  • #1 NaveenHS

    Hello all,

    I have received an assignment to extract data from web pages. So far I have learnt some basics of MFC and the Win32 API. Can anyone suggest the topics I should learn before starting this assignment?

    For example, the ESPN site has schedule pages such as http://espn.go.com/nfl/teams/schedule?team=dal, which look like this:

    WK  DATE         OPPONENT     RESULT   W-L  HI PASSING  HI RUSHING  HI RECEIVING
    1   Sun, Sep 13  @ Tampa Bay  W 34-21  1-0  Romo 353    Barber 79   Crayton 135

    I have to extract the schedule details, such as team name, date, place, etc., and store them in a table. Please suggest some good topics and sites.

    Thank you,
    Naveen Hs.
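For the "store them in a table" part, here is a minimal sketch in standard C++ (deliberately not MFC) of one record per schedule row, filled from the cell texts of the sample row above. The names ScheduleRow and ParseRow are hypothetical, only the first five columns are handled, and treating a leading "@" as an away game is an assumption about the page, not something stated in the thread.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one row of the schedule table.
struct ScheduleRow {
    int week = 0;
    std::string date;       // e.g. "Sun, Sep 13"
    std::string opponent;   // e.g. "Tampa Bay"
    bool away = false;      // assumption: a leading "@" marks an away game
    bool won = false;       // "W" or "L" prefix of the RESULT cell
    int pointsFor = 0;
    int pointsAgainst = 0;
    std::string record;     // W-L column, e.g. "1-0"
};

// Hypothetical helper: builds a row from the cell texts WK, DATE, OPPONENT, RESULT, W-L.
ScheduleRow ParseRow(const std::vector<std::string>& cells) {
    ScheduleRow row;
    row.week = std::stoi(cells.at(0));
    row.date = cells.at(1);

    std::string opp = cells.at(2);
    if (!opp.empty() && opp[0] == '@') {
        row.away = true;
        opp.erase(0, 1);
    }
    // Drop the spaces left between "@" and the team name
    opp.erase(0, opp.find_first_not_of(' '));
    row.opponent = opp;

    // RESULT looks like "W 34-21": outcome letter, then "for-against"
    std::istringstream result(cells.at(3));
    char outcome = 0;
    char dash = 0;
    result >> outcome >> row.pointsFor >> dash >> row.pointsAgainst;
    row.won = (outcome == 'W');

    row.record = cells.at(4);
    return row;
}
```

For example, ParseRow({"1", "Sun, Sep 13", "@ Tampa Bay", "W 34-21", "1-0"}) yields week 1, an away game against "Tampa Bay", a win by 34 to 21 and a 1-0 record. A std::vector<ScheduleRow> can then serve as the in-memory table before it is written to a file or database.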

    • #2 CPallini

      See [^]. Also, have a look at these CP articles [^] (most of them are C#-based, but the techniques presented can be used with C++/MFC as well). :)

      If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler. -- Alfonso the Wise, 13th Century King of Castile.
      This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong. -- Iain Clarke
      [My articles]

      • #3 NaveenHS

        Thank you very much, sir. I will go through the articles; they will be very useful for me.

        • #4 Lost User

          This has no sense at all. You can do it with 10 lines of code with COM (or UDLTF)

          • #5 CPallini

            kilt wrote:

            This has no sense at all. You can do it with 10 lines of code with COM (or UDLTF)

            Yes, and with 5 lines of WTF, I suppose. :)

            • #6 Lost User

              kilt wrote:

              You can do it with 10 lines of code

              Interesting that all your responses are like this: the number of lines may differ, but you never post the actual code, or a link to where it can be found!

              • #7 David Crow

                You could navigate the HTML with Microsoft's IHTMLDocument2 interface. The table you are interested in extracting from is the first <table> element on that page.
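Driving IHTMLDocument2 takes COM initialization and an MSHTML document object, which is beyond a short sketch. As a much cruder alternative, and explicitly not the IHTMLDocument2 approach, the first <table> element can be cut out of the raw page source with a plain, case-insensitive string search. FirstTable is a hypothetical helper, and the search assumes the first table contains no nested table and is not inside a comment or script.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the first <table ...>...</table> block in 'html', or "" if none.
// Plain text search, not a DOM: assumes the first table has no nested table inside it
// and that the tag does not appear inside a comment or script.
std::string FirstTable(const std::string& html) {
    // Search a lowercase copy so "<TABLE" also matches; offsets stay identical
    std::string lower(html);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::size_t start = lower.find("<table");
    if (start == std::string::npos) return "";

    const std::string close = "</table>";
    const std::size_t end = lower.find(close, start);
    if (end == std::string::npos) return "";

    // Cut from the original text so the caller keeps the page's own casing
    return html.substr(start, end + close.size() - start);
}
```

The returned substring is what would then be split into rows and cells.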

                "Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown

                "Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons

                • #8 Iain Clarke Warrior Programmer

                  Then shouldn't you be really kind and post those 10 lines in a reply to the original poster, instead of whinging about people helping? I think I've just been suckered into troll-feeding...

                  Iain.

                  I have now moved to Sweden for love (awwww). If you're in Scandinavia and want an MVP on the payroll (or are happy with a remote worker), or need contract work done, give me a job! http://cv.imcsoft.co.uk/

                  • #9 Rajesh R Subramanian

                    kilt wrote:

                    This has no sense at all.

                    Judging by the kind of trash you post here, one can conclude that you have no sense at all.

                    “Follow your bliss.” – Joseph Campbell

                    • #10 msn92

                      I have struggled with loading web page sources and parsing them too. This is the approach I prefer, and it's the easiest one I know:

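                      #include <afxinet.h>   // CInternetSession, CStdioFile, CInternetException
                      ...
                      // Downloads the page at 'url' into 'source'; returns FALSE on a connection error.
                      BOOL GetPageSource(const CString& url, CString& source)
                      {
                          CInternetSession session;
                          CStdioFile* file = NULL;   // allocated by OpenURL; the caller must delete it
                          CStringA raw;              // page bytes exactly as received
                          try // a connection error may occur while opening or reading
                          {
                              // For http:// URLs, OpenURL returns a CHttpFile (derived from CStdioFile)
                              file = session.OpenURL(url);
                              char buf[1024];
                              UINT len;
                              while ((len = file->Read(buf, sizeof(buf))) > 0)
                                  raw.Append(buf, (int)len);   // buf is not NUL-terminated: append exactly len bytes
                          }
                          catch (CInternetException* e) // if an error occurred, show a message box with the error code
                          {
                              CString error;
                              error.Format(_T("Connection error!\nError code: %lu"), e->m_dwError);
                              e->Delete();               // MFC exceptions must be freed with Delete()
                              delete file;               // does nothing if OpenURL itself failed (file is NULL)
                              session.Close();
                              AfxMessageBox(error);
                              return FALSE;
                          }
                          file->Close();
                          delete file;
                          session.Close();
                          // Assumes the page is in the system ANSI code page;
                          // for a UTF-8 page use CString(CA2W(raw, CP_UTF8)) instead.
                          source = CString(raw);
                          return TRUE;
                      }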

                      You can use the GetPageSource() function to get a page's source. For the parsing part, I use regex.
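The thread does not show the regex part, so here is one possible sketch (not msn92's code) using the C++11 <regex> header. ExtractCells is a hypothetical name; regular expressions are fragile for HTML in general, so this assumes simple, well-formed <tr>/<td>/<th> markup and strips any tags nested inside a cell.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits an HTML table into rows of cell texts with std::regex.
// Assumes well-formed <tr>/<td>/<th> markup; tags nested inside a cell are stripped.
std::vector<std::vector<std::string>> ExtractCells(const std::string& tableHtml) {
    const std::regex rowRe(R"(<tr\b[^>]*>([\s\S]*?)</tr>)", std::regex::icase);
    const std::regex cellRe(R"(<t[dh]\b[^>]*>([\s\S]*?)</t[dh]>)", std::regex::icase);
    const std::regex tagRe(R"(<[^>]*>)");

    std::vector<std::vector<std::string>> rows;
    const std::sregex_iterator end;  // default-constructed iterator marks "no more matches"
    for (std::sregex_iterator r(tableHtml.begin(), tableHtml.end(), rowRe); r != end; ++r) {
        const std::string rowHtml = (*r)[1].str();  // inner HTML of one <tr>
        std::vector<std::string> cells;
        for (std::sregex_iterator c(rowHtml.begin(), rowHtml.end(), cellRe); c != end; ++c) {
            // Remove nested tags such as <a href=...> to keep only the visible text
            cells.push_back(std::regex_replace((*c)[1].str(), tagRe, ""));
        }
        if (!cells.empty()) rows.push_back(cells);
    }
    return rows;
}
```

Each inner vector holds the visible text of one row in column order (WK, DATE, OPPONENT, ...). Because GetPageSource() fills a CString, its result would first need converting to std::string, for example with std::string(CT2A(source)). Cell text is not trimmed, so whitespace from the page is kept.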

                      • #11 NaveenHS

                        Thank you very much for the reply, David.
