Extracting data from web pages using MFC
-
Hello all,

I have received an assignment to extract data from web pages. So far I have learnt some basics of MFC and the Win32 API. Can anyone suggest which topics I should learn before starting this assignment?

For example, take the ESPN schedule page http://espn.go.com/nfl/teams/schedule?team=dal:

WK  DATE         OPPONENT     RESULT   W-L  HI PASSING  HI RUSHING  HI RECEIVING
1   Sun, Sep 13  @ Tampa Bay  W 34-21  1-0  Romo 353    Barber 79   Crayton 135

I have to extract schedule details such as team name, date, place, etc. and store them in a table. Please suggest some good topics and sites.

Thank you,
Naveen Hs.
-
See [^]. Moreover, have a look at these CP articles [^] (most of them are C#-based, but the techniques presented can be used with C++/MFC as well). :)
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler. -- Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong. -- Iain Clarke
[My articles]
-
kilt wrote:
This has no sense at all. You can do it with 10 lines of code with COM (or UDLTF)
Yes, and with 5 lines of WTF, I suppose. :)
-
You could navigate the HTML with Microsoft's IHTMLDocument2 interface. The table you are interested in extracting from is the first <table> element on that page.
"Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown
"Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons
-
Then shouldn't you be really kind and post those 10 lines in a reply to the original poster, instead of whinging about people helping? I think I've just been suckered into troll-feeding...
Iain.
I have now moved to Sweden for love (awwww). If you're in Scandinavia and want an MVP on the payroll (or happy with a remote worker), or need contract work done, give me a job! http://cv.imcsoft.co.uk/
-
kilt wrote:
This has no sense at all.
By seeing the kind of trash that you post here, it can be concluded that you have no sense at all.
“Follow your bliss.” – Joseph Campbell
-
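I struggled with loading and parsing web page sources too. This is the approach I prefer, and the easiest one I know:

    #include <afxinet.h>
    ...
    // Downloads the page at `url` into `source`.
    // Returns FALSE if the connection could not be opened.
    BOOL GetPageSource(const CString& url, CString& source)
    {
        CInternetSession session;
        CStdioFile* file = NULL;
        try {
            // OpenURL returns a CStdioFile* (a CHttpFile* for http:// URLs).
            file = session.OpenURL(url);
        }
        catch (CInternetException* e) {
            // Show the error code, then release the exception object.
            CString error;
            error.Format(_T("Connection error!\nError code: %lu"), e->m_dwError);
            e->Delete();
            AfxMessageBox(error);
            return FALSE;
        }
        if (file == NULL)   // OpenURL returns NULL if the URL could not be parsed
            return FALSE;

        // Read raw bytes in chunks. The buffer is not null-terminated,
        // so the byte count is always passed explicitly.
        CStringA raw;
        char buf[1024];
        UINT len;
        while ((len = file->Read(buf, sizeof(buf))) > 0)
            raw.Append(buf, len);

        file->Close();
        delete file;        // the caller owns the object returned by OpenURL
        session.Close();

        // Converts using the current ANSI code page; a UTF-8 page
        // would need an explicit conversion instead.
        source = CString(raw);
        return TRUE;
    }

You can use the GetPageSource() function to get a page's source. For the parsing part, I use regex.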
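As a rough illustration of the regex approach mentioned above, here is a minimal sketch in standard C++ (std::regex, not an MFC class) that pulls the cell text out of a single table row. The function name ExtractCells is hypothetical, and the sketch assumes each cell is a plain <td>...</td> pair with no nested tables; the real ESPN markup may well differ, so treat this as a starting point only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of every <td> cell in `row`,
// with any inner tags (links, <br>, ...) stripped out.
// Assumes cells are not nested inside one another.
std::vector<std::string> ExtractCells(const std::string& row)
{
    // Match <td ...>contents</td> case-insensitively.
    // [\s\S] is used instead of '.' so that newlines are matched too.
    static const std::regex cellRe(R"(<td[^>]*>([\s\S]*?)</td>)", std::regex::icase);
    // Matches any remaining tag inside a cell, e.g. <a href="...">.
    static const std::regex tagRe(R"(<[^>]+>)");

    std::vector<std::string> cells;
    for (std::sregex_iterator it(row.begin(), row.end(), cellRe), end; it != end; ++it) {
        // Capture group 1 is the cell's inner HTML; remove tags from it.
        cells.push_back(std::regex_replace((*it)[1].str(), tagRe, ""));
    }
    return cells;
}
```

Given a made-up row modelled on the sample in the question, such as <tr><td>1</td><td>Sun, Sep 13</td><td>@ <a href="#">Tampa Bay</a></td></tr>, ExtractCells would return the three strings "1", "Sun, Sep 13" and "@ Tampa Bay". Regex works for simple, regular markup like this; for anything more irregular, a real HTML parser is usually the safer choice.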
-