[C#.NET 2008] Screen-scraping an HTML Page
-
Hi, I'm trying to process the HTML source of a certain web page. This page is protected by a username and password. I'm using the following method to try to access the page:
    // Requires: using System; using System.IO; using System.Net;
    public static string GetHtmlPageSource(string url, string username, string password)
    {
        WebClient wc = new WebClient();
        // Credentials are only sent in answer to an HTTP authentication challenge.
        wc.Credentials = new NetworkCredential(username, password);
        try
        {
            using (Stream stream = wc.OpenRead(new Uri(url)))
            {
                using (StreamReader reader = new StreamReader(stream))
                {
                    return reader.ReadToEnd();
                }
            }
        }
        catch (WebException e)
        {
            // Error handling
            return e.ToString();
        }
    }
This doesn't work, however. I seem to be stuck at the logon page and can't get past the user security. Does anyone have an idea?
-
Do you not think that the site owners put the security there for a reason?
Henry Minute Do not read medical books! You could die of a misprint. - Mark Twain Girl: (staring) "Why do you need an icy cucumber?" “I want to report a fraud. The government is lying to us all.”
-
Yes I do, it's to prevent unauthorised access to the page... :laugh: However, this is a web page that exists on the intranet of our company network. It shows certain metrics that we want to display in a dashboard application. The idea is to have an overview of these values at a glance (the dashboard will be projected on a whiteboard). Normally, we would use a read account on the database for this, but (since the database is being managed by the Netherlands and I am a Belgian employee) the Netherlands are refusing to give us a read account. So this is the only way we can build the dashboard.
-
Dimitri Backaert wrote:
Normally, we would use a read account on the database for this, but (since the database is being managed by the Netherlands and I am a Belgian employee) the Netherlands are refusing to give us a read account.
:confused: Does the chairman of the company think this is a good thing?
-
Doncha just luuuv office politics? :laugh: :laugh:
Henry Minute
-
Dimitri Backaert wrote:
This doesn't work, however. I seem to be stuck at the logon page and can't get past the user security. Does anyone have an idea?
It would seem that WebClient is not recognizing this page as an authentication request. You will most likely have to manually format the correct response and send it to the server. I would use Wireshark to trace a manual session with the server. This should let you see what the server expects for an authentication response.
James Johnson
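One way the advice above could play out, assuming the logon page is an ordinary HTML form rather than an HTTP authentication challenge (NetworkCredential only answers the latter): post the form fields yourself and keep the session cookie for the follow-up request. The sketch below is not the poster's code. The class names, the login URL parameter, and the field names "username" and "password" are all hypothetical; the real ones would come from the page source or the Wireshark trace.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// WebClient does not keep cookies between calls by default. This subclass
// attaches one shared CookieContainer to every HTTP request it creates.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies { get { return cookies; } }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            http.CookieContainer = cookies; // reuse the session cookie
        }
        return request;
    }
}

public static class FormLoginSketch
{
    // Hypothetical field names: replace them with what the trace shows.
    public static NameValueCollection BuildLoginFields(string username, string password)
    {
        NameValueCollection fields = new NameValueCollection();
        fields.Add("username", username);
        fields.Add("password", password);
        return fields;
    }

    // loginUrl is the address the logon form posts to (an assumption here),
    // pageUrl is the protected page to scrape afterwards.
    public static string GetHtmlPageSource(string loginUrl, string pageUrl,
                                           string username, string password)
    {
        using (CookieAwareWebClient wc = new CookieAwareWebClient())
        {
            // UploadValues sends the fields as a form-encoded POST.
            wc.UploadValues(loginUrl, "POST", BuildLoginFields(username, password));
            // The same client now carries whatever cookie the login set.
            return wc.DownloadString(pageUrl);
        }
    }
}
```

If the form also carries hidden fields (a token, a view state), they would have to be read from the logon page and added to the posted collection as well.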
-
Henry Minute wrote:
Doncha just luuuv office politics?
I just wonder why the Dutch don't trust the Belgians? :laugh:
This might all seem very funny, I know, but it doesn't solve my problem.... :doh: Concerning the chairman, a request has been sent by my manager. But this might take a while to achieve (office / political war between Belgium / Netherlands). The main reason why the Netherlands are unwilling to release information is that they were formerly the only group that maintained all the ICT infrastructure for the whole Benelux company group. However, at the beginning of 2009, Belgium and Luxembourg split from the Netherlands and created their own ICT division. I think it's some sort of job protection... Politics, money, it's all involved. And it's - excuse my language - a real pain in the ass...
-
James, thank you for your answer. This could do the trick. I'll try it out. I was already thinking about changing the HttpWebRequest in another method, because in my opinion a WebRequest is the equivalent of a GET. What I need is a POST, so I'm thinking of using an HttpWebResponse...
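On the GET versus POST point: an HttpWebRequest defaults to GET, but its Method property can be set to "POST", while HttpWebResponse only represents the server's reply and sends nothing itself. A minimal sketch under those facts follows. The class and method names are hypothetical, and the URL and form fields are whatever the logon form actually uses.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class HttpPostSketch
{
    // Builds "name=value&name=value" with each part percent-escaped.
    // Spaces become %20 here, which servers accept in form bodies.
    public static string BuildFormBody(string[][] fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (string[] pair in fields)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(Uri.EscapeDataString(pair[0]));
            body.Append('=');
            body.Append(Uri.EscapeDataString(pair[1]));
        }
        return body.ToString();
    }

    // Sends formBody as a POST and returns the response text.
    // Pass the same CookieContainer to later requests to stay logged in.
    public static string Post(string url, string formBody, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST"; // the default would be GET
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;

        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = payload.Length;
        using (Stream output = request.GetRequestStream())
        {
            output.Write(payload, 0, payload.Length);
        }

        // The response object is only the reply to the request sent above.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A later GET for the dashboard page would create a second HttpWebRequest, leave Method at its default, and assign the same CookieContainer so the session cookie is sent along.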