WebResponse/WebRequest problem
-
The following code loads the contents of a web page (Page) into string PageText:
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(Page, true));
    request.Method = "GET";
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    Stream responseStream = response.GetResponseStream();
    StreamReader reader = new StreamReader(responseStream);
    string PageText = reader.ReadToEnd();
The problem I have is that the page I receive comes back with encoding that doesn't show up correctly in a text box. For example, a single quote ’ comes back as & # 8 2 1 7 ; (without the spaces). I think that this would be displayed correctly in an HTML viewer, but how do I get it to display correctly in a text box? Putting an encoding in the constructor for the StreamReader seems to make no difference. Thanks in advance, Keith :)
-
Would the System.Web.HttpUtility.HtmlDecode() method work?
-
Have you tried System.Net.WebClient?
    using System.Text;
    using System.Net;
    ...
    WebClient request = new WebClient();
    string PageText = Encoding.Default.GetString(request.DownloadData(Page));
    // You can change Encoding.Default to Encoding.UTF8 if needed
Cheers, John
-
Regarding the System.Web.HttpUtility.HtmlDecode() suggestion: this is a C# application, not ASP.NET code or a web service, so I don't have access to the System.Web namespace. Sorry, I should have mentioned this the first time. Thanks for the response. Keith
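For readers in the same situation: if System.Web is not referenced, one option worth checking is System.Net.WebUtility.HtmlDecode, which does the same kind of decoding without that namespace. The sketch below assumes .NET Framework 4.0 or later (the class does not exist in earlier versions); the class name EntityDecodeSketch and the sample string are made up for illustration.

```csharp
using System;
using System.Net;

public static class EntityDecodeSketch
{
    // Replaces HTML character references (for example &#8217; or &amp;)
    // with the characters they stand for, so the text suits a plain text box.
    public static string ToDisplayText(string html)
    {
        return WebUtility.HtmlDecode(html);
    }

    public static void Main()
    {
        // Hypothetical sample standing in for the downloaded PageText.
        string pageText = "It&#8217;s a &quot;test&quot; &amp; more";

        // The decoded string contains U+2019 (’) in place of &#8217;.
        Console.WriteLine(ToDisplayText(pageText));
    }
}
```

Note that this only decodes character references; any HTML tags in the page are left in the string as they are.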
-
I've tried System.Net.WebClient as suggested, but the result is exactly the same regardless of the encoding.
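A likely reason the encoding makes no difference, assuming the server really sends the literal character reference as described in the question: the reference is made up of seven plain ASCII characters, and those bytes decode to the same string under most common encodings. The small demonstration below uses a hypothetical class name, EncodingDemo, and hard-codes the bytes instead of downloading anything.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // The seven ASCII bytes of the character reference: & # 8 2 1 7 ;
    public static readonly byte[] Wire = { 0x26, 0x23, 0x38, 0x32, 0x31, 0x37, 0x3B };

    public static string Decode(Encoding encoding)
    {
        return encoding.GetString(Wire);
    }

    public static void Main()
    {
        // All three lines print the same seven characters, because the bytes
        // are in the ASCII range shared by these encodings.
        Console.WriteLine(Decode(Encoding.ASCII));
        Console.WriteLine(Decode(Encoding.UTF8));
        Console.WriteLine(Decode(Encoding.GetEncoding("iso-8859-1")));
    }
}
```

If that assumption holds, the text has to be HTML-decoded after it is read; choosing a different Encoding cannot turn the reference into a quote character.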
-
What happens if you use:
... string pagetext = Encoding.ASCII.GetString(request.DownloadData(Page)); ...
Sorry if you have already tried it... :rolleyes:
-
I tried
Encoding.<everything I could think of>.GetString
all to no avail. :( Actually, .ASCII makes things slightly worse!
-
It seems that you're getting a Unicode-encoded page from the web server. Have you tried Encoding.Convert?
    using System.Text;
    ...
    string pagetext = Encoding.Default.GetString(Encoding.Convert(Encoding.Unicode, Encoding.ASCII, request.DownloadData(Page)));
    ...
What language is your web page/web server? If you're still getting wrong results, you could try changing Encoding.Unicode to another encoding such as UTF8, and Encoding.ASCII to UTF8. I'm using the default encoding to read web pages in English/Portuguese, and the results are OK for me. Cheers, John
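Rather than trying encodings one at a time, the encoding can usually be taken from what the server declares. The following is a minimal sketch, not a definitive implementation: it maps a charset name (such as the value of HttpWebResponse.CharacterSet) to an Encoding and falls back to UTF-8 when the name is missing or unknown. The class name CharsetSketch and the UTF-8 fallback are assumptions made for illustration.

```csharp
using System;
using System.Text;

public static class CharsetSketch
{
    // Maps a declared charset name to an Encoding.
    // Falls back to UTF-8 (an assumed default) if the name is empty or unknown.
    public static Encoding Resolve(string charset)
    {
        if (string.IsNullOrEmpty(charset))
        {
            return Encoding.UTF8;
        }

        try
        {
            return Encoding.GetEncoding(charset);
        }
        catch (ArgumentException)
        {
            // GetEncoding throws ArgumentException for names it does not know.
            return Encoding.UTF8;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Resolve("utf-8").WebName);
        Console.WriteLine(Resolve("iso-8859-1").WebName);
        Console.WriteLine(Resolve("no-such-charset").WebName);
    }
}
```

With the HttpWebRequest code from the question, the result could be passed to the reader as `new StreamReader(responseStream, CharsetSketch.Resolve(response.CharacterSet))`. This only affects how bytes become characters; character references such as the one in the question would still need HTML decoding afterwards.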