Problem downloading HTML from the web with the HttpWebRequest object

Haim Nachum

Hi. I'm trying to download the HTML text from 'amazon.com' using this method:

    HttpWebRequest hRequest = (HttpWebRequest)WebRequest.Create(url);
    hRequest.Accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/xaml+xml, application/vnd.ms-xpsdocument, application/x-ms-xbap, application/x-ms-application, */*";
    hRequest.ContentType = "application/x-www-form-urlencoded";
    hRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; InfoPath.2)";
    hRequest.Headers.Add("Accept-Encoding", "gzip, deflate");
    hRequest.Headers.Add("UA-CPU", "x86");
    hRequest.Method = "GET";
    HttpWebResponse hResponse = (HttpWebResponse)hRequest.GetResponse();
    StreamReader s = new StreamReader(hResponse.GetResponseStream(), Encoding.GetEncoding(hResponse.CharacterSet));
    string page = s.ReadToEnd();

I know that Amazon uses the "iso-8859-1" character set, which is also what the HttpWebResponse.CharacterSet property returns. But for some reason, when I examine the string it contains scrambled characters, so searching that text with the usual string methods doesn't work. However, if I use the WebClient object's DownloadString method to retrieve the page, it shows up fine, but it takes 30 seconds to get the string! I don't know whether that is because of heavy processing or something else, but it's not flexible enough and doesn't meet my needs. Does anyone have an idea why I'm getting an invalid string?

leppie

Did you try running 'HtmlDecode' on the text you read?


Haim Nachum

From my understanding, the HtmlDecode method just replaces encoded entities such as "&lt;" with the corresponding HTML characters. That's not the issue in my case, but thanks anyway.
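
To illustrate the point above, here is a small sketch of what HtmlDecode does: it only turns HTML entities back into characters and does nothing about byte-level garbling. It uses System.Net.WebUtility.HtmlDecode (older code would typically call HttpUtility.HtmlDecode from System.Web instead). The sample string and the class name HtmlDecodeDemo are made up for illustration.

```csharp
using System;
using System.Net;

public static class HtmlDecodeDemo
{
    // Thin wrapper so the behaviour can be checked in isolation.
    public static string Decode(string encoded)
    {
        return WebUtility.HtmlDecode(encoded);
    }

    public static void Main()
    {
        // Made-up sample: markup whose special characters were entity-encoded.
        string encoded = "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;";
        Console.WriteLine(Decode(encoded)); // prints <b>Tom & Jerry</b>
    }
}
```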

RGiroux32

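Probably figured this out already, but:

    // buf and sb were not declared in the original post; 8192 is an assumed buffer size
    byte[] buf = new byte[8192];
    StringBuilder sb = new StringBuilder();

    // read data via the response stream (response is the HttpWebResponse from GetResponse())
    Stream resStream = response.GetResponseStream();
    string tempString = null;
    int count = 0;
    do
    {
        count = resStream.Read(buf, 0, buf.Length);
        if (count != 0)
        {
            // translate from bytes to ASCII text
            tempString = Encoding.ASCII.GetString(buf, 0, count);
            // continue building the string
            sb.Append(tempString);
        }
    } while (count > 0);

Cheers, RG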
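
One possible explanation for the scrambled text in the original question, not confirmed anywhere in this thread: the request adds "Accept-Encoding: gzip, deflate" by hand, so the server may answer with a compressed body, and HttpWebRequest hands back those raw bytes unless its AutomaticDecompression property is set. The sketch below reproduces the effect offline with GZipStream, so no request is sent. The sample HTML string and the names GzipDemo, Compress and Decompress are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipDemo
{
    // Hypothetical helper: gzip the bytes, as a server might when it
    // receives "Accept-Encoding: gzip".
    public static byte[] Compress(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the stream has been closed.
            return output.ToArray();
        }
    }

    // Hypothetical helper: undo the gzip layer first, then decode the text.
    public static string Decompress(byte[] compressed, Encoding encoding)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string html = "<html><body>Hello</body></html>"; // made-up sample page

        byte[] body = Compress(latin1.GetBytes(html));

        // Decoding the compressed bytes directly, as the code in the
        // question would, yields unreadable text.
        string scrambled = latin1.GetString(body);
        Console.WriteLine(scrambled == html);

        // Decompressing before decoding recovers the page.
        Console.WriteLine(Decompress(body, latin1) == html);
    }
}
```

If this is indeed the cause, one thing to try is removing the manual Accept-Encoding header and setting hRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate; before calling GetResponse(), so the framework negotiates and undoes the compression itself.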