Problem downloading HTML from the web with an HttpWebRequest object

    Haim Nachum wrote (#1):

    Hi. I'm trying to download HTML text from 'amazon.com' using this method:

        HttpWebRequest hRequest = (HttpWebRequest)WebRequest.Create(url);
        hRequest.Accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/xaml+xml, application/vnd.ms-xpsdocument, application/x-ms-xbap, application/x-ms-application, */*";
        hRequest.ContentType = "application/x-www-form-urlencoded";
        hRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; InfoPath.2)";
        hRequest.Headers.Add("Accept-Encoding", "gzip, deflate");
        hRequest.Headers.Add("UA-CPU", "x86");
        hRequest.Method = "GET";
        HttpWebResponse hResponse = (HttpWebResponse)hRequest.GetResponse();
        StreamReader s = new StreamReader(hResponse.GetResponseStream(), Encoding.GetEncoding(hResponse.CharacterSet));
        string page = s.ReadToEnd();

    I know that Amazon uses the "iso-8859-1" character set, and that is also what the HttpWebResponse.CharacterSet property returns. But for some reason, when I examine the string it contains scrambled characters, so searching it with the usual string methods doesn't work. If I use the WebClient object's DownloadString method to retrieve the page instead, it shows up fine, but it takes 30 seconds to get the string! I don't know whether that is due to heavy processing or something else, but it's not flexible enough and doesn't meet my needs. Does anyone have an idea why I'm getting an invalid string?
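One likely explanation for the scrambled characters is that the request asks for "gzip, deflate" in Accept-Encoding, so the server may send a compressed body, and the code then decodes those compressed bytes directly as ISO-8859-1 text. The sketch below reproduces that effect offline with no network access. The class and method names (GzipDemo, Compress, DecodeWithoutDecompressing, DecompressThenDecode) are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipDemo
{
    // Simulates a server that gzip-compresses the response body.
    public static byte[] Compress(string text, Encoding encoding)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = encoding.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the stream has been closed.
            return buffer.ToArray();
        }
    }

    // What happens when compressed bytes are treated as text: garbage.
    public static string DecodeWithoutDecompressing(byte[] body, Encoding encoding)
    {
        return encoding.GetString(body);
    }

    // Decompress first, then decode with the declared character set.
    public static string DecompressThenDecode(byte[] body, Encoding encoding)
    {
        using (var gzip = new GZipStream(new MemoryStream(body), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With a gzip-compressed body, DecodeWithoutDecompressing returns text that does not contain the original markup, while DecompressThenDecode returns the original string. This would also be consistent with WebClient.DownloadString working, assuming it does not send an Accept-Encoding header by default.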

      leppie wrote (#2):

      Did you try running 'HtmlDecode' on the text you read?

      xacc.ide - now with TabsToSpaces support
      IronScheme - 1.0 beta 1 - out now!
      ((lambda (x) `((lambda (x) ,x) ',x)) '`((lambda (x) ,x) ',x))

        Haim Nachum wrote (#3):

        As I understand it, the HtmlDecode method only converts character entities such as "&lt;" back into the characters they represent ("<"). That's not the issue in my case, but thanks anyway.
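For reference, a minimal sketch of what HtmlDecode does. It uses System.Net.WebUtility.HtmlDecode, which is an assumption about the available framework version; on older versions the equivalent is HttpUtility.HtmlDecode in System.Web. The class name HtmlDecodeDemo and the sample string are hypothetical.

```csharp
using System;
using System.Net;

static class HtmlDecodeDemo
{
    // Converts HTML character entities back into plain characters.
    public static string Decode(string encoded)
    {
        return WebUtility.HtmlDecode(encoded);
    }

    // Example input containing the entities &lt;, &gt; and &amp;.
    public static string Sample()
    {
        return Decode("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"); // "<b>Tom & Jerry</b>"
    }
}
```

HtmlDecode works on text that has already been decoded correctly, so it would not repair a string built from bytes that were read with the wrong encoding or were still compressed.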

          RGiroux32 wrote (#4):

          Probably figured this out already but:

              // buffer and accumulator (not declared in the original post; the size is an assumption)
              byte[] buf = new byte[8192];
              StringBuilder sb = new StringBuilder();

              // read data via the response stream
              Stream resStream = response.GetResponseStream();
              string tempString = null;
              int count = 0;
              do
              {
                  count = resStream.Read(buf, 0, buf.Length);
                  if (count != 0)
                  {
                      // translate from bytes to ASCII text
                      tempString = Encoding.ASCII.GetString(buf, 0, count);
                      // continue building the string
                      sb.Append(tempString);
                  }
              } while (count > 0);

          Cheers, RG
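Reading the stream in chunks will not help if the body is still compressed. Since the request in the first post sends "Accept-Encoding: gzip, deflate", one option is to let HttpWebRequest handle decompression through its AutomaticDecompression property instead of adding the header by hand. The following is a minimal sketch under that assumption, not a confirmed fix for this particular site. PageDownloader, CreateRequest, and Download are hypothetical names, and the fallback to "iso-8859-1" when no character set is reported is also an assumption.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PageDownloader
{
    // Builds a GET request and asks the framework to decompress
    // gzip/deflate responses transparently.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Downloads the page and decodes it with the reported character set.
    public static string Download(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // Assumption: fall back to ISO-8859-1 if no charset is reported.
            string charset = string.IsNullOrEmpty(response.CharacterSet)
                ? "iso-8859-1"
                : response.CharacterSet;
            using (var reader = new StreamReader(
                response.GetResponseStream(), Encoding.GetEncoding(charset)))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Usage would be something like `string page = PageDownloader.Download(url);`. With AutomaticDecompression set, there should be no need to add the Accept-Encoding header manually. On recent .NET versions HttpWebRequest is marked obsolete in favor of HttpClient, so this applies mainly to code that still targets the older API.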
