How to parse special chars

    Aljaz111
    wrote:
    #1

    I have to parse HTML with charset=iso-8859-2. I currently get the HTML string like this:

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.koroska-on.net/index.php?option=com_glossary&func=display&letter=All&Itemid=0&catid=66&page=1");
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();

    Stream resStream = response.GetResponseStream();

    string html = "";
    string tempString;
    int count = 0;
    byte[] buffer = new byte[8192];
    do
    {
        // Read the next chunk of the response body
        count = resStream.Read(buffer, 0, buffer.Length);

        if (count != 0)
        {
            tempString = Encoding.ASCII.GetString(buffer, 0, count);
            html += tempString;
        }
    }
    while (count > 0);

    Now special characters are shown as ? or something similar, because they aren't in the ASCII range. How can I parse it so that the special characters are shown correctly? Thanks, bye

      Lost User
      wrote:
      #2

      Aljaz111 wrote:

      tempString = Encoding.ASCII.GetString(buffer, 0, count);

      You asked for it in ASCII; perhaps you should use Unicode.

        Luc Pattyn
        wrote:
        #3

        Seems like ISO 8859-2 is the Central European character set, so I would go for an 8-bit encoding; most likely new Encoding(12xx) is all it takes. It certainly won't work properly with ASCII, as that only accepts character codes [0, 127]. For you to read[^]. :)

        Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]


        Happy New Year to all.
        We hope 2010 soon brings us automatic PRE tags!
        Until then, please insert them manually.
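To illustrate the point about ASCII only covering [0, 127], here is a minimal sketch that decodes the same bytes once as ASCII and once as ISO 8859-2. The class and method names (AsciiVsLatin2Demo, DecodeBothWays) and the sample bytes are made up for the illustration; they are not from the original posts. The RegisterProvider call is only an assumption about the runtime: it should be needed on .NET Core / .NET 5+, while the classic .NET Framework ships these code pages built in.

```csharp
using System;
using System.Text;

public static class AsciiVsLatin2Demo
{
    // Decodes the same raw bytes twice: element 0 as ASCII, element 1 as ISO 8859-2.
    public static string[] DecodeBothWays(byte[] raw)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered first. Registering more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin2 = Encoding.GetEncoding("iso-8859-2");
        return new[] { Encoding.ASCII.GetString(raw), latin2.GetString(raw) };
    }

    public static void Main()
    {
        // "Koroška" as an ISO 8859-2 server would send it: š is the single byte 0xB9.
        byte[] raw = { 0x4B, 0x6F, 0x72, 0x6F, 0xB9, 0x6B, 0x61 };

        string[] decoded = DecodeBothWays(raw);
        Console.WriteLine(decoded[0]); // ASCII: the byte above 127 becomes '?'
        Console.WriteLine(decoded[1]); // ISO 8859-2: the byte maps to a real letter
    }
}
```

The ASCII decoder has no mapping for byte 0xB9, so it substitutes a question mark, which matches what the original question describes.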


          Aljaz111
          wrote:
          #4

          OK, I did it like this:

          Encoding e = Encoding.GetEncoding(1250);

          Still, not all characters are shown as they should be. One of them is shown correctly; for the others I used Replace :(. Thanks
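A likely explanation for "one of them is shown right" is that code page 1250 (Windows-1250) is close to, but not the same as, ISO 8859-2: č sits at byte 0xE8 in both, while š and ž sit at different byte values. In .NET, ISO 8859-2 itself is code page 28592 (name "iso-8859-2"). The sketch below compares the two on a MemoryStream that stands in for the response stream. Latin2StreamDemo, ByCodePage and ReadAllText are hypothetical names for this illustration, and the RegisterProvider call assumes .NET Core / .NET 5+ (it should not be needed on the classic .NET Framework).

```csharp
using System;
using System.IO;
using System.Text;

public static class Latin2StreamDemo
{
    // Looks up an encoding by code page number.
    public static Encoding ByCodePage(int codePage)
    {
        // Assumption: legacy code pages need registering on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage);
    }

    // Reads the whole stream as text, trusting the given encoding
    // (byte-order-mark detection is switched off).
    public static string ReadAllText(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding, false))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // "šžč" as ISO 8859-2 bytes; a stand-in for response.GetResponseStream().
        byte[] raw = { 0xB9, 0xBE, 0xE8 };

        string asLatin2 = ReadAllText(new MemoryStream(raw), ByCodePage(28592));
        string as1250 = ReadAllText(new MemoryStream(raw), ByCodePage(1250));

        Console.WriteLine(asLatin2 == as1250);       // the two decodings differ
        Console.WriteLine(asLatin2[2] == as1250[2]); // but the last letter (č) agrees
    }
}
```

With the real response, the same ReadAllText helper could be handed response.GetResponseStream() and the 28592 encoding, which would replace both the manual buffer loop and the Replace workaround. Whether the server really sends ISO 8859-2, as its charset declaration says, is something to verify against the actual response.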
