Code Project

Stripping out HTML code

wildfiction wrote (#1):

Hi, (using C# in ASP.NET) I have a string that contains about 7K of HTML-formatted text. I need to strip out all of the HTML tags and convert the &nbsp; and &gt; etc. codes to spaces and > etc. Now, I can write this myself, but I'm guessing there is already a bit of well-worn code out there that does this, OR it is already built into the ASP.NET library (but I can't find it). Does anybody have any pointers for me? The reason I'm doing this is that I want to search this chunk of HTML for certain words, and I don't want to search the HTML codes, just what the user sees. Is there a more elegant approach than the one I'm taking? Thanks! -- modified at 19:48 Wednesday 28th December, 2005
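On the "is it already built in" part: the framework does ship an entity decoder. Below is a minimal sketch, assuming .NET 4.0 or later for System.Net.WebUtility (on the .NET 2.0 of this thread, System.Web.HttpUtility.HtmlDecode is the equivalent). The class and method names are made up for illustration, and the sketch only decodes entities; it does not remove tags.

```csharp
using System.Net;

// Hypothetical helper: turns entities like &nbsp; and &gt; into the
// characters a browser would display. It does NOT remove tags.
public static class EntityDecoder
{
    public static string DecodeEntities(string html)
    {
        // WebUtility.HtmlDecode (System.Net, .NET 4.0+) decodes named and numeric
        // entities; System.Web.HttpUtility.HtmlDecode is the ASP.NET equivalent.
        string decoded = WebUtility.HtmlDecode(html);

        // &nbsp; decodes to U+00A0 (no-break space), not an ordinary space,
        // so map it to ' ' to make plain-space word searches work.
        return decoded.Replace('\u00A0', ' ');
    }
}
```

If you combine this with tag stripping, strip the tags first and decode second: decoding first would turn literal text such as &lt;b&gt; into <b>, which the tag remover would then throw away.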

Mircea Grelus wrote (#2):

My first guess would be to write a regular expression that matches an opening <, then any characters up to the first >, and remove each match. This would strip the HTML tags. You will probably have to handle some cases specially, though, for example JavaScript (script) tags, since script code can itself contain < and > characters.

Many people spend their life going to sleep when they're not sleepy and waking up while they still are.
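A minimal sketch of that regex idea, which also drops whole <script> and <style> blocks first so code inside them is not mistaken for visible text. The class name TagStripper is hypothetical, and a regex is not a real HTML parser: comments, CDATA sections, or a > inside a quoted attribute value can still trip it up.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: removes script/style blocks, then every remaining tag.
public static class TagStripper
{
    // A whole <script ...>...</script> or <style ...>...</style> block.
    // \1 requires the closing tag name to match the opening one; Singleline
    // lets '.' cross newlines; '.*?' stops at the first matching close tag.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any other tag: '<', then any characters except '>', then '>'.
    private static readonly Regex Tag = new Regex(@"<[^>]*>");

    public static string StripTags(string html)
    {
        string withoutScripts = ScriptOrStyle.Replace(html, string.Empty);
        return Tag.Replace(withoutScripts, string.Empty);
    }
}
```

Without the first step, a script containing "a < b" would leave fragments of code in the output, because the tag pattern would treat "< b) ...>" as a tag.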

wildfiction wrote (#3):

Thanks, Mircea!! That will work because I don't have any script in the HTML. I'm reading through the Regex documentation, and I can't find the syntax to use for the < ??? > part. How do you find <*>? Or rather, how do you specify <*> in a regular expression? Thanks again for your help, much appreciated. -- modified at 20:53 Wednesday 28th December, 2005
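For the <*> question: in a regular expression, * is a quantifier on the item before it, not a wildcard, so <*> means "zero or more < followed by >". The pattern needs "any character" (. or a character class) in front of the *. A small comparison of three candidates, using a hypothetical helper class:

```csharp
using System.Text.RegularExpressions;

public static class TagPatterns
{
    // Greedy: '.*' runs to the LAST '>' it can reach, so "<b>bold</b>" is
    // swallowed as a single "tag" and the visible word is lost.
    public const string Greedy = "<.*>";

    // Lazy: '.*?' stops at the FIRST '>' after each '<'. Without
    // RegexOptions.Singleline, '.' does not match '\n', so a tag split
    // across lines is missed.
    public const string Lazy = "<.*?>";

    // Negated class: '[^>]*' cannot run past a '>', and (unlike '.')
    // it also matches '\n', so multi-line tags are caught too.
    public const string NegatedClass = "<[^>]*>";

    public static string Strip(string input, string pattern) =>
        Regex.Replace(input, pattern, string.Empty);
}
```

On "<b>bold</b> text", Greedy leaves " text" while the other two leave "bold text"; the negated class is the form used in the reply below.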

Ista wrote (#4):

This might also work. All HTML is made of tags, everything. So if you load the document into an XmlDocument, you can skip to what you want with SelectSingleNode, or grab multiple tags with SelectNodes, and then Node.InnerText will give you the data without the tags. That might get you closer to your end result without having to write tons of homemade logic. You can view the DOM in your Locals window.

    XmlDocument x = new XmlDocument();
    x.Load(filename);
    x.Select....

1 line of code equals many bugs. So don't write any!!
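A minimal sketch of the XmlDocument route, assuming the markup is already in a string (so LoadXml instead of Load) and that it is well-formed XML, i.e. XHTML. Ordinary HTML often isn't: unclosed tags such as <br>, or entities such as &nbsp; that XML does not predefine, make LoadXml throw XmlException, so this sketch returns null in that case. The class and method names are hypothetical.

```csharp
using System.Xml;

public static class XmlTextExtractor
{
    // Returns the tag-free text of the first node matching the XPath query,
    // or null if the markup is not well-formed XML or nothing matches.
    public static string TextOf(string markup, string xpath)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(markup); // throws XmlException on non-XML HTML
        }
        catch (XmlException)
        {
            return null;
        }

        XmlNode node = doc.SelectSingleNode(xpath);
        // InnerText concatenates all descendant text nodes, dropping the tags.
        return node?.InnerText;
    }
}
```

For several nodes at once, SelectNodes returns an XmlNodeList you can loop over in the same way.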

Mircea Grelus wrote (#5):

    using System.Text.RegularExpressions;

    Regex r;
    Match m;
    r = new Regex(@"<(?<1>[^>]*)>",   // regular expression for catching: < AnyCharUntil >
        RegexOptions.Singleline | RegexOptions.IgnoreCase /*| RegexOptions.Compiled*/);
    for (m = r.Match(input); m.Success; m = m.NextMatch())   // input is the string in which to search
    {
        input = input.Replace(m.Groups[0].ToString(), "");
        // m.Groups[0] references the entire match, e.g. "<exampletag exampletext>"
        // m.Groups[1] references backreference 1, e.g. "exampletag exampletext"
    }

regards,
Mircea

Many people spend their life going to sleep when they're not sleepy and waking up while they still are.
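Two follow-ups, sketched under assumptions. First, Regex.Replace removes every match in a single call, so the Match/NextMatch loop above is not strictly needed. Second, for the original goal of searching the visible text for certain words, Regex.IsMatch with \b word boundaries and Regex.Escape is one option. The class and method names are hypothetical, and \b only behaves as expected for words that start and end with a letter, digit, or underscore.

```csharp
using System.Text.RegularExpressions;

public static class VisibleTextSearch
{
    private static readonly Regex Tag = new Regex(@"<[^>]*>");

    // One Replace call strips every tag. Each tag becomes a space rather than
    // "", so "<p>foo</p><p>bar</p>" does not run together into "foobar".
    public static string StripTags(string html) => Tag.Replace(html, " ");

    // True if 'word' appears as a whole word in 'text', ignoring case.
    // Regex.Escape keeps characters like '.' or '+' in 'word' literal.
    public static bool ContainsWord(string text, string word) =>
        Regex.IsMatch(text, @"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);
}
```

Because the tags are gone before searching, a word that only appears in markup, such as class="quick", is not reported as a hit.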
