Getting XHTML tags by tag name

    Jordanwb wrote (#1):

    In JavaScript you can run document.getElementsByTagName ("img") to get all of the image tags. Can you do something similar in C#? Also, you can do this in JavaScript:

    var image = document.createElement ("img");
    image.getAttribute ("src");

    Again, can you do something similar in C#? And if so, how? This is how I'm getting the webpage: http://www.tech-recipes.com/rx/1954/get_web_page_contents_in_code_with_csharp/ Thanks.

      Lost User wrote (#2):

      If it's really XHTML and not just plain old HTML, you could treat it as XML: use XmlDocument and its SelectNodes(string xpath) method. At least, that's what I would do. The XPath would be something like "//img/@src", I think (if you want the src attributes of all the img elements, as your code seems to).
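
For reference, here is a minimal sketch of that suggestion, assuming the page has already been downloaded into a string and is well-formed XML. The class and method names are made up for illustration. One caveat: real XHTML usually declares the default namespace http://www.w3.org/1999/xhtml, in which case a plain "//img/@src" matches nothing, so the XPath below matches on local-name() instead. The second method is the closest analogue of the JavaScript calls in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XhtmlImages
{
    // XPath route: collect the src attribute of every img element.
    public static List<string> SourcesByXPath(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        var sources = new List<string>();
        // local-name() ignores the namespace, so this works with or without xmlns.
        foreach (XmlNode attr in doc.SelectNodes("//*[local-name()='img']/@src"))
        {
            sources.Add(attr.Value);
        }
        return sources;
    }

    // DOM route: mirrors getElementsByTagName / getAttribute from JavaScript.
    public static List<string> SourcesByTagName(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        var sources = new List<string>();
        foreach (XmlElement img in doc.GetElementsByTagName("img"))
        {
            // GetAttribute returns an empty string when the attribute is absent.
            sources.Add(img.GetAttribute("src"));
        }
        return sources;
    }
}
```

Both methods depend on LoadXml succeeding, so they only help if the markup really is well-formed.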

        Jordanwb wrote (#3):

        Okay, I'll give that a try. I've found out that the XHTML isn't completely valid. Some of the tags aren't closed properly. Here's an excerpt:

        <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
        <meta name="robots" content="noarchive"/>
        <meta name="description" content="/a/ is 4chan's imageboard dedicated to the discussion of Japanese anime and manga."/>
        <meta name="keywords" content="imageboard,japan,anime,manga"/>
        <link rel="alternate stylesheet" type="text/css" href="http://zip.4chan.org/yotsuba.9.css" title="Yotsuba">
        <link rel="stylesheet" type="text/css" href="http://zip.4chan.org/yotsublue.9.css" title="Yotsuba B">
        <link rel="alternate stylesheet" type="text/css" href="http://zip.4chan.org/futaba.9.css" title="Futaba">
        <link rel="alternate stylesheet" type="text/css" href="http://zip.4chan.org/burichan.9.css" title="Burichan">
        <link rel="alternate" title="RSS feed" href="/a/index.rss" type="application/rss+xml" />
        <title>/a/ - Animu & Mango</title>

        Some of the tags are more or less properly formed, but some aren't. The first kind is easy:

        result.Replace ("\"/>", "\" />");

        I think I could use a regex for the tags missing a closing "/", but I don't know how to do that. [Edit] Okay, the result.Replace bit isn't working.
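
A likely reason the Replace call appears to do nothing, assuming result is a string: .NET strings are immutable, so string.Replace returns a new string and leaves the original untouched. The return value has to be assigned. A minimal sketch (the class and method names are made up for illustration):

```csharp
using System;

public static class ReplaceDemo
{
    public static string NormalizeSelfClosing(string result)
    {
        // Replace does not modify 'result' in place; it returns the changed copy.
        result = result.Replace("\"/>", "\" />");
        return result;
    }
}
```

Note that "/> is already well-formed XML, so this particular replacement should not be needed just to get XmlDocument to load the page.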

        modified on Thursday, June 25, 2009 1:18 PM

          Lost User wrote (#4):

          I'm afraid you may have to use a custom parser for HTML. That will work, but it's a lot of work to write.
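
If only the img src values are needed, something far short of a full parser may be enough. The sketch below is not a real HTML parser, just a tolerant regex scan that doesn't care whether the surrounding markup is well-formed. It assumes quoted attribute values and will miss unquoted ones. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LooseImgScanner
{
    // Finds an <img ...> tag and captures the quoted value of its src attribute.
    // Group 1 is the opening quote character; \1 requires the same closing quote.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*([""'])(?<src>.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> GetImageSources(string html)
    {
        var sources = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            sources.Add(m.Groups["src"].Value);
        }
        return sources;
    }
}
```

This trades robustness for simplicity: it can be fooled by img-like text inside comments or scripts, which a proper parser would handle.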

            Jordanwb wrote (#5):

            Well, all of the HTML seems to be properly nested per the XHTML spec, but some of the tags simply aren't closed properly. All I may need is C#'s equivalent of PHP's preg_replace function.
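
The closest counterpart to preg_replace in .NET is Regex.Replace in System.Text.RegularExpressions. As a rough sketch, assuming the only problem is meta and link tags that lack the closing "/", something like the following could rewrite them. The class name, method name, and the choice of tags are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VoidTagFixer
{
    // Matches <meta ...> or <link ...>, with or without a trailing slash.
    // Group 1 is the tag name, group 2 is the attribute text.
    private static readonly Regex VoidTag = new Regex(
        @"<(meta|link)\b([^>]*?)\s*/?>",
        RegexOptions.IgnoreCase);

    public static string CloseVoidTags(string html)
    {
        // Rewrites every match in self-closed form: <name attributes />
        return VoidTag.Replace(html, "<$1$2 />");
    }
}
```

Tags that are already self-closed come out unchanged apart from spacing. The pattern breaks if an attribute value contains ">", and it does nothing about other well-formedness problems, such as a bare & in text.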

              Lost User wrote (#6):

              If you can get that to work, it's probably less work, but it may not be as robust. Up to you, though :)

                Jordanwb wrote (#7):

                I found more malformed HTML on other boards. It seems that my program will be significantly more complicated than I thought. :( Adding an XHTML Transitional doctype produces 431 errors on just one thread.
