Code Project

Regular expression [modified]

Tags: regex, html, help, question, lounge
    Eduard Keilholz wrote (#1):

    Hey guys, I'm trying to create a regular expression that matches text containing HTML links. I want to extract both the URL and the 'clickable' text. The regular expression so far:

        (<a(.)*href=("|')(?<url>(.)*)("|')(.)*>(?<text>(.)+)</a>)

    In general, this works. When I enter text with one HTML link in it:

        blah blah blah <a href="/link.aspx" target="_blank">clickable text</a> blah blah

    the regular expression returns one match containing <a href... ... </a>, so that works fine. However, when I enter text that contains two links:

        blah blah blah <a href="/link.aspx" target="_blank">clickable text</a> blah blah blah blah blah <a href="/link.aspx" target="_blank">clickable text</a> blah blah

    the regular expression still returns one match, running from the first <a href... to the second </a>, which is not the result I want. I want the regular expression to return two matches. Can anyone help me?

    Regex regx = new Regex("<a(.)*href=(\"|')(?<url>(.)*)(\"|')(.)*>(?<text>(.)+)</a>", RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);
    MatchCollection col = regx.Matches(Input);

    .: I love it when a plan comes together :. http://www.zonderpunt.nl

    modified on Friday, February 19, 2010 6:21 AM
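A minimal reproduction of the problem as a self-contained console program, assuming the two-link sample text from the question as input; the class name GreedyRepro is a made-up name for illustration. Each greedy (.)* or (.)+ first runs to the end of the input and then backtracks only as far as it must, so the first (.)* settles on the last href= and the single match stretches from the first <a to the last </a>.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name, used only to reproduce the behaviour described above.
public static class GreedyRepro
{
    // Two-link sample text from the question.
    public const string Input =
        "blah blah blah <a href=\"/link.aspx\" target=\"_blank\">clickable text</a> blah blah " +
        "blah blah blah <a href=\"/link.aspx\" target=\"_blank\">clickable text</a> blah blah";

    // The original pattern and options: every (.)* and (.)+ is greedy.
    public static readonly Regex Regx = new Regex(
        "<a(.)*href=(\"|')(?<url>(.)*)(\"|')(.)*>(?<text>(.)+)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);

    public static void Main()
    {
        MatchCollection col = Regx.Matches(Input);

        // One match only: it starts at the first "<a" and ends at the last "</a>".
        Console.WriteLine(col.Count);
        Console.WriteLine(col[0].Value);
    }
}
```

RegexOptions.Multiline only changes how ^ and $ behave, and this pattern uses neither, so it makes no difference here.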

      Not Active wrote (#2):

      Try this tool, Expresso. It has a library of regular expressions and a testing environment that may help.


      I know the language. I've read a book. - _Madmatt

        Eduard Keilholz wrote (#3):

        I am familiar with Expresso, which, by the way, returns the same result: only one match, running from the opening tag of the first link to the closing tag of the second link. I want two matches, one for each link ;) Thanks for the help anyway.

          rhuiden wrote (#4):

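          I think your problem is that the quantifiers are greedy: (.)* grabs as much text as it can, so the match runs on to the last </a> in the input. You should make them lazy by appending ? to each one (Reference). This should work, although it is untested:

              <a(.)*?href=(\"|')(?<url>(.)*?)(\"|')(.)*?>(?<text>(.)+?)</a>

          A lazy quantifier makes the engine retry the rest of the pattern after every character it adds, which can be slower than greedy matching, so you may want to use something more specific, such as a negated character class: (?<url>[^'"]*). I have found http://www.regular-expressions.info/ to be a great resource.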
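The sketch below runs the suggested lazy pattern against the two-link sample from the question and, for comparison, an assumed variant that applies the negated-character-class advice to every part of the tag rather than only the URL. The class name LinkPatterns and the exact form of the variant pattern are illustrative assumptions, not code from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for comparing the two approaches.
public static class LinkPatterns
{
    // The lazy pattern suggested above: every quantifier is *? or +?,
    // so each match stops at the first "</a>" it can reach.
    public static readonly Regex Lazy = new Regex(
        "<a(.)*?href=(\"|')(?<url>(.)*?)(\"|')(.)*?>(?<text>(.)+?)</a>",
        RegexOptions.IgnoreCase);

    // Assumed variant with negated character classes: the URL cannot run past a
    // quote, the tag cannot run past '>', and the link text cannot run past '<'.
    // \k<q> requires the closing quote to be the same kind as the opening one.
    // Limitation: link text containing nested tags (e.g. <b>) will not match.
    public static readonly Regex Negated = new Regex(
        "<a\\s[^>]*href=(?<q>[\"'])(?<url>[^'\"]*)\\k<q>[^>]*>(?<text>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    public static void Main()
    {
        string input =
            "blah blah blah <a href=\"/link.aspx\" target=\"_blank\">clickable text</a> blah blah " +
            "blah blah blah <a href=\"/link.aspx\" target=\"_blank\">clickable text</a> blah blah";

        foreach (Regex regx in new[] { Lazy, Negated })
        {
            MatchCollection col = regx.Matches(input);
            Console.WriteLine("{0} matches", col.Count);
            foreach (Match m in col)
            {
                // Named groups give direct access to the URL and the clickable text.
                Console.WriteLine("url={0} text={1}", m.Groups["url"].Value, m.Groups["text"].Value);
            }
        }
    }
}
```

Both patterns should report two matches here, each with url=/link.aspx and text=clickable text. The lazy version also needs RegexOptions.Singleline if a link spans several lines, because . does not match a newline by default; the [^>] and [^<] classes in the variant do match newlines.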

            Eduard Keilholz wrote (#5):

            Wow, it works flawlessly! You for president!! Thanks!
