Regular expression to find href tags.

    Pav1977 wrote (#1):

    Hi there, I'm new to C# and .NET and was just wondering whether there is anything clever I could download (a library, etc.), or something in the standard C# library, that could help with this. In an import scenario (thousands of articles) I have a database column that contains links. This is a very messy database (from a CMS): some of the links are plain, like www.somelink.com (the good ones), while others are actually wrapped in the HTML link tag around www.somelink.com. Is there a regular expression anywhere that would help me filter out just the Web link? In the worst case I'll have to write it myself, which is not a huge worry, but I would prefer to reuse something, of course. Any help much appreciated. Kind regards, Pav

      M Harris wrote (#2):

      <a.+?href>]*>(?.+?) should about do it :) I just wrote that for you; it should give you three named match groups: "HREF", "Domain", and "Text". My test data was: asd asd asd asd a_s_d and everything matched correctly.

      -- Real programmers don't comment their code. It was hard to write, it should be hard to understand.
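The idea of three named groups can be sketched in C# as follows. This is only an illustration, not the pattern from the post above: the regular expression, the `AnchorParser` class, and the `TryParse` method are all assumptions made up for this example, and the sketch only handles simple, well-formed anchor tags.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorParser
{
    // Illustrative pattern (an assumption, not the poster's original):
    //   HREF   - the full value of the href attribute
    //   Domain - the host part of that value, after an optional http(s)://
    //   Text   - everything between the opening tag and </a>
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*[""']?(?<HREF>(?:https?://)?(?<Domain>[^/""'\s>]+)[^""'\s>]*)[""']?[^>]*>(?<Text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns true when an anchor tag was found; the out values are
    // empty strings when there is no match.
    public static bool TryParse(string html, out string href, out string domain, out string text)
    {
        Match m = Anchor.Match(html ?? string.Empty);
        href = m.Groups["HREF"].Value;
        domain = m.Groups["Domain"].Value;
        text = m.Groups["Text"].Value;
        return m.Success;
    }
}
```

Calling `AnchorParser.TryParse` on `<a href="http://www.somelink.com/page">Some link</a>` should yield the full address in `href`, `www.somelink.com` in `domain`, and `Some link` in `text`. Regular expressions tend to break on unusual markup, so for anything beyond a cleanup script a real HTML parser is probably the safer choice.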

        Mohammad Dayyan wrote (#3):

        Try this:

        <(a|A)[^>]*>[^<]*<\/(a|A)>

        Further learning: http://www.regular-expressions.info/

        modified on Friday, September 12, 2008 5:10 PM
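A pattern like the one in this reply matches a whole anchor element but does not capture the address inside it. For the import scenario in the original question, where some rows hold a bare link and others hold a link wrapped in a tag, one possible approach is to capture the `href` value when a tag is present and otherwise keep the trimmed text. The sketch below assumes exactly that; the `LinkCleaner` class, the `ExtractLink` method, and the pattern are hypothetical names and choices for illustration, not something taken from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkCleaner
{
    // Hypothetical pattern: captures the href value of the first anchor tag,
    // whether it is double-quoted, single-quoted, or unquoted.
    // .NET allows the same group name ("url") in several alternatives.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the href value if the input contains an anchor tag,
    // otherwise the input itself with surrounding whitespace removed.
    public static string ExtractLink(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
        {
            return string.Empty;
        }

        Match m = AnchorHref.Match(raw);
        return m.Success ? m.Groups["url"].Value.Trim() : raw.Trim();
    }
}
```

With this sketch, both `LinkCleaner.ExtractLink("  www.somelink.com ")` and `LinkCleaner.ExtractLink("<A HREF='www.somelink.com'>www.somelink.com</A>")` should come back as `www.somelink.com`. Rows containing several links or malformed markup would need extra handling that is not shown here.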

          Mike Hankey wrote (#4):

          Try Expresso; it's a very handy tool if you're going to be working with regular expressions. It also contains a small library of regular expressions for common problems. Mike

          Semper Fi http://www.hq4thmarinescomm.com My Site
