Regular expression [modified]
-
Hey guys,

I'm trying to create a regular expression that matches HTML links inside a piece of text. I want to extract both the URL and the 'clickable' text. The regular expression so far:

(<a(.)*href=("|')(?<url>(.)*)("|')(.)*>(?<text>(.)+)</a>)

In general, this works. When I enter text with one HTML link in it:

blah blah blah <a href="/link.aspx" target="_blank">clickable text</a> blah blah

the regular expression returns one match containing <a href... ... </a>, so that works fine.

However, when I enter text that contains two links:

blah blah blah <a href="/link.aspx" target="_blank">clickable text</a> blah blah blah blah blah <a href="/link.aspx" target="_blank">clickable text</a> blah blah

the regular expression still returns one match, running from the first <a href... to the second </a>, which is not the result I want. I want the regular expression to return two matches. Can anyone help me?
Regex regx = new Regex("<a(.)*href=(\"|')(?<url>(.)*)(\"|')(.)*>(?<text>(.)+)</a>", RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);
MatchCollection col = regx.Matches(Input);
.: I love it when a plan comes together :. http://www.zonderpunt.nl
modified on Friday, February 19, 2010 6:21 AM
-
I am familiar with Expresso, which by the way returns the same result: only one match, running from the opening tag of the first link until the closing tag of the second link. I want two matches, one for each link ;) Thanks for the help anyway.
-
I think your problem is that the * and + quantifiers are greedy: (.)* grabs as much as it can, so the match runs on to the last </a> on the line. You should make them lazy by putting a ? after each quantifier. This:

<a(.)*?href=(\"|')(?<url>(.)*?)(\"|')(.)*?>(?<text>(.)+?)</a>

should work, although it is untested. Lazy matching can be less efficient than greedy matching, so you may want to use something more specific, like (?<url>[^'"]*), which simply stops at the next quote. I have found http://www.regular-expressions.info/ to be a great resource.
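To make the difference concrete, here is a minimal C# sketch that runs the original greedy pattern, the lazy pattern, and a variant built on negated character classes against the same input. The class name LinkExtractor, its field names, and the sample input are made up for illustration. The character-class pattern is my own assumption about what "more specific" could look like: it assumes the link text contains no nested tags and the URL contains no quote characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class LinkExtractor
{
    // The pattern from the question: greedy quantifiers everywhere.
    public static readonly Regex Greedy = new Regex(
        "<a(.)*href=(\"|')(?<url>(.)*)(\"|')(.)*>(?<text>(.)+)</a>",
        RegexOptions.IgnoreCase);

    // The same pattern with every quantifier made lazy.
    public static readonly Regex Lazy = new Regex(
        "<a(.)*?href=(\"|')(?<url>(.)*?)(\"|')(.)*?>(?<text>(.)+?)</a>",
        RegexOptions.IgnoreCase);

    // Assumed alternative: negated character classes instead of '.'.
    // [^>] cannot leave the opening tag, [^\"'] stops at the next quote,
    // and [^<] stops at the next tag (so nested tags in the text are not handled).
    public static readonly Regex CharClass = new Regex(
        "<a\\b[^>]*?\\bhref=[\"'](?<url>[^\"']*)[\"'][^>]*>(?<text>[^<]+)</a>",
        RegexOptions.IgnoreCase);

    public static void Main()
    {
        string input =
            "blah <a href=\"/link.aspx\" target=\"_blank\">clickable text</a> blah " +
            "blah <a href='/other.aspx'>second link</a> blah";

        Console.WriteLine("greedy matches:     " + Greedy.Matches(input).Count);
        Console.WriteLine("lazy matches:       " + Lazy.Matches(input).Count);
        Console.WriteLine("char-class matches: " + CharClass.Matches(input).Count);

        foreach (Match m in CharClass.Matches(input))
        {
            Console.WriteLine(m.Groups["url"].Value + " -> " + m.Groups["text"].Value);
        }
    }
}
```

On this input the greedy pattern finds one match spanning both links, while the other two find each link separately. For real-world HTML (nested tags, attributes containing >, unquoted values) a proper HTML parser is likely a safer choice than any of these patterns.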
-
Wow, works flawlessly! You for president!! Thanks!