Regex Question
-
I can usually muddle my way through creating a regular expression, but this one has me stumped (even with Expresso). I need to search through the contents of a text box and find each word in an HTML doc, but skip the text inside the tags. I have a Regex that works fine on a plain text file, but not on a doc with tags. Example: given <li>List Item</li>, I'd want to find "List" and "Item" but not "li" or "/li". If anyone has a Regex pattern that does this I'd love to see it. Thanks.
-
Alan, try this. The code originally came from the NetSpell library, so I'm not 100% sure it works correctly...
Private _htmlRegex As Regex = New Regex("</[c-g\d]+>|</[i-o\d]+>|</[a\d]+>|</[q-z\d]+>|<[cg]+[^>]*>|<[i-o]+[^>]*>|<[q-z]+[^>]*>|<[a]+[^>]*>|<(\[^\]*\|'[^']*'|[^'\>])*>", RegexOptions.IgnoreCase Or RegexOptions.Compiled)
"An eye for an eye only ends up making the whole world blind"
-
Just ran it through Expresso. It actually finds the tags, not the text between them. I already have a Regex for stripping HTML, but I think this one might work better, so I can still use it. Thanks! Hadn't thought about it before, but I've got a copy of NetSpell somewhere. I'll poke around inside and see what I can see. AB
-
Didn't find a Regex I could adapt in the NetSpell files. Will keep hammering on the thing. I keep thinking I'm missing something obvious but don't know yet what it is.
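-
One possible approach, offered as a rough sketch rather than a tested solution: since a tag-matching pattern finds the tags and not the text, a single pattern can try to match a whole tag first and a word second, and the code keeps only the word matches. The pattern, the group name "word", and the names WordFinderSketch and FindWords below are all made up for illustration; none of them come from NetSpell. The tag half is deliberately simple and assumes no ">" appears inside an attribute value, and it does nothing special for comments or script blocks.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module WordFinderSketch
    ' Hypothetical pattern (an assumption, not NetSpell's):
    '   <[^>]*>      consumes an entire tag so its contents are never seen as words
    '   (?<word>\w+) captures a run of word characters outside any tag
    Private ReadOnly _tagOrWord As New Regex("<[^>]*>|(?<word>\w+)", RegexOptions.Compiled)

    ' Returns the words found outside tags, in document order.
    Function FindWords(ByVal html As String) As List(Of String)
        Dim words As New List(Of String)
        For Each m As Match In _tagOrWord.Matches(html)
            ' Tag matches leave the "word" group empty, so they are skipped here.
            If m.Groups("word").Success Then
                words.Add(m.Groups("word").Value)
            End If
        Next
        Return words
    End Function

    Sub Main()
        ' Prints "List" and then "Item"; "li" and "/li" are swallowed by the tag branch.
        For Each w As String In FindWords("<li>List Item</li>")
            Console.WriteLine(w)
        Next
    End Sub
End Module
```

If the position of each word in the text box is needed (for highlighting, say), m.Groups("word").Index and .Length give the offsets into the original string, which is the main reason to match in place instead of stripping the tags first.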