Regular Expression for a repeating pattern?
-
I've got some data input by a user. It looks like this:
<span> </span>
and this can be any number of non-breaking spaces up until the close of the span tag, with nothing else. I'm trying to figure out a reasonable way to detect an occurrence of a span like this. It could be the span and then 30 occurrences of the non-breaking space, or 300 occurrences, or any number in between. I was hoping there'd be a way to detect this repeating pattern with a regular expression.
-
You might need to test for the Unicode value.
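One way to read "test for the Unicode value": by the time the text reaches your code, the entity may already have been decoded to the character U+00A0, so a pattern can accept both forms. A minimal Python sketch, assuming the input is a plain string and the span tag has no attributes; the names NBSP_SPAN and is_nbsp_only_span are made up for illustration:

```python
import re

# Matches a span whose entire content is one or more non-breaking spaces,
# written either as the entity "&nbsp;" or as the decoded character U+00A0.
# (Assumption: the opening tag is exactly "<span>", with no attributes.)
NBSP_SPAN = re.compile(r"<span>(?:&nbsp;|\u00a0)+</span>")

def is_nbsp_only_span(text):
    """Return True if text contains a span holding only non-breaking spaces."""
    return NBSP_SPAN.search(text) is not None
```

The `+` quantifier covers 30, 300, or any other count of at least one, so the number of repetitions never has to be spelled out.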
-
The specifics of where the regex is used, and which regex engine, matter. But in general:
<span>\s*(&nbsp;\s*)+
Les Stockton wrote:
until the close of the span tag
In valid XML, looking for the closing tag is pointless. But you can add it if you want.
Les Stockton wrote:
XML
Just noting that using regexes to parse XML is not a good idea. Primarily this comes down to blocks nested inside other blocks, which you cannot parse with a regex. There are other complications too, which would require hideous (and therefore slow) regexes. What you posted can also vary in other ways:
1. Multiline
2. Spaces in the tags
3. Attributes in the tag
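If a parser is an option, the variations listed above (line breaks, spaces or attributes in the tag, nested blocks) stop being a problem. Below is a rough sketch using Python's standard html.parser module; the class name NbspSpanFinder and the helper count_nbsp_spans are hypothetical, and it assumes "nothing else" is meant strictly, so a span containing ordinary spaces or line breaks alongside the non-breaking spaces is not counted:

```python
from html.parser import HTMLParser

NBSP = "\u00a0"

class NbspSpanFinder(HTMLParser):
    """Counts <span> elements whose whole content is non-breaking spaces."""

    def __init__(self):
        # convert_charrefs=True means "&nbsp;" is delivered as U+00A0
        super().__init__(convert_charrefs=True)
        self.stack = []   # one record per currently open span
        self.found = 0

    def handle_starttag(self, tag, attrs):
        if self.stack:
            # any child element disqualifies the enclosing span
            self.stack[-1]["clean"] = False
        if tag == "span":
            self.stack.append({"text": "", "clean": True})

    def handle_data(self, data):
        if self.stack:
            self.stack[-1]["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            span = self.stack.pop()
            text = span["text"]
            # non-empty, no child elements, and every character is U+00A0
            if span["clean"] and text and set(text) == {NBSP}:
                self.found += 1

def count_nbsp_spans(html_text):
    finder = NbspSpanFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found
```

To tolerate ordinary whitespace between the non-breaking spaces, the final check could strip whitespace from the text before comparing; that is a design choice left open here.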
-
In general, trying to parse XML (or HTML) with regex is not a good idea, and almost certainly doomed to failure. However, to match this specific case you might try:
<span>(&nbsp; *)+</span>
That's an extended POSIX regex, and seems to do the job. It matches any of the following:
If you need to accept any white space you might try using
(&nbsp;[[:space:]]*)
as the sub-pattern. If the span text may contain line breaks, you may need to tell your regex engine not to treat them as end-of-text markers.
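For comparison, here is a rough Python version of the whitespace-tolerant pattern. Python's re module does not support POSIX bracket classes such as [[:space:]], so this sketch substitutes \s, which also matches line breaks and so needs no extra flag for them. It assumes the entity appears literally as "&nbsp;" and that the span tag has no attributes; the name SPAN_OF_NBSP is made up for illustration:

```python
import re

# "&nbsp;" followed by optional whitespace, repeated one or more times.
# \s stands in for [[:space:]] and already covers "\n" and "\r".
SPAN_OF_NBSP = re.compile(r"<span>\s*(?:&nbsp;\s*)+</span>")

samples = [
    "<span>&nbsp;</span>",                  # single entity
    "<span>&nbsp; &nbsp;\n&nbsp;</span>",   # spaces and a line break between
    "<span>&nbsp;x</span>",                 # other content, should not match
]
matches = [bool(SPAN_OF_NBSP.search(s)) for s in samples]
print(matches)  # [True, True, False]
```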