Regular Expression for a repeating pattern?

Les Stockton

I've got some data input by a user. It's like <span>&nbsp;&nbsp;&nbsp;&nbsp;</span>, and there can be any number of these non-breaking spaces up until the close of the span tag, with nothing else. I'm trying to figure out a reasonable way to detect an occurrence of a span like this. It would be the span and then 30 occurrences of the non-breaking space, or it could be 300 occurrences, or any number in between. I was hoping there'd be a way to detect this repeating pattern with a regular expression.

PIEBALDconsult

You might need to test for the Unicode value.
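If the input has already been decoded, the non-breaking space arrives as the single character U+00A0 rather than as entity text. A minimal Python sketch of testing for that code point, assuming a bare <span> tag with no attributes (the name NBSP_SPAN is made up for illustration):

```python
import re

# U+00A0 is the non-breaking space; "+" means one or more of them.
NBSP_SPAN = re.compile("<span>\u00a0+</span>")

decoded = "<span>" + "\u00a0" * 30 + "</span>"
plain = "<span>" + " " * 3 + "</span>"   # ordinary ASCII spaces

print(bool(NBSP_SPAN.search(decoded)))  # True
print(bool(NBSP_SPAN.search(plain)))    # False
```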

RedDk

Scarf this down: Regex Quantifier Tutorial: Greedy, Lazy, Possessive

jschell

Specifics of where/which regex is used matter. But in general {code} \s*(&nbsp;\s*)+ {code}
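As a rough Python illustration of that kind of pattern, assuming the data holds the literal entity text "&nbsp;" and the span carries no attributes. The names SPAN_RE and count_nbsp are hypothetical, and the closing tag is included here only so the "nothing else" requirement is enforced:

```python
import re

# One or more "&nbsp;" entities, each optionally followed by whitespace,
# with nothing else between the opening and closing tags.
SPAN_RE = re.compile(r"<span>\s*((?:&nbsp;\s*)+)</span>")

def count_nbsp(text):
    """Return how many &nbsp; entities the first matching span holds, or 0."""
    m = SPAN_RE.search(text)
    return m.group(1).count("&nbsp;") if m else 0

print(count_nbsp("<span>" + "&nbsp;" * 30 + "</span>"))   # 30
print(count_nbsp("<span>&nbsp; &nbsp;\n&nbsp;</span>"))   # 3
print(count_nbsp("<span>&nbsp;text</span>"))              # 0
```

To require a specific range, such as 30 to 300 repetitions, the + can be swapped for a bounded quantifier like {30,300}.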

Les Stockton wrote:

until the close of the span tag

In valid XML, looking for the closing tag is pointless. But you can add it if you want.

Les Stockton wrote:

XML

Just noting that using regexes to parse XML is not a good idea. Primarily this comes down to blocks embedded in other blocks; you cannot parse that with a regex. There are also other complex issues that would require hideous (which means slow) regexes. There can also be other variations on what you posted:
1. Multiline
2. Spaces in the tags
3. Attributes in the tag
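For comparison, here is a sketch of the parser route in Python using the standard library's html.parser, which copes with attributes, line breaks and nested spans. The class name NbspSpanFinder and the sample markup are invented for illustration, and ignoring ordinary whitespace between the non-breaking spaces is an assumption about what should count:

```python
from html.parser import HTMLParser

NBSP = "\u00a0"

class NbspSpanFinder(HTMLParser):
    """Count <span> elements whose only content is non-breaking spaces."""

    def __init__(self):
        super().__init__()   # default convert_charrefs=True decodes &nbsp; to U+00A0
        self.stack = []      # one list of text chunks per currently open span
        self.hits = 0

    def handle_starttag(self, tag, attrs):
        if self.stack:
            # Any child element means the enclosing span is not nbsp-only.
            self.stack[-1].append("<child>")
        if tag == "span":
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            text = "".join(self.stack.pop())
            # Drop ordinary whitespace by hand; str.strip() would also eat U+00A0.
            text = "".join(ch for ch in text if ch not in " \t\r\n")
            if text and set(text) == {NBSP}:
                self.hits += 1

    def handle_data(self, data):
        if self.stack:
            self.stack[-1].append(data)

sample = ('<div><span class="pad">&nbsp;&nbsp;\n&nbsp;</span>'
          '<span>a&nbsp;</span>'
          '<span><span>&nbsp;</span></span></div>')
finder = NbspSpanFinder()
finder.feed(sample)
finder.close()
print(finder.hits)  # 2: the padded span and the inner nested span
```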

k5054

In general, trying to parse XML (or HTML) with regex is not a good idea, and almost certainly doomed to failure. However, to match this specific case you might try:

(&nbsp; *)+

That's an extended POSIX regex, and seems to do the job. It matches any of the following:

If you need to accept any white space you might try using (&nbsp;[[:space:]]*) as the sub-pattern. If you may have line breaks in the span text, then you may need to tell your regex engine to not treat them as end-of-text markers.
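A small Python comparison of the two sub-patterns, assuming the repeated unit is an "&nbsp;" entity followed by optional white space and that the whole thing sits inside a bare <span>...</span> (the wrapping tags and the names below are assumptions). Python's re module has no POSIX bracket classes, so \s with the ASCII flag stands in for [[:space:]] here:

```python
import re

# Spaces only after each entity versus any ASCII white space.
SPACES_ONLY = re.compile(r"<span>(&nbsp; *)+</span>")
ANY_SPACE = re.compile(r"<span>(&nbsp;\s*)+</span>", re.ASCII)

sample = "<span>&nbsp;&nbsp;\n&nbsp;</span>"

print(bool(SPACES_ONLY.search(sample)))  # False: the line break is not a space
print(bool(ANY_SPACE.search(sample)))    # True
```

In Python, \s already matches a line break, so no extra flag is needed for multi-line span text; other engines may differ.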
