How to express this in a Regex?
-
Hi all, I hope one of the more experienced members is watching this forum. It doesn't seem to be very busy, but I urgently need a solution. I should also mention that I am a fairly inexperienced hobby programmer, and that searching the internet and playing around with Expresso for days didn't help, so I am pinning my hopes on you now. In my application I have to parse different file formats using regular expressions. The records to match are separated by different sequences of control characters, depending on the file format. A sample would be:
1
01:00:01:10
01:00:05:22
After the conquest and plundering
of the Inca empire by Spain

2
01:00:05:25
01:00:09:09
the Indians invented the
legend of El Dorado

3
01:00:09:12
01:00:13:24
a land of gold, located in the
swamps of the Amazon headwaters.

I get fairly proper matches with this regex:
(?\d+)\r\n(?\d{2}:\d{2}:\d{2}:\d{2})\r\n(<TCEnd>\d{2}:\d{2}:\d{2}:\d{2})\r\n(?<Text>[a-zA-Z;1-9;\s;\p{P};\p{L}\p{M}]*)\r\n\r\n
The problem is: if there is a '0' (zero) in the text, e.g. in a year ("this happened a.d. 1570"), the whole pattern won't match. On the other hand, if I change "1-9" to the usual "0-9", I get only one match, which contains all the other expected matches in its "Text" group. I hope I have described the problem clearly, and I would highly appreciate it if someone could guide me to a better solution! Thanks in advance, Mick
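The behaviour described above can be reproduced in a small sketch. It is written in Python rather than .NET, so named groups use (?P<Name>...) instead of (?<Name>...). Several things are assumptions: the group names Number and TCStart are placeholders (only TCEnd and Text are named in this thread), the character class is a simplified stand-in because Python's re module has no \p{P}, \p{L} or \p{M}, the file is taken to use CRLF line endings with a blank line between entries, and the year 1570 is added to the second entry purely for illustration. The cause appears to be that the class contains \s, which also matches \r and \n, so once 0-9 is allowed nothing in the following entries stops the greedy [...]* until the end of the input.

```python
import re

# Sample from the post, assuming CRLF line endings and a blank line between entries.
SAMPLE = (
    "1\r\n01:00:01:10\r\n01:00:05:22\r\n"
    "After the conquest and plundering\r\nof the Inca empire by Spain\r\n\r\n"
    "2\r\n01:00:05:25\r\n01:00:09:09\r\n"
    "the Indians invented the\r\nlegend of El Dorado\r\n\r\n"
    "3\r\n01:00:09:12\r\n01:00:13:24\r\n"
    "a land of gold, located in the\r\nswamps of the Amazon headwaters.\r\n\r\n"
)
# Hypothetical variant: a year containing '0' in the text of the second entry.
SAMPLE_WITH_ZERO = SAMPLE.replace("El Dorado", "El Dorado in 1570")

TC = r"\d{2}:\d{2}:\d{2}:\d{2}"
# Number and TCStart are placeholder names, not taken from the original post.
HEAD = r"(?P<Number>\d+)\r\n(?P<TCStart>" + TC + r")\r\n(?P<TCEnd>" + TC + r")\r\n"

# Simplified stand-ins for the original character class.
CLASS_1_TO_9 = re.compile(HEAD + r"(?P<Text>[a-zA-Z1-9\s.,;:]*)\r\n\r\n")
CLASS_0_TO_9 = re.compile(HEAD + r"(?P<Text>[a-zA-Z0-9\s.,;:]*)\r\n\r\n")


def numbers(pattern, text):
    """Return the entry numbers of all matches found in text."""
    return [m.group("Number") for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(numbers(CLASS_1_TO_9, SAMPLE))            # no zero in any text
    print(numbers(CLASS_1_TO_9, SAMPLE_WITH_ZERO))  # entry with a zero is lost
    print(numbers(CLASS_0_TO_9, SAMPLE))            # one match swallows the rest
```

With the 1-9 class, the zeros in the next entry's timecodes are what happen to end the Text group, which is why a zero inside the text itself breaks that entry.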
-
I think the Text portion could be:
(?<Text>((?!\d+\r\n).*\r\n)*)
(?!\d+\r\n) will exclude any line that contains only a number.
.* will include any other line (every line except the number-only lines excluded by (?!\d+\r\n)).
Btw, I noticed you don't have the ? before <TCEnd> :)
-
Wow, this seems to do the trick, and it even looks much simpler than what I had been fiddling with for so many hours! Thank you for ending my sleepless nights. BTW, the ? must have got lost in the copy/paste process; it is part of the original. Have a very nice weekend, and thank you very much!
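The lookahead approach suggested in this thread can be sketched as follows. This is a minimal illustration in Python, not the poster's actual .NET code: named groups are written (?P<Name>...), the group names Number and TCStart are placeholders, the input is assumed to use CRLF line endings with a blank line between entries, and the year 1570 in the sample is hypothetical. The Text group is taken from the answer as posted, without the trailing \r\n\r\n of the original pattern, since the blank line is already consumed as an ordinary line.

```python
import re

# Shortened sample in the same layout; the "1570" is a hypothetical addition.
SAMPLE = (
    "1\r\n01:00:01:10\r\n01:00:05:22\r\n"
    "After the conquest and plundering\r\nof the Inca empire by Spain\r\n\r\n"
    "2\r\n01:00:05:25\r\n01:00:09:09\r\n"
    "the Indians invented the\r\nlegend of El Dorado in 1570\r\n\r\n"
    "3\r\n01:00:09:12\r\n01:00:13:24\r\n"
    "a land of gold, located in the\r\nswamps of the Amazon headwaters.\r\n\r\n"
)

TC = r"\d{2}:\d{2}:\d{2}:\d{2}"
ENTRY = re.compile(
    # Number and TCStart are placeholder names, not taken from the thread.
    r"(?P<Number>\d+)\r\n"
    r"(?P<TCStart>" + TC + r")\r\n"
    r"(?P<TCEnd>" + TC + r")\r\n"
    # Keep taking whole lines until a line consisting only of digits follows.
    r"(?P<Text>((?!\d+\r\n).*\r\n)*)"
)


def parse(text):
    """Return (number, start, end, text) tuples, one per entry."""
    return [
        (
            m.group("Number"),
            m.group("TCStart"),
            m.group("TCEnd"),
            m.group("Text").strip(),  # drop the trailing blank line
        )
        for m in ENTRY.finditer(text)
    ]


if __name__ == "__main__":
    for entry in parse(SAMPLE):
        print(entry)
```

One limitation of this approach: a text line that consists only of digits (for example a year on a line of its own) would be taken for the start of the next entry.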