Regex Problem With Control Characters
-
Hi all, I hope one of the more experienced members watches this forum - it doesn't seem to be very frequented, but I need a solution very soon. Let me also mention that I am a fairly inexperienced leisure-time programmer, and that scanning the internet and playing around with Expresso for days didn't help - so I'm setting my hopes on you now... :^) In my application I have to parse different file formats using regular expressions. The records to match are separated by different sequences of control characters, depending on the file format. A sample would be:
1
01:00:01:10
01:00:05:22
After the conquest and plundering
of the Inca empire by Spain

2
01:00:05:25
01:00:09:09
the Indians invented the
legend of El Dorado

3
01:00:09:12
01:00:13:24
a land of gold, located in the
swamps of the Amazon headwaters.

I get fairly proper matches with this regex:
(?\d+)\r\n(?\d{2}:\d{2}:\d{2}:\d{2})\r\n(?\d{2}:\d{2}:\d{2}:\d{2})\r\n(?[a-zA-Z;1-9;\s;\p{P};\p{L}\p{M}]*)\r\n\r\n", RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant Or RegexOptions.IgnorePatternWhitespace Or RegexOptions.Compiled)
The problem is: if there's a '0' (zero) in the text - e.g. in a year (this happened A.D. '1570') - the whole pattern won't match. On the other hand, if I change '1-9' into the usual '0-9', I get only one match, which contains all the other expected matches in its 'Text' group. I hope I've described the problem in an understandable way... I'd highly appreciate it if someone could guide me to a better solution!

Thanks in advance,
Mick
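
The behaviour described above can be reproduced in a small sketch. It is written in Python rather than .NET, the group names `num`, `start`, `end` and `text` are hypothetical, the character class is a simplified stand-in for the original one (no `\p{...}` classes), and the year in the second block is made up. The likely cause it illustrates: because the class contains `\s`, a greedy `*` runs across the blank lines, and only the missing '0' stops it at the next timestamp. A lazy `.*?` is shown as one possible alternative, not as the definitive fix.

```python
import re

# Hypothetical sample in the shape shown above: CRLF line endings and a blank
# line after each block. The year in the second block is made up for the demo.
SAMPLE = (
    "1\r\n01:00:01:10\r\n01:00:05:22\r\n"
    "After the conquest and plundering\r\nof the Inca empire by Spain\r\n\r\n"
    "2\r\n01:00:05:25\r\n01:00:09:09\r\n"
    "the Indians invented the\r\nlegend of El Dorado in 1570\r\n\r\n"
)

TIME = r"\d{2}:\d{2}:\d{2}:\d{2}"
# Record number, start time and end time, each on its own line.
HEAD = r"(?P<num>\d+)\r\n(?P<start>" + TIME + r")\r\n(?P<end>" + TIME + r")\r\n"


def texts(text_part, flags=0):
    """Return the 'text' group of every match of HEAD + text_part + blank line."""
    pattern = re.compile(HEAD + r"(?P<text>" + text_part + r")\r\n\r\n", flags)
    return [m.group("text") for m in pattern.finditer(SAMPLE)]


# Class without '0': block 1 matches, block 2 fails at the '0' in "1570".
no_zero = texts(r"[a-zA-Z1-9\s:.,'-]*")

# Class with '0': the greedy '*' swallows both blocks into a single match.
with_zero = texts(r"[a-zA-Z0-9\s:.,'-]*")

# Lazy quantifier: stops at the first blank line, giving one match per block.
lazy = texts(r".*?", re.DOTALL)

print(len(no_zero), len(with_zero), len(lazy))
```

In .NET the counterpart of `re.DOTALL` is `RegexOptions.Singleline`; whether a lazy quantifier suits the other file formats depends on their separators.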