Regular Expression - Achievement unlocked
-
Well done! Get a copy of this: Expresso[^] - it's free, and it explains, tests and helps create regexes. I use it all the time and I wish I'd written it!
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
:)
"Real men drive manual transmission" - Rajesh.
-
BillWoodruff wrote:
I believe learning, and mastering, something new is one of the very best things in life !
Learning is often strapped to pain, and effort. Mastering the learnt stuff is joy!
BillWoodruff wrote:
Of course the question of "constraints" immediately arises: what makes parsing the range of inputs you show more difficult is:
1. possible ambiguity of the "-" glyph: it is a separator for the Date component, and a sign-indicator for the time-offset.
2. possible ambiguity of the "." glyph: it is a presence/absence indicator for milliseconds, and a decimal-point for the time-offset.
It's all specified like that. [-->]
BillWoodruff wrote:
If you, the creator, have control over all possible inputs, and can ensure there will always be something like ".0Z" indicating no milliseconds, and there will always be some other character than "-" separating year, month, day, then, obviously, parsing becomes so much simpler.
I don't have any control over the formatting; I only know the constraints of the allowed values. By the way, I wrote some unit tests to compare the performance of a split with that of a RegEx: a split does not provide a measurable performance improvement over the Regular Expression. What I really like about the RegEx solution is that the parsing method becomes a lot shorter, and is therefore more readable (see the method public bool FromString(string dateTime) in the code sample below). I wrote the following code so far:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace Springlog.Com.Messaging
{
/// <summary>
/// Represents the Timestamp of a <see cref="SyslogMessageHeader"/>
/// Author: Marco Bertschi, (C) 2014 Marco Bertschi
/// </summary>
public class SyslogTimestamp
{
#region Properties
/// <summary>
/// Returns the count of the days for a specific month in a specific year.
/// </summary>
/// <param name="month">month </param>
/// <param name="year">year</param>
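For anyone who wants to repeat the split-versus-RegEx comparison mentioned in the post above, here is a minimal sketch. It is not Marco's unit test code, which is not shown in the thread. The class name SplitVersusRegexSketch, both helper methods, the simplified pattern and the iteration count are assumptions made for illustration. It only reads the six date and time numbers and ignores the fraction and the offset.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SplitVersusRegexSketch
{
    // Simplified, hypothetical pattern: only the six leading numbers are captured.
    private static readonly Regex Pattern = new Regex(
        @"^([0-9]+)-([0-9]+)-([0-9]+)T([0-9]+):([0-9]+):([0-9]+)",
        RegexOptions.Compiled);

    public static int[] WithRegex(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success) return null;
        var parts = new int[6];
        for (int i = 0; i < 6; i++)
            parts[i] = int.Parse(m.Groups[i + 1].Value, CultureInfo.InvariantCulture);
        return parts;
    }

    public static int[] WithSplit(string input)
    {
        // Splitting on '-' also swallows the sign of the offset (the ambiguity
        // discussed in the thread); this sketch only reads the first six tokens.
        string[] tokens = input.Split(
            new[] { '-', 'T', ':', '.', 'Z', '+' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length < 6) return null;
        var parts = new int[6];
        for (int i = 0; i < 6; i++)
        {
            if (!int.TryParse(tokens[i], NumberStyles.None, CultureInfo.InvariantCulture, out parts[i]))
                return null;
        }
        return parts;
    }

    public static void Main()
    {
        const string sample = "2014-2-5T21:36:14.315Z+1.5";
        const int iterations = 100000; // arbitrary value chosen for the sketch

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) WithRegex(sample);
        sw.Stop();
        Console.WriteLine("RegEx: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++) WithSplit(sample);
        sw.Stop();
        Console.WriteLine("Split: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

The timings depend on the machine and the runtime, so treat any single run as a rough indication only.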
-
Hi Marco, I'm enjoying reading your novel-in-code :), and I looked at the RFC5424 spec which I consider brain-damaged. I note that use of "Z" is optional, so that would mean my little flirtation with parsing your data would have to be rejiggered. What were the authors of that spec thinking when they allowed ambiguous characters that have varying meanings depending on position ? You'd also think that specs like this would provide a reference parser implementation in pseudo-code, or some computer language; however, for all I know, there is a reference implementation somewhere. cheers, Bill
“But I don't want to go among mad people,” Alice remarked. “Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.” “How do you know I'm mad?” said Alice. “You must be,” said the Cat, “or you wouldn't have come here.” Lewis Carroll
BillWoodruff wrote:
and I looked at the RFC5424 spec which I consider brain-damaged.
Not at all - the timestamp may be completely brain-damaged, but at least it is clearly specified.
BillWoodruff wrote:
I note that use of "Z" is optional, so that would mean my little flirtation with parsing your data would have to be rejiggered.
That's in fact brain-damaged.
BillWoodruff wrote:
What were the authors of that spec thinking when they allowed ambiguous characters that have varying meanings depending on position ?
As far as I can recall, the spec was written by Rainer Gerhards as the only author. Which explains the complexity, that guy makes a fortune doing consulting for it (at least I suspect it).
BillWoodruff wrote:
You'd also think that specs like this would provide a reference parser implementation in pseudo-code, or some computer language; however, for all I know, there is a reference implementation somewhere.
Not if the author can make consultant money from it.
Clean-up crew needed, grammar spill... - Nagy Vilmos
-
:thumbsup: I endorse that. Expresso is a very good regex utility.
If there is one thing more dangerous than getting between a bear and her cubs it's getting between my wife and her chocolate.
-
I had the joy to take my first steps with Regular Expressions today (I have tried to avoid them, but I eventually got that they are absolutely unavoidable, especially when it comes to parsing complex strings). So I present you my first RegEx:
([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)
Beautiful, isn't she? :-O :-O :-O And that's the string it will parse: 2014-2-5T21:36:14.315Z+1.5
Clean-up crew needed, grammar spill... - Nagy Vilmos
Marco, OriginalGriff and I recommend Expresso. You might want to add RegexBuilder[^] to your regex toolbox as well. RegexBuilder lets you try your expression on multiple input strings, which makes it very useful for testing your expression.
If there is one thing more dangerous than getting between a bear and her cubs it's getting between my wife and her chocolate.
-
PIEBALDconsult wrote:
- Missing leading zeroes on single-digit values
I know - I decided to allow missing leading zeros in my parsing application. In any case, the value returned by the ToString method will add these leading zeros.
PIEBALDconsult wrote:
- Time zone should be Z or offset; not both
:-O
PIEBALDconsult wrote:
- The offset should not have a decimal -- it's hours and minutes
And here I can't quite follow you anymore. Do you mind explaining it?
Clean-up crew needed, grammar spill... - Nagy Vilmos
Marco Bertschi wrote:
And here I can't quite follow you anymore. Do you mind explaining it?
See Wikipedia[^]. The 'T' date time delimiter may be omitted according to the standard and is often replaced by a space for better human readability (which does not conform to the standard). So you may replace [T] by [T ]?.
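PIEBALDconsult's last two points can be sketched as a pattern. This is only an illustration, assuming the timestamp follows the TIMESTAMP grammar of RFC 5424: an uppercase T, an optional fraction of 1 to 6 digits, then either Z or a numeric offset written as +hh:mm or -hh:mm. The class name TimestampSketch, the method TryGetOffset and the group names are made up for this example, and unlike Marco's parser it insists on leading zeros.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimestampSketch
{
    // Hypothetical pattern: "Z" or a numeric offset in hours and minutes, never both.
    private static readonly Regex Pattern = new Regex(
        @"^(?<year>[0-9]{4})-(?<month>0[1-9]|1[0-2])-(?<day>0[1-9]|[12][0-9]|3[01])" +
        @"T(?<hour>[01][0-9]|2[0-3]):(?<minute>[0-5][0-9]):(?<second>[0-5][0-9])" +
        @"(?:\.(?<fraction>[0-9]{1,6}))?" +
        @"(?<offset>Z|[+-](?:[01][0-9]|2[0-3]):[0-5][0-9])\z");

    // Returns true and the matched offset text when the input fits the pattern.
    public static bool TryGetOffset(string input, out string offset)
    {
        Match m = Pattern.Match(input);
        offset = m.Success ? m.Groups["offset"].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string[] samples =
        {
            "2014-02-05T21:36:14.315Z",
            "2014-02-05T21:36:14+01:30",
            "2014-02-05T21:36:14.315Z+1.5"
        };
        foreach (string s in samples)
        {
            string offset;
            Console.WriteLine(TryGetOffset(s, out offset)
                ? s + " -> offset " + offset
                : s + " -> no match");
        }
    }
}
```

The third sample is rejected because it carries both Z and a decimal offset. An offset of one and a half hours would be written +01:30 in this layout.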
-
Thank you :thumbsup:
Clean-up crew needed, grammar spill... - Nagy Vilmos
-
:)
"Real men drive manual transmission" - Rajesh.
You're welcome! Good little tool, isn't it? I was going to write one myself, but when I found this...
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
-
Marco Bertschi wrote:
(I have tried to avoid them, but I eventually got that they are absolutely unavoidable, especially when it comes to parsing complex strings) be the coolest boy in the sandbox.
:rolleyes:
~RaGE();
I think words like 'destiny' are a way of trying to find order where none exists. - Christian Graus Do not feed the troll ! - Common proverb
-
WTF??? I hate Trend. This is what I get when trying to download Expresso:
Trend Micro OfficeScan Event
URL Blocked
The URL that you are attempting to access is a potential security risk. Trend Micro OfficeScan has blocked this URL in keeping with the network security policy.
URL: http://www.ultrapico.com/ExpressoSetup3.msi
Risk Level: Dangerous
Details: Verified fraud page or threat source
:(( :~
-
Strange...I downloaded it here and ran a scan on the MSI and it showed nothing. So I uploaded it to Dropbox: https://www.dropbox.com/s/3fwe3wtop741hpf/ExpressoSetup3.msi[^] - does that help?
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
-
Ah thanks. I'll grab it from home since DB is also blocked at work. :thumbsup:
-
Marco Bertschi wrote:
And that's the string it will parse: 2014-2-5T21:36:14.315Z+1.5
You didn't need to tell us what it will parse, it's immediately obvious what the regex is for. Anyone can see that.
To alcohol! The cause of, and solution to, all of life's problems - Homer Simpson ---- Our heads are round so our thoughts can change direction - Francis Picabia
-
They are a friendly bunch, aren't they? :laugh: Do they allow you access to anything? :omg:
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
-
Yip, at least I can get onto CP, LOL. No social sites, so I hope they don't investigate the full capabilities of CP :^) ;) :doh:
-
JimmyRopes wrote:
Codes Plz
To hear is to obey, Master: [^] *
* plain-vanilla text file
“But I don't want to go among mad people,” Alice remarked. “Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.” “How do you know I'm mad?” said Alice. “You must be,” said the Cat, “or you wouldn't have come here.” Lewis Carroll
Ahhhhhhhhh. X| I knew I used RegEx for a reason. I have been using RegEx since my first stint at Bell Laboratories in the late 1970s. If you couldn't "grep" you had no street cred. Since then I have used RegEx on many platforms and in many languages. I don't even think about searching for patterns any other way.
I know people who despise RegEx, but it is to their detriment. They will have to write and debug a lot of code to do any kind of complex pattern recognition when they could use a RegEx and be done with it.
As for the efficiency, that will depend on whether the RegEx engine is a backtracking NFA (Nondeterministic Finite Automaton) or a DFA (Deterministic Finite Automaton). In general, a backtracking NFA engine is slower at complex pattern recognition but much easier to implement. Unless your application is time critical you will not notice any serious delay in processing as a result of using RegEx, and they are easier to implement and debug. That is why I use them wherever I need to parse data. Just my opinion.
The report of my death was an exaggeration - Mark Twain
Simply Elegant Designs JimmyRopes Designs
Think inside the box! ProActive Secure Systems
I'm on-line therefore I am. JimmyRopes
-
You could simplify/shorten it by replacing each instance of [0-9] with \d and getting rid of [] around single characters, though I suppose that could make it less readable.
I already have improved it, with the help of OG: http://www.codeproject.com/Messages/4752996/Re-RegEx-problem-sharp2.aspx[^]
Clean-up crew needed, grammar spill... - Nagy Vilmos
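The simplification suggested above can be sketched on the date part of the original pattern. This is only an illustration and not the improved expression from the linked message. The class name ShorthandSketch and its members are made up for the example. One thing to keep in mind in .NET: by default \d matches any Unicode decimal digit, not just 0 to 9, so the short form is only equivalent to [0-9] when RegexOptions.ECMAScript is set.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShorthandSketch
{
    // Date part of the original pattern, copied from the first post.
    public const string Verbose = @"([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]";

    // The same part with \d and without brackets around single characters.
    public const string Short = @"(\d{4})-([1-9]|1[0-2])-([0-2]?\d|3[01])T";

    // Returns year, month and day as captured, or an empty array when there is no match.
    public static string[] Capture(string pattern, string input, RegexOptions options)
    {
        Match m = Regex.Match(input, pattern, options);
        if (!m.Success) return new string[0];
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    public static void Main()
    {
        string sample = "2014-2-5T21:36:14.315Z+1.5";
        Console.WriteLine(string.Join("/", Capture(Verbose, sample, RegexOptions.None)));
        Console.WriteLine(string.Join("/", Capture(Short, sample, RegexOptions.None)));

        // \u0665 is ARABIC-INDIC DIGIT FIVE, a Unicode decimal digit outside 0-9.
        string exotic = "2014-2-\u0665T";
        Console.WriteLine(Capture(Verbose, exotic, RegexOptions.None).Length > 0);
        Console.WriteLine(Capture(Short, exotic, RegexOptions.None).Length > 0);
        Console.WriteLine(Capture(Short, exotic, RegexOptions.ECMAScript).Length > 0);
    }
}
```

Whether the shorter form is more readable is a matter of taste, as the post says. The difference in what \d accepts is the part worth checking before swapping it in.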
-
I have managed to avoid this convoluted mess for over 30 years. And with a little luck, when I die I will STILL not speak regex. .NET string methods work just fine for me: about as fast for 90% of your needs and more readable for 100% of your needs. The 2 or 3 times I have actually needed the power of regex in 30 years, I just subbed out that line of code. And rather than learning/debugging/banging my head... I went to the beach swilling cheap whiskey and chasing cheaper women. My advice to those who don't yet know regex: impress your peers with cheap whiskey and women and forget about the regex. Both make your head hurt... but one is less fun.