Regular Expression - Achievement unlocked
-
You are right - the timestamp can be a NILVALUE! The RegEx is used within the class SyslogTimestamp (see the discussion [^] on why I don't use System.DateTime). There is another class, called SyslogMessageHeader, which has a field of type SyslogTimestamp. I plan to handle a NILVALUE as null, and also to treat null objects as if they represent a NILVALUE.
Clean-up crew needed, grammar spill... - Nagy Vilmos
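RFC 5424 defines NILVALUE as a single "-" character, so the null mapping described above can be sketched as follows. This is a minimal sketch, assuming a hypothetical SyslogTimestampSketch class that only keeps the raw field text; it is not the actual SyslogTimestamp class, and real timestamp parsing is left out.

```csharp
using System;

// Hypothetical stand-in for SyslogTimestamp: it only keeps the raw field text.
public sealed class SyslogTimestampSketch
{
    // RFC 5424: NILVALUE = "-"
    public const string NilValue = "-";

    public string Raw { get; }

    private SyslogTimestampSketch(string raw) { Raw = raw; }

    // Maps NILVALUE to null; any other non-empty field becomes an object.
    // (Real parsing and validation of the timestamp are omitted in this sketch.)
    public static SyslogTimestampSketch Parse(string field)
    {
        if (field == null) throw new ArgumentNullException(nameof(field));
        if (field == NilValue) return null;
        if (field.Length == 0) throw new FormatException("Empty TIMESTAMP field.");
        return new SyslogTimestampSketch(field);
    }

    // Maps null back to NILVALUE, so "-" and null round-trip.
    public static string Format(SyslogTimestampSketch timestamp) =>
        timestamp == null ? NilValue : timestamp.Raw;
}
```

A header class can then simply hold a SyslogTimestampSketch reference that may be null, and serialize it through Format.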
-
Congratulations, Marco; I believe learning, and mastering, something new is one of the very best things in life! It would be interesting to debate (perhaps on the C# forum?) the short- and long-range cost/benefits of implementing a complex mini-parser like this using RegEx vs. "brute-force" string parsing, where "cost/benefits" would be looked at from different perspectives: say, from the perspective of a manager of programmers vs. a front-line programmer's perspective.
Of course the question of "constraints" immediately arises: what makes parsing the range of inputs you show more difficult is:
1. possible ambiguity of the "-" glyph: it is a separator for the Date component, and a sign-indicator for the time-offset.
2. possible ambiguity of the "." glyph: it is a presence/absence indicator for milliseconds, and a decimal-point for the time-offset.
If you, the creator, have control over all possible inputs, and can ensure there will always be something like ".0Z" indicating no milliseconds, and there will always be some other character than "-" separating year, month, day, then obviously parsing becomes much simpler. However, maybe "control" is an academic issue here because the standard you are coding to allows such latitude in input format; I don't know anything about the RFC you are using.
As an experiment, I timed how long it took me to create a non-RegEx solution to parsing your sample data: about thirty minutes (code on request). Since this was done early AM my time (GMT +07), and I was not fully caffeinated, perhaps I could have done this in twenty minutes, or less, later in the day, or evening. Anyone up for debate?
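Bill's non-RegEx code is not part of the thread. As an illustration of the "brute-force" approach, here is a sketch that parses the strict RFC 5424 form by character position only: zero-padded fields, a literal 'T', an optional fraction of 1 to 6 digits, then 'Z' or +hh:mm/-hh:mm. The class and method names are hypothetical, and it deliberately rejects the relaxed sample 2014-2-5T21:36:14.315Z+1.5. Because every field sits at a fixed index, the "-" and "." ambiguity disappears: a '-' at index 4 or 7 is a date separator, and after the seconds a '.' can only start the fraction while '+' or '-' can only start the offset.

```csharp
using System;
using System.Globalization;

public static class BruteForceTimestamp
{
    // Hypothetical helper: parses YYYY-MM-DDThh:mm:ss[.f{1,6}](Z|+hh:mm|-hh:mm) by position.
    public static bool TryParseRfc5424(string s, out DateTimeOffset result)
    {
        result = default;
        // The shortest valid form, "YYYY-MM-DDThh:mm:ssZ", has 20 characters.
        if (s == null || s.Length < 20) return false;
        // Fixed separators: the date dashes can only be at index 4 and 7.
        if (s[4] != '-' || s[7] != '-' || s[10] != 'T' || s[13] != ':' || s[16] != ':') return false;
        if (!Digits(s, 0, 4, out int year) || !Digits(s, 5, 2, out int month) ||
            !Digits(s, 8, 2, out int day) || !Digits(s, 11, 2, out int hour) ||
            !Digits(s, 14, 2, out int minute) || !Digits(s, 17, 2, out int second)) return false;

        int i = 19;
        long fracTicks = 0;
        // After the seconds, '.' can only introduce TIME-SECFRAC (1 to 6 digits).
        if (s[i] == '.')
        {
            int start = ++i;
            while (i < s.Length && s[i] >= '0' && s[i] <= '9') i++;
            int len = i - start;
            if (len < 1 || len > 6) return false;
            // Right-pad to 7 digits because one tick is 100 ns.
            fracTicks = long.Parse(s.Substring(start, len).PadRight(7, '0'), CultureInfo.InvariantCulture);
        }
        if (i >= s.Length) return false;

        TimeSpan offset;
        if (s[i] == 'Z' && i + 1 == s.Length)
        {
            offset = TimeSpan.Zero;
        }
        // Here '+' or '-' can only be the sign of TIME-NUMOFFSET ("+hh:mm").
        else if ((s[i] == '+' || s[i] == '-') && i + 6 == s.Length && s[i + 3] == ':' &&
                 Digits(s, i + 1, 2, out int oh) && Digits(s, i + 4, 2, out int om) &&
                 oh <= 23 && om <= 59)
        {
            offset = new TimeSpan(oh, om, 0);
            if (s[i] == '-') offset = offset.Negate();
        }
        else
        {
            return false;
        }

        try
        {
            // The constructor rejects out-of-range values (month 13, February 30, hour 24,
            // offsets beyond +/-14 hours, ...).
            result = new DateTimeOffset(year, month, day, hour, minute, second, offset).AddTicks(fracTicks);
            return true;
        }
        catch (ArgumentException)
        {
            // ArgumentOutOfRangeException derives from ArgumentException.
            return false;
        }
    }

    // Reads exactly `count` ASCII digits starting at index `start`.
    private static bool Digits(string s, int start, int count, out int value)
    {
        value = 0;
        for (int k = start; k < start + count; k++)
        {
            if (s[k] < '0' || s[k] > '9') return false;
            value = value * 10 + (s[k] - '0');
        }
        return true;
    }
}
```

Returning a DateTimeOffset is only a convenience of this sketch; its constructor takes care of the calendar checks, which is why the call is wrapped in a try/catch.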
“But I don't want to go among mad people,” Alice remarked. “Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.” “How do you know I'm mad?” said Alice. “You must be,” said the Cat, “or you wouldn't have come here.” Lewis Carroll
BillWoodruff wrote:
I believe learning, and mastering, something new is one of the very best things in life!
Learning is often tied to pain and effort. Mastering what you have learned is joy!
BillWoodruff wrote:
Of course the question of "constraints" immediately arises: what makes parsing the range of inputs you show more difficult is:
1. possible ambiguity of the "-" glyph: it is a separator for the Date component, and a sign-indicator for the time-offset.
2. possible ambiguity of the "." glyph: it is a presence/absence indicator for milliseconds, and a decimal-point for the time-offset.
It's all specified like that. [-->]
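For reference, RFC 5424 (section 6.2.3) derives TIMESTAMP from RFC 3339: it is either NILVALUE ("-") or FULL-DATE "T" FULL-TIME, with two-digit month, day, hour, minute and second, an optional TIME-SECFRAC of "." plus 1 to 6 digits, and TIME-OFFSET = "Z" / ("+" / "-") hh ":" mm. A regex transcribed directly from that grammar needs no disambiguation, since each character only appears where the grammar allows it. A minimal sketch; the class and group names are illustrative, and range checks such as month 01-12 are left to a later step.

```csharp
using System.Text.RegularExpressions;

public static class Rfc5424TimestampPattern
{
    // Transcribed from the RFC 5424 TIMESTAMP grammar. [0-9] is used instead of \d because
    // \d in .NET also matches non-ASCII digits; \z is used instead of $ because $ also
    // matches before a trailing newline. Numeric ranges are not checked here.
    public static readonly Regex Pattern = new Regex(
        @"^(?:-" +                                                      // NILVALUE
        @"|(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})" +     // FULL-DATE
        @"T(?<hour>[0-9]{2}):(?<minute>[0-9]{2}):(?<second>[0-9]{2})" + // PARTIAL-TIME
        @"(?:\.(?<frac>[0-9]{1,6}))?" +                                 // TIME-SECFRAC
        @"(?<offset>Z|[+-][0-9]{2}:[0-9]{2}))\z",                       // TIME-OFFSET
        RegexOptions.CultureInvariant);
}
```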
BillWoodruff wrote:
If you, the creator, have control over all possible inputs, and can ensure there will always be something like ".0Z" indicating no milliseconds, and there will always be some other character than "-" separating year, month, day, then obviously parsing becomes much simpler.
I don't have any control over the formatting; I only know the constraints on the allowed values. By the way, I wrote some unit tests to compare the performance of string splitting and RegEx: splitting does not provide a measurable performance improvement over the Regular Expression. What I really like about the RegEx solution is that the parsing method becomes a lot shorter, and is therefore more readable (see the method
public bool FromString(string dateTime)
in the code sample below). I wrote the following code so far:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace Springlog.Com.Messaging
{
    /// <summary>
    /// Represents the Timestamp of a <see cref="SyslogMessageHeader"/>
    /// Author: Marco Bertschi, (C) 2014 Marco Bertschi
    /// </summary>
    public class SyslogTimestamp
    {
        #region Properties
        /// <summary>
        /// Returns the count of the days for a specific month in a specific year.
        /// </summary>
        /// <param name="month">month</param>
        /// <param name="year">year</param>
        ...
-
:thumbsup: Good one. Now that you are all warmed up and stretched out, go and take a swing at POSIX time zone format[^] :-\
Quote:
The following examples represent some of the customized POSIX formats:
HAST10HADT,M4.2.0/03:0:0,M10.2.0/03:0:00
AST9ADT,M3.2.0,M11.1.0
AST9ADT,M3.2.0/03:0:0,M11.1.0/03:0:0
EST5EDT,M3.2.0/02:00:00,M11.1.0/02:00:00
GRNLNDST3GRNLNDDT,M10.3.0/00:00:00,M2.4.0/00:00:00
EST5EDT,M3.2.0/02:00:00,M11.1.0
EST5EDT,M3.2.0,M11.1.0/02:00:00
CST6CDT,M3.2.0/2:00:00,M11.1.0/2:00:00
MST7MDT,M3.2.0/2:00:00,M11.1.0/2:00:00
PST8PDT,M3.2.0/2:00:00,M11.1.0/2:00:00
Soren Madsen
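For anyone taking the swing: a POSIX TZ string has the shape std offset dst [offset],start[/time],end[/time]. The offset counts positive west of Greenwich (EST5 is UTC-5); Mm.w.d means month m, week w (1 to 5, where 5 is the last such weekday of the month) and weekday d (0 = Sunday); the transition time defaults to 02:00:00, and an omitted DST offset means one hour ahead of standard time. Below is a rough sketch that handles the examples quoted above. The PosixTzRule type is hypothetical, and the Jn and n day forms as well as quoted names such as <+03> are not supported.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for a parsed POSIX TZ rule of the "Mm.w.d" kind.
public sealed class PosixTzRule
{
    public string StdName, DstName;     // DstName is null when there is no DST part
    public TimeSpan StdOffset;          // POSIX sign: positive means west of Greenwich
    public TimeSpan? DstOffset;         // null when omitted (DST = one hour ahead of standard)
    public int StartMonth, StartWeek, StartDay, EndMonth, EndWeek, EndDay; // 0 when absent
    public TimeSpan StartTime, EndTime; // 02:00:00 when no "/time" is given

    private const string Time = @"[+-]?[0-9]{1,2}(?::[0-9]{1,2}){0,2}";
    private static readonly Regex Pattern = new Regex(
        @"^(?<std>[A-Za-z]{3,})(?<stdoff>" + Time + @")" +
        @"(?:(?<dst>[A-Za-z]{3,})(?<dstoff>" + Time + @")?" +
        @"(?:,M(?<m1>[0-9]{1,2})\.(?<w1>[1-5])\.(?<d1>[0-6])(?:/(?<t1>" + Time + @"))?" +
        @",M(?<m2>[0-9]{1,2})\.(?<w2>[1-5])\.(?<d2>[0-6])(?:/(?<t2>" + Time + @"))?)?)?\z");

    public static PosixTzRule Parse(string tz)
    {
        Match m = Pattern.Match(tz);
        if (!m.Success) throw new FormatException("Unsupported POSIX TZ string: " + tz);
        var rule = new PosixTzRule
        {
            StdName = m.Groups["std"].Value,
            StdOffset = ParseTime(m.Groups["stdoff"].Value),
            DstName = m.Groups["dst"].Success ? m.Groups["dst"].Value : null,
            DstOffset = m.Groups["dstoff"].Success ? ParseTime(m.Groups["dstoff"].Value) : (TimeSpan?)null,
            StartTime = m.Groups["t1"].Success ? ParseTime(m.Groups["t1"].Value) : TimeSpan.FromHours(2),
            EndTime = m.Groups["t2"].Success ? ParseTime(m.Groups["t2"].Value) : TimeSpan.FromHours(2),
        };
        if (m.Groups["m1"].Success)
        {
            rule.StartMonth = Int(m, "m1"); rule.StartWeek = Int(m, "w1"); rule.StartDay = Int(m, "d1");
            rule.EndMonth = Int(m, "m2"); rule.EndWeek = Int(m, "w2"); rule.EndDay = Int(m, "d2");
        }
        return rule;
    }

    private static int Int(Match m, string group) =>
        int.Parse(m.Groups[group].Value, CultureInfo.InvariantCulture);

    // "[+|-]h[h][:m[m][:s[s]]]": components may have one digit, as in "03:0:0".
    private static TimeSpan ParseTime(string s)
    {
        string[] parts = s.TrimStart('+', '-').Split(':');
        int h = int.Parse(parts[0], CultureInfo.InvariantCulture);
        int mi = parts.Length > 1 ? int.Parse(parts[1], CultureInfo.InvariantCulture) : 0;
        int se = parts.Length > 2 ? int.Parse(parts[2], CultureInfo.InvariantCulture) : 0;
        var t = new TimeSpan(h, mi, se);
        return s[0] == '-' ? t.Negate() : t;
    }
}
```

The greedy [A-Za-z]{3,} is what lets long names such as GRNLNDST3GRNLNDDT split correctly: the standard name stops at the first digit of the offset.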
"When you don't know what you're doing it's best to do it quickly" - Jase #DuckDynasty
Where is the :Exorcism: Emoticon? :wtf:
Clean-up crew needed, grammar spill... - Nagy Vilmos
-
Reading the RFC leads me to think that it is supposed to be ISO 8601-compliant, but the values you show are not:
0) Missing leading zeroes on single-digit values
1) Time zone should be Z or offset; not both
2) The offset should not have a decimal -- it's hours and minutes
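Applied to output, the three points mean: zero-pad every field, write either "Z" or a numeric offset (never both), and write the offset as hours and minutes. A minimal sketch of such a formatter, taking plain components rather than a System.DateTime; the class and method names are hypothetical, no range validation is done, and writing a zero offset as "Z" is a choice of this sketch.

```csharp
using System;
using System.Globalization;

public static class Rfc5424Formatter
{
    // Hypothetical helper: writes YYYY-MM-DDThh:mm:ss[.ffffff](Z|+hh:mm|-hh:mm).
    // No range validation; a zero offset is written as "Z".
    public static string Format(int year, int month, int day, int hour, int minute, int second,
                                int microseconds, int offsetMinutes)
    {
        CultureInfo inv = CultureInfo.InvariantCulture;
        // D2/D4 add the leading zeros that single-digit values are missing.
        string s = string.Format(inv, "{0:D4}-{1:D2}-{2:D2}T{3:D2}:{4:D2}:{5:D2}",
                                 year, month, day, hour, minute, second);
        // Fraction only when non-zero, without trailing zeros: 315000 -> ".315".
        if (microseconds > 0)
            s += "." + microseconds.ToString("D6", inv).TrimEnd('0');
        // Either "Z" or a numeric offset, never both.
        if (offsetMinutes == 0)
            return s + "Z";
        int abs = Math.Abs(offsetMinutes);
        // The offset is hours and minutes, not decimal hours: 90 minutes -> "+01:30".
        return s + (offsetMinutes < 0 ? "-" : "+") +
               (abs / 60).ToString("D2", inv) + ":" + (abs % 60).ToString("D2", inv);
    }
}
```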
This space intentionally left blank.
PIEBALDconsult wrote:
- Missing leading zeroes on single-digit values
I know - I decided to allow missing leading zeros in my parsing application. However, the value returned by the ToString method will include these leading zeros.
PIEBALDconsult wrote:
- Time zone should be Z or offset; not both
:-O
PIEBALDconsult wrote:
- The offset should not have a decimal -- it's hours and minutes
And here I can't quite follow you anymore. Do you mind explaining it?
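On the third point: in RFC 3339 (and therefore in RFC 5424) the offset is written as hh:mm, so an offset of one and a half hours is "+01:30" rather than "+1.5". A small sketch of that conversion; the helper name is hypothetical, and it accepts either '.' or ',' as the decimal separator because the regex in the opening post allows both.

```csharp
using System;
using System.Globalization;

public static class OffsetConverter
{
    // Hypothetical helper: "+1.5" -> "+01:30", "-5,75" -> "-05:45".
    public static string DecimalHoursToHhMm(string offset)
    {
        if (string.IsNullOrEmpty(offset) || (offset[0] != '+' && offset[0] != '-'))
            throw new FormatException("The offset must start with '+' or '-'.");
        // Accept ',' as well as '.' as the decimal separator.
        decimal hours = decimal.Parse(offset.Substring(1).Replace(',', '.'),
                                      NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture);
        decimal minutes = hours * 60m;
        if (minutes != decimal.Truncate(minutes))
            throw new FormatException("The offset is not a whole number of minutes.");
        int total = (int)minutes;
        // Two-digit hours, a colon, two-digit minutes.
        return offset[0] + (total / 60).ToString("D2", CultureInfo.InvariantCulture) + ":" +
               (total % 60).ToString("D2", CultureInfo.InvariantCulture);
    }
}
```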
Clean-up crew needed, grammar spill... - Nagy Vilmos
-
gack! :wtf: i suppose it's a rite of passage everyone has to go through. but whew :) me, i avoid regex like the plague. finite state machines are so much easier to understand and maintain.
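A finite-state-machine version of the tricky part, everything after the seconds, could look like the following sketch. It assumes the strict RFC 5424 tail (an optional "." plus 1 to 6 digits, then "Z" or +hh:mm/-hh:mm), validates structure only (no range checks on the offset), and its names are illustrative. The current state is what tells a fraction '.' apart from anything else, and an offset sign apart from a date separator.

```csharp
public static class TailStateMachine
{
    // Each Off* state names the character expected next.
    private enum State { Start, FracFirst, Frac, OffH1, OffH2, OffColon, OffM1, OffM2, Done, Error }

    // Validates "[.f{1,6}](Z|+hh:mm|-hh:mm)" one character at a time.
    public static bool IsValidTail(string tail)
    {
        State state = State.Start;
        int fracDigits = 0;
        foreach (char c in tail)
        {
            bool digit = c >= '0' && c <= '9';
            switch (state)
            {
                case State.Start:      // a '.' here can only open TIME-SECFRAC
                    state = c == '.' ? State.FracFirst : Offset(c);
                    break;
                case State.FracFirst:  // at least one fraction digit is required
                    state = digit ? State.Frac : State.Error;
                    fracDigits = 1;
                    break;
                case State.Frac:       // up to 6 digits, then the offset must follow
                    if (digit && fracDigits < 6) fracDigits++;
                    else state = Offset(c);
                    break;
                case State.OffH1:    state = digit ? State.OffH2 : State.Error; break;
                case State.OffH2:    state = digit ? State.OffColon : State.Error; break;
                case State.OffColon: state = c == ':' ? State.OffM1 : State.Error; break;
                case State.OffM1:    state = digit ? State.OffM2 : State.Error; break;
                case State.OffM2:    state = digit ? State.Done : State.Error; break;
                default:             state = State.Error; break; // nothing may follow Done
            }
            if (state == State.Error) return false;
        }
        return state == State.Done;
    }

    // 'Z' completes the timestamp; '+' or '-' can only be the offset sign at this point.
    private static State Offset(char c) =>
        c == 'Z' ? State.Done : (c == '+' || c == '-') ? State.OffH1 : State.Error;
}
```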
You gotta love the regex. It is amazing.
I may not last forever but the mess I leave behind certainly will.
-
Well done! Get a copy of this: Expresso[^] - it's free, and it explains, tests and helps create regexes. I use it all the time and I wish I'd written it!
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
:)
"Real men drive manual transmission" - Rajesh.
-
Hi Marco, I'm enjoying reading your novel-in-code :), and I looked at the RFC5424 spec, which I consider brain-damaged. I note that use of "Z" is optional, so that would mean my little flirtation with parsing your data would have to be rejiggered. What were the authors of that spec thinking when they allowed ambiguous characters that have varying meanings depending on position? You'd also think that specs like this would provide a reference parser implementation in pseudo-code, or some computer language; however, for all I know, there is a reference implementation somewhere. cheers, Bill
“But I don't want to go among mad people,” Alice remarked. “Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.” “How do you know I'm mad?” said Alice. “You must be,” said the Cat, “or you wouldn't have come here.” Lewis Carroll
BillWoodruff wrote:
and I looked at the RFC5424 spec which I consider brain-damaged.
Not at all - the timestamp may be completely brain-damaged, but at least it is clearly specified.
BillWoodruff wrote:
I note that use of "Z" is optional, so that would mean my little flirtation with parsing your data would have to be rejiggered.
That's in fact brain-damaged.
BillWoodruff wrote:
What were the authors of that spec thinking when they allowed ambiguous characters that have varying meanings depending on position?
As far as I can recall, the spec was written by Rainer Gerhards as the sole author, which explains the complexity: that guy makes a fortune doing consulting on it (at least I suspect so).
BillWoodruff wrote:
You'd also think that specs like this would provide a reference parser implementation in pseudo-code, or some computer language; however, for all I know, there is a reference implementation somewhere.
Not if the author can make consultant money from it.
Clean-up crew needed, grammar spill... - Nagy Vilmos
-
:thumbsup: I endorse that. Expresso is a very good regex utility.
If there is one thing more dangerous than getting between a bear and her cubs it's getting between my wife and her chocolate.
-
I had the joy of taking my first steps with Regular Expressions today (I have tried to avoid them, but I eventually realized that they are absolutely unavoidable, especially when it comes to parsing complex strings). So I present to you my first RegEx:
([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)
Beautiful, isn't she? :-O :-O :-O And that's the string it will parse: 2014-2-5T21:36:14.315Z+1.5
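A quick harness, in the spirit of Expresso or RegexBuilder, that runs the expression above unchanged and returns its eight capture groups; the class name is illustrative. Traced by hand, the sample yields 2014, 2, 5, 21, 36, 14, 315 and +1.5. A few other things it reveals: the unescaped .? accepts any character between the seconds and the fraction (so 14x315 also matches), the month alternatives have no leading-zero form (so 2014-02-05T21:36:14.315Z does not match at all), the class [\.|,] also accepts '|', and the pattern is unanchored and requires both 'Z' and an offset.

```csharp
using System.Text.RegularExpressions;

public static class FirstRegexHarness
{
    // The expression from the post, copied unchanged into a verbatim string literal.
    public const string Pattern =
        @"([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)";

    // Returns the values of capture groups 1..8, or null when there is no match.
    public static string[] Groups(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success) return null;
        var result = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++) result[i - 1] = m.Groups[i].Value;
        return result;
    }
}
```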
Clean-up crew needed, grammar spill... - Nagy Vilmos
Marco, OriginalGriff and I recommend Expresso. You might want to add RegexBuilder[^] to your regex toolbox as well. RegexBuilder lets you try your expression on multiple input strings, which makes it very useful for testing your expression.
If there is one thing more dangerous than getting between a bear and her cubs it's getting between my wife and her chocolate.
-
Marco Bertschi wrote:
And here I can't quite follow you anymore. Do you mind explaining it?
See Wikipedia[^]. The 'T' date time delimiter may be omitted according to the standard and is often replaced by a space for better human readability (which does not conform to the standard). So you may replace [T] by [T ]?.
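A minimal sketch of the suggested [T ]? change, using a simplified, illustrative date-time pattern rather than the full expression. Note that RFC 5424 itself requires the literal "T", so accepting a space or no separator is a leniency for input that is not strictly syslog-formatted. Because the date and time fields have fixed widths here, dropping the separator entirely stays unambiguous.

```csharp
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    // Illustrative pattern only: zero-padded date and time, with "T", a space,
    // or nothing between them, as suggested with [T ]?.
    public static readonly Regex DateTimePattern = new Regex(
        @"^(?<date>[0-9]{4}-[0-9]{2}-[0-9]{2})[T ]?(?<time>[0-9]{2}:[0-9]{2}:[0-9]{2})\z");

    public static bool Accepts(string s) => DateTimePattern.IsMatch(s);
}
```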
-
Thank you :thumbsup:
Clean-up crew needed, grammar spill... - Nagy Vilmos
-
:)
"Real men drive manual transmission" - Rajesh.
You're welcome! Good little tool, isn't it? I was going to write one myself, but when I found this...
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
-
Marco Bertschi wrote:
(I have tried to avoid them, but I eventually realized that they are absolutely unavoidable, especially when it comes to parsing complex strings) be the coolest boy in the sandbox.
:rolleyes:
~RaGE();
I think words like 'destiny' are a way of trying to find order where none exists. - Christian Graus Do not feed the troll ! - Common proverb
-
WTF??? I hate Trend. This is what I get when trying to download Expresso. Trend Micro OfficeScan Event URL Blocked The URL that you are attempting to access is a potential security risk. Trend Micro OfficeScan has blocked this URL in keeping with the network security policy. URL: http://www.ultrapico.com/ExpressoSetup3.msi Risk Level: Dangerous Details: Verified fraud page or threat source :(( :~
-
Strange...I downloaded it here and ran a scan on the MSI and it showed nothing. So I uploaded it to Dropbox: https://www.dropbox.com/s/3fwe3wtop741hpf/ExpressoSetup3.msi[^] - does that help?
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
-
Ah thanks. I'll grab it from home since DB is also blocked at work.:thumbsup:
-
Marco Bertschi wrote:
And that's the string it will parse: 2014-2-5T21:36:14.315Z+1.5
You didn't need to tell us what it will parse, it's immediately obvious what the regex is for. Anyone can see that.
To alcohol! The cause of, and solution to, all of life's problems - Homer Simpson ---- Our heads are round so our thoughts can change direction - Francis Picabia
-
Ah thanks. I'll grab it from home since DB is also blocked at work.:thumbsup:
They are a friendly bunch, aren't they? :laugh: Do they allow you access to anything? :omg:
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)