Regular Expression - Achievement unlocked
-
Well done! Get a copy of this: Expresso[^] - it's free, and it explains, tests and helps create regexes. I use it all the time and I wish I'd written it!
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
-
I had the joy of taking my first steps with Regular Expressions today (I have tried to avoid them, but I eventually realized that they are absolutely unavoidable, especially when it comes to parsing complex strings). So I present to you my first RegEx:
([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)
Beautiful, isn't she? :-O :-O :-O And that's the string it will parse: 2014-2-5T21:36:14.315Z+1.5
Clean-up crew needed, grammar spill... - Nagy Vilmos
gack! :wtf: i suppose it's a rite of passage everyone has to go through. but whew :) me, i avoid regex like the plague. finite state machines are so much easier to understand and maintain.
-
I had a bit of help from an experienced colleague :laugh: The challenge is that it validates a Syslog-Timestamp. Problem? Yes. Here are some examples of valid timestamps:
2014-2-5T21:36:14.315Z-1.5
2014-2-5T21:36:14Z-1
2004-2-28T21:36:14.315Z+1.75
2004-2-29T21:36:14.315315Z+0
You see the problem X|
Clean-up crew needed, grammar spill... - Nagy Vilmos
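For anyone following along, here is a minimal C# sketch of a tighter pattern for the four sample strings above. It is not the pattern from the original post: the class name `TimestampPatternSketch`, the group names, and the choice to accept one or two digits on either side of the offset's decimal point are assumptions made for illustration. Compared with the original, it anchors the match, escapes the dot before the fraction, and makes the fraction optional as a unit.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class TimestampPatternSketch
{
    // Assumed format, inferred only from the four samples in this thread:
    // yyyy-M-dTH:m:s[.ffffff]Z(+|-)h[.hh]
    public static readonly Regex Pattern = new Regex(
        @"^(?<year>[0-9]{4})" +
        @"-(?<month>1[0-2]|0?[1-9])" +
        @"-(?<day>3[01]|[12][0-9]|0?[1-9])" +
        @"T(?<hour>2[0-3]|[01]?[0-9])" +
        @":(?<minute>[0-5]?[0-9])" +
        @":(?<second>[0-5]?[0-9])" +
        @"(?:\.(?<fraction>[0-9]{1,6}))?" +   // escaped dot, optional together with its digits
        @"Z(?<offset>[+-][0-9]{1,2}(?:[.,][0-9]{1,2})?)" +
        @"\z",                                 // end anchor: nothing may follow the offset
        RegexOptions.CultureInvariant);

    // Returns true when the whole input has the assumed shape.
    public static bool IsMatch(string input)
    {
        return input != null && Pattern.IsMatch(input);
    }
}
```

The named groups make the pieces easy to pull out afterwards, e.g. `TimestampPatternSketch.Pattern.Match(text).Groups["offset"].Value`. The sketch checks the shape only; it does not know that February has 28 or 29 days.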
-
Marco Bertschi wrote:
2014-2-5T21:36:14.315Z+1.5
Almost, but not quite, entirely unlike ISO 8601.
This space intentionally left blank.
It's RFC 5424.
Clean-up crew needed, grammar spill... - Nagy Vilmos
-
I second Expresso; it's an awesome tool!
My site: Everything Embedded Relax...We're all crazy it's not a competition!
-
Marco Bertschi wrote:
it validates a Syslog-Timestamp
If it was written to some sort of log file by some application, why would you doubt it? Edit: Now that I have perused the timestamp part of the RFC, I can state, "those are not valid timestamps".
This space intentionally left blank.
-
:thumbsup: Good one. Now that you are all warmed up and stretched out, go and take a swing at POSIX time zone format[^] :-\
Quote:
The following examples represent some of the customized POSIX formats:
HAST10HADT,M4.2.0/03:0:0,M10.2.0/03:0:00
AST9ADT,M3.2.0,M11.1.0
AST9ADT,M3.2.0/03:0:0,M11.1.0/03:0:0
EST5EDT,M3.2.0/02:00:00,M11.1.0/02:00:00
GRNLNDST3GRNLNDDT,M10.3.0/00:00:00,M2.4.0/00:00:00
EST5EDT,M3.2.0/02:00:00,M11.1.0
EST5EDT,M3.2.0,M11.1.0/02:00:00
CST6CDT,M3.2.0/2:00:00,M11.1.0/2:00:00
MST7MDT,M3.2.0/2:00:00,M11.1.0/2:00:00
PST8PDT,M3.2.0/2:00:00,M11.1.0/2:00:00
Soren Madsen
"When you don't know what you're doing it's best to do it quickly" - Jase #DuckDynasty
-
Reading the RFC leads me to think that it is supposed to be ISO 8601-compliant, but the values you show are not:
0) Missing leading zeroes on single-digit values
1) Time zone should be Z or offset; not both
2) The offset should not have a decimal -- it's hours and minutes
This space intentionally left blank.
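The three points above line up with the TIMESTAMP grammar in RFC 5424: two-digit month, day, hour, minute and second, an optional fraction of one to six digits, and then either `Z` or a signed `hh:mm` offset. A lone `-` (the NILVALUE) is also allowed. A rough C# sketch of that grammar as a pattern follows; the class and group names are made up for illustration, and it checks the shape only, not days-per-month or leap years.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; a sketch of the RFC 5424 TIMESTAMP shape, not a full validator.
public static class Rfc5424TimestampSketch
{
    public static readonly Regex Pattern = new Regex(
        @"^(?:-|" +                                        // NILVALUE, or a full timestamp
        @"(?<year>[0-9]{4})" +
        @"-(?<month>0[1-9]|1[0-2])" +                      // leading zero required
        @"-(?<day>0[1-9]|[12][0-9]|3[01])" +
        @"T(?<hour>[01][0-9]|2[0-3])" +
        @":(?<minute>[0-5][0-9])" +
        @":(?<second>[0-5][0-9])" +
        @"(?:\.(?<fraction>[0-9]{1,6}))?" +                // at most six fractional digits
        @"(?<offset>Z|[+-](?:[01][0-9]|2[0-3]):[0-5][0-9])" + // Z or +hh:mm, never both
        @")\z",
        RegexOptions.CultureInvariant);

    // Returns true when the whole input has the RFC-style shape.
    public static bool IsMatch(string input)
    {
        return input != null && Pattern.IsMatch(input);
    }
}
```

With this sketch, strings such as 2003-10-11T22:14:15.003Z and 2003-08-24T05:14:15.000003-07:00 match, while 2014-2-5T21:36:14.315Z+1.5 from the original post does not.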
-
gack! :wtf: i suppose it's a rite of passage everyone has to go through. but whew :) me, i avoid regex like the plague. finite state machines are so much easier to understand and maintain.
I can't agree more with your last statement!
-
Welcome to the dark side. :suss:
The report of my death was an exaggeration - Mark Twain
Simply Elegant Designs JimmyRopes Designs
Think inside the box! ProActive Secure Systems
I'm on-line therefore I am. JimmyRopes
-
Congratulations, Marco; I believe learning, and mastering, something new is one of the very best things in life!
It would be interesting to debate (perhaps on the C# forum?) the short- and long-range costs and benefits of implementing a complex mini-parser like this using RegEx vs. "brute-force" string parsing, where the costs and benefits would be looked at from different perspectives: say, that of a manager of programmers vs. that of a front-line programmer.
Of course the question of "constraints" immediately arises. What makes parsing the range of inputs you show more difficult is:
1. possible ambiguity of the "-" glyph: it is a separator for the Date component, and a sign-indicator for the time-offset.
2. possible ambiguity of the "." glyph: it is a presence/absence indicator for milliseconds, and a decimal-point for the time-offset.
If you, the creator, have control over all possible inputs, and can ensure there will always be something like ".0Z" indicating no milliseconds, and that some character other than "-" always separates year, month, and day, then parsing obviously becomes much simpler. However, maybe "control" is an academic issue here because the standard you are coding to allows such latitude in input format; I don't know anything about the RFC you are using.
As an experiment, I timed how long it took me to create a non-RegEx solution to parsing your sample data: about thirty minutes (code on request). Since this was done early AM my time (GMT +07), and I was not fully caffeinated, perhaps I could have done this in twenty minutes, or less, later in the day, or evening.
Anyone up for debate?
“But I don't want to go among mad people,” Alice remarked. “Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.” “How do you know I'm mad?” said Alice. “You must be,” said the Cat, “or you wouldn't have come here.” Lewis Carroll
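To make the "brute-force" side of that debate concrete, here is one possible C# sketch. It is not the code offered on request above; the names `SplitParseSketch` and `ParsedTimestampSketch` are invented for illustration. It assumes 'T' and 'Z' each mark a fixed boundary in the sample format, which removes both ambiguities: a "-" before 'T' is a date separator and one after 'Z' is a sign, while a "." before 'Z' starts the fraction and one after 'Z' is a decimal point.

```csharp
using System.Globalization;

// Hypothetical result type for the sketch; plain fields so they can be used as out targets.
public sealed class ParsedTimestampSketch
{
    public int Year, Month, Day, Hour, Minute, Second;
    public string Fraction = "";      // digits after the '.', or "" when absent (not validated here)
    public decimal OffsetHours;
}

public static class SplitParseSketch
{
    // Returns null when the input does not have the assumed shape.
    // No range checks are done (month 13 would be accepted).
    public static ParsedTimestampSketch Parse(string input)
    {
        if (input == null) return null;
        int t = input.IndexOf('T');
        int z = input.IndexOf('Z');
        if (t < 0 || z < t) return null;

        // Everything before 'T' is the date, between 'T' and 'Z' the time, after 'Z' the offset.
        string[] date = input.Substring(0, t).Split('-');
        string clock = input.Substring(t + 1, z - t - 1);
        string offset = input.Substring(z + 1);

        var result = new ParsedTimestampSketch();
        int dot = clock.IndexOf('.');
        if (dot >= 0)
        {
            result.Fraction = clock.Substring(dot + 1);
            clock = clock.Substring(0, dot);
        }
        string[] time = clock.Split(':');
        if (date.Length != 3 || time.Length != 3) return null;

        CultureInfo inv = CultureInfo.InvariantCulture;
        bool ok =
            int.TryParse(date[0], NumberStyles.None, inv, out result.Year) &&
            int.TryParse(date[1], NumberStyles.None, inv, out result.Month) &&
            int.TryParse(date[2], NumberStyles.None, inv, out result.Day) &&
            int.TryParse(time[0], NumberStyles.None, inv, out result.Hour) &&
            int.TryParse(time[1], NumberStyles.None, inv, out result.Minute) &&
            int.TryParse(time[2], NumberStyles.None, inv, out result.Second) &&
            decimal.TryParse(offset,
                NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
                inv, out result.OffsetHours);
        return ok ? result : null;
    }
}
```

It is longer than a single pattern, but each step can be read and debugged on its own, which is the sort of trade-off the debate would be about.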
-
([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)
X| X| X|
Steve Wellens
-
BillWoodruff wrote:
(code on request)
Codes Plz :-D so I can do my homework assignment.
The report of my death was an exaggeration - Mark Twain
Simply Elegant Designs JimmyRopes Designs
Think inside the box! ProActive Secure Systems
I'm on-line therefore I am. JimmyRopes
-
JimmyRopes wrote:
Codes Plz
To hear is to obey, Master: [^] (plain-vanilla text file)
“But I don't want to go among mad people,” Alice remarked. “Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.” “How do you know I'm mad?” said Alice. “You must be,” said the Cat, “or you wouldn't have come here.” Lewis Carroll
-
You are right - the timestamp can be a NILVALUE! The RegEx is used within the class SyslogTimestamp (see the discussion [^] on why I don't use System.DateTime). There is another class, called SyslogMessageHeader, which has a field of type SyslogTimestamp. I plan to handle a NILVALUE as null, and also treat null objects as if they represent a NILVALUE.
Clean-up crew needed, grammar spill... - Nagy Vilmos
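A minimal C# sketch of that NILVALUE-as-null plan, using a plain string as a stand-in for SyslogTimestamp because that class is not shown in full here. The names `NilValueSketch`, `FromField` and `ToField` are hypothetical; the only fact taken from RFC 5424 is that the NILVALUE is written as a single "-".

```csharp
// Hypothetical helper; string stands in for the SyslogTimestamp class.
public static class NilValueSketch
{
    // RFC 5424 writes the NILVALUE as a single hyphen.
    public const string NilValue = "-";

    // Reading a header field: a NILVALUE becomes null, anything else is passed through.
    public static string FromField(string field)
    {
        return field == NilValue ? null : field;
    }

    // Writing a header field: a null timestamp is emitted as the NILVALUE.
    public static string ToField(string timestampOrNull)
    {
        return timestampOrNull ?? NilValue;
    }
}
```

Keeping both directions in one place means the rest of the header code only ever has to test for null.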
-
BillWoodruff wrote:
I believe learning, and mastering, something new is one of the very best things in life !
Learning is often strapped to pain, and effort. Mastering the learnt stuff is joy!
BillWoodruff wrote:
Of course the question of "constraints" immediately arises: what makes parsing the range of inputs you show more difficult is:
1. possible ambiguity of the "-" glyph: it is a separator for the Date component, and a sign-indicator for the time-offset.
2. possible ambiguity of the "." glyph: it is a presence/absence indicator for milliseconds, and a decimal-point for the time-offset.
It's all specified like that. [-->]
BillWoodruff wrote:
If, you, the creator, have control over all possible inputs, and can ensure there will always something like ".0Z" indicating no milliseconds, and there will always be some other character than "-" separating year, month, day, then, obviously parsing becomes so much more simple.
I don't have any control over the formatting; I only know the constraints of the allowed values. By the way, I wrote some unit tests to compare the performance of split and RegEx: a split does not provide a measurable performance improvement compared to the Regular Expression. What I really like about the RegEx solution is that the parsing method becomes a lot shorter, and is therefore more readable (see the method public bool FromString(string dateTime) in the code sample below). I wrote the following code so far:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace Springlog.Com.Messaging
{
    /// <summary>
    /// Represents the Timestamp of a <see cref="SyslogMessageHeader"/>
    /// Author: Marco Bertschi, (C) 2014 Marco Bertschi
    /// </summary>
    public class SyslogTimestamp
    {
        #region Properties
        /// <summary>
        /// Returns the count of the days for a specific month in a specific year.
        /// </summary>
        /// <param name="month">month </param>
        /// <param name="year">year</param>
-
Where is the :Exorcism: Emoticon? :wtf:
Clean-up crew needed, grammar spill... - Nagy Vilmos