Code Project

Regular Expression - Achievement unlocked

The Lounge (tags: regex, json, question)

  • Marco Bertschi

    I had a bit of help from an experienced colleague :laugh: The challenge is that it validates a Syslog timestamp. Problem? Yes. Here are some examples of valid timestamps:

    2014-2-5T21:36:14.315Z-1.5
    2014-2-5T21:36:14Z-1
    2004-2-28T21:36:14.315Z+1.75
    2004-2-29T21:36:14.315315Z+0

    You see the problem X|

    Clean-up crew needed, grammar spill... - Nagy Vilmos
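One part of the problem is presumably the leap day: 2004-2-29 is a real date, but a regex only sees digits, so it cannot tell it apart from 2005-2-29. A minimal sketch, assuming C# and .NET (the language of the SyslogTimestamp class later in the thread): match the date shape with a regex, then check the day against DateTime.DaysInMonth. DateShapeCheck and HasValidDate are hypothetical names, not Marco's code.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Marco's SyslogTimestamp class.
// The regex only checks the *shape* of the date (1- or 2-digit month/day,
// as in the examples above); whether that day exists is checked in code.
public static class DateShapeCheck
{
    private static readonly Regex DatePart = new Regex(
        @"^([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})T", RegexOptions.CultureInvariant);

    public static bool HasValidDate(string timestamp)
    {
        if (timestamp == null) return false;
        Match m = DatePart.Match(timestamp);
        if (!m.Success) return false;

        int year = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        int month = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        int day = int.Parse(m.Groups[3].Value, CultureInfo.InvariantCulture);

        // DateTime.DaysInMonth throws for year 0 or month 0/13+, so guard first.
        if (year < 1 || month < 1 || month > 12) return false;

        // DaysInMonth applies the Gregorian leap-year rules (2004: 29 days, 1900: 28).
        return day >= 1 && day <= DateTime.DaysInMonth(year, month);
    }
}
```

With this, 2004-2-29T21:36:14.315315Z+0 passes and 2005-2-29T21:36:14Z is rejected; the time and offset parts would still need their own checks.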

    PIEBALDconsult wrote (#10):

    Marco Bertschi wrote:

    it validates a Syslog-Timestamp

    If it was written to some sort of log file by some application, why would you doubt it?

    Edit: Now that I have perused the timestamp part of the RFC, I can state: "those are not valid timestamps".

    This space intentionally left blank.

      • Marco Bertschi

      I had the joy of taking my first steps with regular expressions today (I had tried to avoid them, but I eventually realized that they are absolutely unavoidable, especially when it comes to parsing complex strings). So I present to you my first RegEx:

      ([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)

      Beautiful, isn't she? :-O :-O :-O And that's the string it will parse: 2014-2-5T21:36:14.315Z+1.5
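To see what each piece of the pattern does, here is the same expression, token for token, spread over several lines with RegexOptions.IgnorePatternWhitespace (a .NET option that ignores unescaped whitespace outside character classes and allows # comments). A sketch assuming C#/.NET; the class name is hypothetical and the comments are an editorial reading, not Marco's own notes.

```csharp
using System.Text.RegularExpressions;

// Marco's pattern, unchanged, laid out with IgnorePatternWhitespace so each
// piece can carry a comment. Group numbers match the original expression.
public static class AnnotatedOriginal
{
    public static readonly Regex Pattern = new Regex(@"
        ([0-9]{4})                     # 1: year
        -([1-9]|[1][0-2])              # 2: month 1-12; a zero-padded '02' fails
        -([0-2]?[0-9]|[3][0-1])        # 3: day 0-31, not checked against the month
        [T]
        ([0-1]?[0-9]|[2][0-3])         # 4: hour 0-23
        [:]([0-5]?[0-9])               # 5: minute
        [:]([0-5]?[0-9])?              # 6: second (optional)
        .?                             # unescaped dot: any one character, optional
        ([0-9]{1,6})                   # 7: fraction, at least one digit required
        [Z]
        (                              # 8: offset after the Z (required)
            [+-][0-9][\.|,]?[0-9]?     #    e.g. +1.5; [\.|,] also allows a '|'
          | [0-9]{2}?                  #    {2}? is a lazy {2}: still exactly two digits
          | [+-]?[0][1][0-2][\.]?[0-9]?
          | [0-9]{2}?                  #    same as the second alternative
        )",
        RegexOptions.IgnorePatternWhitespace);
}
```

Traced against the thread's examples: 2014-2-5T21:36:14.315Z+1.5 yields 14, 315 and +1.5 in groups 6 to 8 as intended. 2014-2-5T21:36:14Z-1 also matches, but with group 6 (seconds) = "1" and group 7 (fraction) = "4", because group 7 needs at least one digit and backtracking borrows it from the seconds. A timestamp ending in a bare "Z", or with a zero-padded month such as 2014-02-05, does not match at all.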


      SoMad wrote (#11):

      :thumbsup: Good one. Now that you are all warmed up and stretched out, go and take a swing at POSIX time zone format[^] :-\

      Quote:

      The following examples represent some of the customized POSIX formats:

      HAST10HADT,M4.2.0/03:0:0,M10.2.0/03:0:00
      AST9ADT,M3.2.0,M11.1.0
      AST9ADT,M3.2.0/03:0:0,M11.1.0/03:0:0
      EST5EDT,M3.2.0/02:00:00,M11.1.0/02:00:00
      GRNLNDST3GRNLNDDT,M10.3.0/00:00:00,M2.4.0/00:00:00
      EST5EDT,M3.2.0/02:00:00,M11.1.0
      EST5EDT,M3.2.0,M11.1.0/02:00:00
      CST6CDT,M3.2.0/2:00:00,M11.1.0/2:00:00
      MST7MDT,M3.2.0/2:00:00,M11.1.0/2:00:00
      PST8PDT,M3.2.0/2:00:00,M11.1.0/2:00:00
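For the curious, a sketch (C#/.NET, hypothetical names) of how far a regex gets with the POSIX TZ strings quoted above: standard name and offset, daylight name with an optional offset, then two Mm.w.d rules (month 1-12, week 1-5, weekday 0-6), each with an optional /time. It only covers the shape of these examples, not the full POSIX TZ grammar (for instance, strings without a daylight part, or Jn / n day rules).

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch matching only the shape of the quoted examples:
// STD offset DST [offset] ,Mm.w.d[/time] ,Mm.w.d[/time]
public static class PosixTzShape
{
    // Offsets and times such as 5, 10, 03:0:0, 02:00:00.
    private const string Time = @"[+-]?[0-9]{1,2}(?::[0-9]{1,2}){0,2}";

    // Mm.w.d: month 1-12, week 1-5 (5 = last), weekday 0-6 (0 = Sunday).
    private const string Rule = @"M(?:1[0-2]|[1-9])\.[1-5]\.[0-6](?:/" + Time + ")?";

    public static readonly Regex Pattern = new Regex(
        @"^(?<std>[A-Za-z]{3,})(?<stdoff>" + Time + ")" +
        @"(?<dst>[A-Za-z]{3,})(?<dstoff>" + Time + ")?" +
        @",(?<start>" + Rule + "),(?<end>" + Rule + @")\z",
        RegexOptions.CultureInvariant);
}
```

Note that POSIX offsets count positive to the west, so EST5 means UTC-5; the sketch only captures the text and does not convert it.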

      Soren Madsen

      "When you don't know what you're doing it's best to do it quickly" - Jase #DuckDynasty

        • Marco Bertschi

        It's RFC 5424.

        PIEBALDconsult wrote (#12):

        Reading the RFC leads me to think that it is supposed to be ISO 8601-compliant, but the values you show are not:
        0) Missing leading zeroes on single-digit values
        1) Time zone should be Z or offset; not both
        2) The offset should not have a decimal -- it's hours and minutes
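These three points follow from the RFC 5424 TIMESTAMP grammar (section 6.2.3), which is derived from RFC 3339: two-digit fields, an optional fraction of one to six digits, and either "Z" or a ±hh:mm offset, never both. A minimal sketch, assuming C#/.NET, of a strict check along those lines; the class name is hypothetical, and it also accepts the NILVALUE "-", which comes up later in the thread.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Sketch of the RFC 5424 TIMESTAMP shape: NILVALUE, or
// YYYY-MM-DD "T" hh:mm:ss [.1-6 digits] ("Z" | +hh:mm | -hh:mm).
public static class Rfc5424TimestampCheck
{
    private static readonly Regex Strict = new Regex(
        @"^(?<y>[0-9]{4})-(?<mo>[0-9]{2})-(?<d>[0-9]{2})" +
        @"T(?<h>[0-9]{2}):(?<mi>[0-9]{2}):(?<s>[0-9]{2})" +
        @"(?:\.[0-9]{1,6})?" +
        @"(?:Z|[+-](?<oh>[0-9]{2}):(?<om>[0-9]{2}))\z",
        RegexOptions.CultureInvariant);

    public static bool IsValid(string value)
    {
        if (value == null) return false;
        if (value == "-") return true;                        // NILVALUE

        Match m = Strict.Match(value);
        if (!m.Success) return false;

        int Get(string group) =>
            int.Parse(m.Groups[group].Value, CultureInfo.InvariantCulture);

        int year = Get("y"), month = Get("mo"), day = Get("d");
        if (year < 1 || month < 1 || month > 12) return false;
        if (day < 1 || day > DateTime.DaysInMonth(year, month)) return false;

        // Ranges from the RFC's ABNF comments; leap seconds (:60) are not allowed.
        if (Get("h") > 23 || Get("mi") > 59 || Get("s") > 59) return false;
        if (m.Groups["oh"].Success && (Get("oh") > 23 || Get("om") > 59)) return false;
        return true;
    }
}
```

With this, 2003-10-11T22:14:15.003Z and 1985-04-12T19:20:50.52-04:00 (both used as examples in the RFC) pass, while all four timestamps from the start of the thread are rejected.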

          TnTinMn wrote (#13):

          I am probably misinterpreting the RFC 5424 spec that you mentioned below and/or a case of complete RegEx ignorance, but from this site[^] it states that a timestamp can be a NILVALUE (where: NILVALUE = "-"). Do you account for this in your code?

            • Chris Losinger

            gack! :wtf: i suppose it's a rite of passage everyone has to go through. but whew :) me, i avoid regex like the plague. finite state machines are so much easier to understand and maintain.

            image processing toolkits | batch image processing
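For comparison with the regex, a minimal sketch of the state-machine style, assuming C#/.NET and hypothetical names. It checks only the date prefix of the loose format used in this thread (four-digit year, one- or two-digit month and day, then 'T'), one character at a time. Like the original regex, it checks shape only, not whether month and day are in range.

```csharp
using System;

// Hypothetical finite state machine for the date prefix "YYYY-M[M]-D[D]T".
public static class DatePrefixFsm
{
    private enum State { Year, MonthStart, Month, DayStart, Day }

    public static bool Accepts(string s)
    {
        if (s == null) return false;
        State state = State.Year;
        int digits = 0;                         // digits seen in the current field

        foreach (char c in s)
        {
            bool isDigit = c >= '0' && c <= '9';
            switch (state)
            {
                case State.Year:
                    if (isDigit && digits < 4) { digits++; break; }
                    if (c == '-' && digits == 4) { state = State.MonthStart; digits = 0; break; }
                    return false;
                case State.MonthStart:
                case State.DayStart:
                    if (!isDigit) return false; // each field needs at least one digit
                    digits = 1;
                    state = state == State.MonthStart ? State.Month : State.Day;
                    break;
                case State.Month:
                    if (isDigit && digits < 2) { digits++; break; }
                    if (c == '-') { state = State.DayStart; digits = 0; break; }
                    return false;
                case State.Day:
                    if (isDigit && digits < 2) { digits++; break; }
                    return c == 'T';            // date prefix complete
            }
        }
        return false;                           // input ended before the 'T'
    }
}
```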

            Paulo Zemek wrote (#14):

            I couldn't agree more with your last statement!

              JimmyRopes wrote (#15):

              Welcome to the dark side. :suss:

              The report of my death was an exaggeration - Mark Twain
              Simply Elegant Designs JimmyRopes Designs
              Think inside the box! ProActive Secure Systems
              I'm on-line therefore I am. JimmyRopes

                BillWoodruff wrote (#16):

                Congratulations, Marco; I believe learning, and mastering, something new is one of the very best things in life!

                It would be interesting to debate (perhaps on the C# forum?) the short- and long-range costs/benefits of implementing a complex mini-parser like this using RegEx vs. "brute-force" string parsing, where "cost/benefits" would be looked at from different perspectives: say, from the perspective of a manager of programmers vs. that of a front-line programmer.

                Of course the question of "constraints" immediately arises: what makes parsing the range of inputs you show more difficult is:

                1. possible ambiguity of the "-" glyph: it is a separator for the Date component, and a sign-indicator for the time-offset.

                2. possible ambiguity of the "." glyph: it is a presence/absence indicator for milliseconds, and a decimal-point for the time-offset.

                If you, the creator, have control over all possible inputs, and can ensure there will always be something like ".0Z" indicating no milliseconds, and there will always be some character other than "-" separating year, month, and day, then obviously parsing becomes much simpler. However, maybe "control" is an academic issue here because the standard you are coding to allows such latitude in input format; I don't know anything about the RFC you are using.

                As an experiment, I timed how long it took me to create a non-RegEx solution to parsing your sample data: about thirty minutes (code on request). Since this was done early AM my time (GMT +07), and I was not fully caffeinated, perhaps I could have done this in twenty minutes, or less, later in the day, or evening. Anyone up for debate?

                “But I don't want to go among mad people,” Alice remarked. “Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.” “How do you know I'm mad?” said Alice. “You must be,” said the Cat, “or you wouldn't have come here.” Lewis Carroll
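One way to read Bill's point about the ambiguous "-" and "." glyphs: both ambiguities disappear if the string is cut at 'T' and 'Z' first, because each glyph then has only one meaning per piece. A minimal non-regex sketch, assuming C#/.NET and the loose format from this thread with a 'Z' always present, as in the samples; it is not Bill's linked code, the names are hypothetical, and it only splits the string (numeric and range checks would come afterwards).

```csharp
using System;

// Hypothetical "brute-force" splitter for e.g. 2014-2-5T21:36:14.315Z-1.5.
//   '-' before 'T' separates date fields; '-' after 'Z' is the offset sign.
//   '.' before 'Z' starts the fraction;   '.' after 'Z' is a decimal point.
public static class LooseTimestampSplitter
{
    public static bool TrySplit(string s, out string[] date, out string[] time,
                                out string fraction, out string offset)
    {
        date = null; time = null; fraction = null; offset = null;
        if (s == null) return false;

        int t = s.IndexOf('T');
        if (t < 0) return false;
        int z = s.IndexOf('Z', t + 1);
        if (z < 0) return false;

        date = s.Substring(0, t).Split('-');              // year, month, day
        string clock = s.Substring(t + 1, z - t - 1);     // hh:mm:ss[.fraction]
        offset = s.Substring(z + 1);                      // e.g. "-1.5", "+0"

        int dot = clock.IndexOf('.');
        fraction = dot < 0 ? "" : clock.Substring(dot + 1);
        time = (dot < 0 ? clock : clock.Substring(0, dot)).Split(':');

        return date.Length == 3 && time.Length == 3;
    }
}
```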

                  Steve Wellens wrote (#17):

                  ([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)

                  X| X| X|

                  Steve Wellens

                    JimmyRopes wrote (#18):

                    BillWoodruff wrote:

                    (code on request)

                    Codes Plz :-D so I can do my homework assignment.

                      BillWoodruff wrote (#19):

                      JimmyRopes wrote:

                      Codes Plz

                      To hear is to obey, Master: [^] *

                      * plain-vanilla text file

                        Marco Bertschi wrote (#20):

                        You are right: the timestamp can be a NILVALUE! The RegEx is used within the class SyslogTimestamp (see the discussion [^] on why I don't use System.DateTime). There is another class, called SyslogMessageHeader, which has a field of type SyslogTimestamp. I plan to handle a NILVALUE as null, and also to treat null objects as if they represent a NILVALUE.
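The NILVALUE plan can be sketched as a pair of helpers, assuming C#/.NET: "-" becomes null when reading, and null is written back out as "-". The names are hypothetical, and the actual parse/format logic is passed in as a delegate because the SyslogTimestamp internals are not shown here.

```csharp
using System;

// Hypothetical sketch of the mapping described above:
// RFC 5424 NILVALUE "-" <-> null reference.
public static class SyslogNil
{
    public const string NilValue = "-";

    // Reading: "-" means "no value"; anything else goes to the real parser.
    public static T FromField<T>(string field, Func<string, T> parse) where T : class
        => field == NilValue ? null : parse(field);

    // Writing: a null object is emitted as the NILVALUE.
    public static string ToField<T>(T value, Func<T, string> format) where T : class
        => value == null ? NilValue : format(value);
}
```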

                          Marco Bertschi wrote (#21):

                          BillWoodruff wrote:

                          I believe learning, and mastering, something new is one of the very best things in life !

                          Learning is often tied to pain and effort. Mastering what you have learnt is joy!

                          BillWoodruff wrote:

                          Of course the question of "constraints" immediately arises: what makes parsing the range of inputs you show more difficult is:
                           
                          1. possible ambiguity of the "-" glyph: it is a separator for the Date component, and a sign-indicator for the time-offset.
                           
                          2. possible ambiguity of the "." glyph: it is a presence/absence indicator for milliseconds, and a decimal-point for the time-offset.

                          It's all specified like that. [-->]

                          BillWoodruff wrote:

                          If, you, the creator, have control over all possible inputs, and can ensure there will always something like ".0Z" indicating no milliseconds, and there will always be some other character than "-" separating year, month, day, then, obviously parsing becomes so much more simple.

                          I don't have any control over the formatting; I only know the constraints on the allowed values. By the way, I wrote some unit tests to compare the performance of splitting and RegEx: splitting does not provide a measurable performance improvement compared to the regular expression. What I really like about the RegEx solution is that the parsing method becomes a lot shorter, and is therefore more readable (see the method public bool FromString(string dateTime) in the code sample below). I wrote the following code so far:

                          using System;
                          using System.Collections.Generic;
                          using System.Globalization;
                          using System.Linq;
                          using System.Text;
                          using System.Text.RegularExpressions;

                          namespace Springlog.Com.Messaging
                          {
                              /// <summary>
                              /// Represents the Timestamp of a <see cref="SyslogMessageHeader"/>
                              /// Author: Marco Bertschi, (C) 2014 Marco Bertschi
                              /// </summary>
                              public class SyslogTimestamp
                              {
                                  #region Properties
                                  /// <summary>
                                  /// Returns the count of the days for a specific month in a specific year.
                                  /// </summary>
                                  /// <param name="month">month</param>
                                  /// <param name="year">year</param>
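A sketch of the kind of split-versus-RegEx comparison described above, assuming C#/.NET; it is not Marco's unit test, and the names are hypothetical. Both functions extract year, month and day from the loose format, and Time runs one of them n times with System.Diagnostics.Stopwatch.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical split-vs-regex timing sketch for the date part only.
public static class SplitVsRegex
{
    private static readonly Regex DatePart =
        new Regex(@"^([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})T", RegexOptions.Compiled);

    public static string[] ByRegex(string s)
    {
        Match m = DatePart.Match(s);
        return m.Success
            ? new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value }
            : null;
    }

    // Note: unlike the regex, this does not check that the fields are digits.
    public static string[] BySplit(string s)
    {
        int t = s.IndexOf('T');
        return t < 0 ? null : s.Substring(0, t).Split('-');
    }

    // Runs f n times after one warm-up call (JIT, regex compilation).
    public static TimeSpan Time(Func<string, string[]> f, string input, int n)
    {
        f(input);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) f(input);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Timings vary with machine, runtime and warm-up, so only numbers from the same run are comparable.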

                            Marco Bertschi wrote (#22):

                            Where is the :Exorcism: Emoticon? :wtf:

                              Marco Bertschi wrote (#23):

                              PIEBALDconsult wrote:

                              0) Missing leading zeroes on single-digit values

                              I know; I decided to allow missing leading zeros in my parsing application. Nevertheless, the value returned by the ToString method will add these leading zeros.

                              PIEBALDconsult wrote:

                              1) Time zone should be Z or offset; not both

                              :-O

                              PIEBALDconsult wrote:

                              2) The offset should not have a decimal -- it's hours and minutes

                              And here I can't quite follow you anymore. Do you mind explaining it?
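PIEBALD's point, as far as RFC 5424 goes: the offset is written as hours and minutes, +hh:mm or -hh:mm, so a decimal 1.5 hours is really 1 hour 30 minutes, i.e. +01:30. A minimal sketch, assuming C#/.NET, of converting the decimal-hour offsets used in this thread; the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: decimal-hour offset ("-1.5", "+1.75") -> "±hh:mm".
public static class OffsetFormat
{
    public static string FromDecimalHours(string decimalHours)
    {
        decimal hours = decimal.Parse(decimalHours,
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture);

        int totalMinutes = (int)Math.Round(hours * 60m);   // 1.75 h -> 105 min
        char sign = totalMinutes < 0 ? '-' : '+';
        totalMinutes = Math.Abs(totalMinutes);

        // Whole hours and remaining minutes, each zero-padded to two digits.
        return string.Format(CultureInfo.InvariantCulture, "{0}{1:00}:{2:00}",
            sign, totalMinutes / 60, totalMinutes % 60);
    }
}
```

Applied to the four example timestamps earlier in the thread, the offsets -1.5, -1, +1.75 and +0 become -01:30, -01:00, +01:45 and +00:00.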

                                Herbie Mountjoy wrote (#24):

                                You gotta love the regex. It is amazing.

                                I may not last forever but the mess I leave behind certainly will.

                                • OriginalGriff

                                  Well done! Get a copy of this: Expresso[^] - it's free, and it explains, tests and helps create regexes. I use it all the time and I wish I'd written it!

                                  Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)

                                  Rajesh R Subramanian wrote (#25):

                                  :)

                                  "Real men drive manual transmission" - Rajesh.

                                    BillWoodruff wrote (#26):

                                    Hi Marco, I'm enjoying reading your novel-in-code :), and I looked at the RFC5424 spec which I consider brain-damaged. I note that use of "Z" is optional, so that would mean my little flirtation with parsing your data would have to be rejiggered. What were the authors of that spec thinking when they allowed ambiguous characters that have varying meanings depending on position ? You'd also think that specs like this would provide a reference parser implementation in pseudo-code, or some computer language; however, for all I know, there is a reference implementation somewhere. cheers, Bill

                                      Marco Bertschi wrote (#27):

                                      BillWoodruff wrote:

                                      and I looked at the RFC5424 spec which I consider brain-damaged.

                                      Not at all: the timestamp may be completely brain-damaged, but at least it is clearly specified.

                                      BillWoodruff wrote:

                                      I note that use of "Z" is optional, so that would mean my little flirtation with parsing your data would have to be rejiggered.

                                      That's in fact brain-damaged.

                                      BillWoodruff wrote:

                                      What were the authors of that spec thinking when they allowed ambiguous characters that have varying meanings depending on position ?

                                      As far as I can recall, the spec was written by Rainer Gerhards as the sole author, which explains the complexity: that guy makes a fortune doing consulting for it (at least, I suspect so).

                                      BillWoodruff wrote:

                                      You'd also think that specs like this would provide a reference parser implementation in pseudo-code, or some computer language; however, for all I know, there is a reference implementation somewhere.

                                      Not if the author can make consultant money from it.
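No one in the thread could point to a reference parser, so what follows is a hedged sketch of what one might look like. It is a small hand-written scanner in Python that consumes the TIMESTAMP fields in the order RFC 5424 lists them. It then lets `datetime` reject impossible calendar values such as 2014-02-29 or hour 24. `parse_rfc5424_timestamp` and `_Scanner` are hypothetical names, and this is not an official or vetted implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DIGITS = "0123456789"


class _Scanner:
    """Minimal cursor over the input string (hypothetical helper)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        """Next character, or "" at the end of the input."""
        return self.text[self.pos:self.pos + 1]

    def digits(self, n: int) -> int:
        """Consume exactly n ASCII digits and return their value."""
        chunk = self.text[self.pos:self.pos + n]
        if len(chunk) != n or any(c not in DIGITS for c in chunk):
            raise ValueError(f"expected {n} digits at position {self.pos}")
        self.pos += n
        return int(chunk)

    def literal(self, ch: str) -> None:
        """Consume one expected separator character."""
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1


def parse_rfc5424_timestamp(text: str) -> Optional[datetime]:
    """Parse an RFC 5424 TIMESTAMP into an aware datetime; NILVALUE '-' gives None."""
    if text == "-":
        return None
    s = _Scanner(text)
    # FULL-DATE = DATE-FULLYEAR "-" DATE-MONTH "-" DATE-MDAY
    year = s.digits(4)
    s.literal("-")
    month = s.digits(2)
    s.literal("-")
    day = s.digits(2)
    s.literal("T")
    # PARTIAL-TIME = TIME-HOUR ":" TIME-MINUTE ":" TIME-SECOND [TIME-SECFRAC]
    hour = s.digits(2)
    s.literal(":")
    minute = s.digits(2)
    s.literal(":")
    second = s.digits(2)
    microsecond = 0
    if s.peek() == ".":
        s.literal(".")
        start = s.pos
        # TIME-SECFRAC allows 1 to 6 digits
        while s.pos - start < 6 and s.peek() != "" and s.peek() in DIGITS:
            s.pos += 1
        frac = text[start:s.pos]
        if not frac:
            raise ValueError(f"expected a digit at position {s.pos}")
        microsecond = int(frac.ljust(6, "0"))  # ".315" -> 315000 microseconds
    # TIME-OFFSET = "Z" / ("+" / "-") TIME-HOUR ":" TIME-MINUTE
    sign = s.peek()
    if sign == "Z":
        s.literal("Z")
        tz = timezone.utc
    elif sign in ("+", "-"):
        s.literal(sign)
        off_hour = s.digits(2)
        s.literal(":")
        off_minute = s.digits(2)
        if off_hour > 23 or off_minute > 59:
            raise ValueError("TIME-OFFSET out of range")
        delta = timedelta(hours=off_hour, minutes=off_minute)
        tz = timezone(delta if sign == "+" else -delta)
    else:
        raise ValueError(f"expected 'Z', '+' or '-' at position {s.pos}")
    if s.pos != len(text):
        raise ValueError(f"unexpected trailing text at position {s.pos}")
    # datetime() itself rejects month 13, hour 24, second 60, 2014-02-29, ...
    return datetime(year, month, day, hour, minute, second, microsecond, tzinfo=tz)
```

A scanner, unlike one large regex, can report *where* the input went wrong. That is arguably what a reference implementation should offer, and the cost is more code. The returned value is timezone-aware, so `.astimezone(timezone.utc)` normalizes any offset to UTC.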

                                      Clean-up crew needed, grammar spill... - Nagy Vilmos

                                      • OriginalGriff

                                        Well done! Get a copy of this: Expresso[^] - it's free, and it explains, tests and helps create regexes. I use it all the time and I wish I'd written it!

                                        Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)

                                        Septimus Hedgehog
                                        wrote on last edited by
                                        #28

                                        :thumbsup: I endorse that. Expresso is a very good regex utility.

                                        If there is one thing more dangerous than getting between a bear and her cubs it's getting between my wife and her chocolate.

                                        • Marco Bertschi

                                          I had the joy of taking my first steps with regular expressions today (I had tried to avoid them, but I eventually realized that they are absolutely unavoidable, especially when it comes to parsing complex strings). So I present to you my first RegEx:

                                          ([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)

                                          Beautiful, isn't she? :-O :-O :-O And this is the string it parses: 2014-2-5T21:36:14.315Z+1.5
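Running the posted pattern against a few strings shows what it actually accepts. This sketch uses Python's `re`, on the assumption that the target engine treats the constructs used here (character classes, `?`, `{m,n}`, lazy `{2}?`, alternation) the same way. `ORIGINAL` is just a local name for the pattern as posted. `fullmatch` is used because the pattern has no `^`/`$` anchors and would otherwise also match inside longer text.

```python
import re

# The pattern exactly as posted, split across lines only for readability.
ORIGINAL = re.compile(
    r"([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])"
    r"[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]"
    r"([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)"
)

samples = {
    "2014-2-5T21:36:14.315Z+1.5": "the intended input",
    "2014-2-5T21:36:14X315Z+1|5": "'.?' matches any character; '[\\.|,]' matches a literal '|'",
    "2014-02-05T21:36:14.315Z": "zero-padded month, as RFC 5424 requires",
}

for text, note in samples.items():
    verdict = "match" if ORIGINAL.fullmatch(text) else "no match"
    print(f"{verdict:8}  {text:30}  # {note}")
```

The first two strings match and the third does not. The unescaped `.?` accepts any character where the decimal point should be. Inside a character class `|` is a literal rather than alternation, so `[\.|,]` also accepts a pipe. The month group `([1-9]|[1][0-2])` rejects a leading zero.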

                                          Septimus Hedgehog
                                          wrote on last edited by
                                          #29

                                          Marco, OriginalGriff and I both recommend Expresso. You might also want to add RegexBuilder[^] to your regex toolbox: it lets you try your expression on multiple input strings, which makes it very useful for testing.
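If neither tool is at hand, a rough stand-in for that multi-input workflow takes only a few lines. The Python helper `try_inputs` below is a made-up name. It reports which of several strings a pattern matches in full. The example pattern covers only the FULL-DATE part of an RFC 5424 timestamp.

```python
import re
from typing import Dict, Iterable


def try_inputs(pattern: str, inputs: Iterable[str]) -> Dict[str, bool]:
    """For each input, report whether `pattern` matches the *whole* string."""
    compiled = re.compile(pattern)
    return {text: compiled.fullmatch(text) is not None for text in inputs}


# Example pattern: FULL-DATE only (YYYY-MM-DD with range checks per field).
FULL_DATE = r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])"

if __name__ == "__main__":
    cases = ["2014-02-05", "2004-02-29", "2014-02-30", "2014-2-5", "2014-13-01"]
    for text, ok in try_inputs(FULL_DATE, cases).items():
        print(f"{'match' if ok else 'no match':8}  {text}")
```

Note that `2014-02-30` comes back as a match. A regex can check the shape and ranges of a date, but calendar rules (month lengths, leap years) are easier to leave to a date library.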
