Regular Expression - Achievement unlocked

    Marco Bertschi
    wrote on last edited by
    #1

    I had the joy of taking my first steps with Regular Expressions today (I had tried to avoid them, but I eventually accepted that they are absolutely unavoidable, especially when it comes to parsing complex strings). So I present to you my first RegEx:

    ([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)

    Beautiful, isn't she? :-O :-O :-O And that's the string it will parse: 2014-2-5T21:36:14.315Z+1.5

    Clean-up crew needed, grammar spill... - Nagy Vilmos
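
For anyone who wants to poke at the pattern, here is a minimal sketch that runs it unchanged against the sample string. Python is an assumption made purely for illustration (the thread does not show Marco's own code), and `PATTERN` and `capture` are made-up names. `re.fullmatch` is used so that the whole string has to match, not just a prefix.

```python
import re

# Marco's pattern, copied verbatim from the post above and only split
# across lines for readability.
PATTERN = re.compile(
    r"([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]"
    r"([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?"
    r"([0-9]{1,6})[Z]"
    r"([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)"
)


def capture(timestamp):
    """Return the captured groups, or None if the whole string does not match."""
    match = PATTERN.fullmatch(timestamp)
    return match.groups() if match else None


print(capture("2014-2-5T21:36:14.315Z+1.5"))
print(capture("2014-2-5T21:36:14Z-1"))
```

The second call is worth a look: the string still matches, but because the fraction group is mandatory and the dot before it is an unescaped, optional "any character", the engine appears to split the seconds `14` into `1` and `4` to satisfy the pattern.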

      OriginalGriff
      wrote on last edited by
      #2

      Well done! Get a copy of this: Expresso[^] - it's free, and it explains, tests and helps create regexes. I use it all the time and I wish I'd written it!

      Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)

      "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
      "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt

        Marco Bertschi
        wrote on last edited by
        #3

        OriginalGriff wrote:

        Get a copy of this: Expresso[^]

        Just tried it out - And I hand-crafted my RegEx! Well, learning it from scratch has never caused any damage.

        Clean-up crew needed, grammar spill... - Nagy Vilmos

          BillW33
          wrote on last edited by
          #4

          The site OriginalGriff mentioned is very good. Another very helpful site is regexlib[^]; they have a large library of Regular Expressions. They also have a RegEx tester[^].

          Just because the code works, it doesn't mean that it is good code.

            Chris Losinger
            wrote on last edited by
            #5

            gack! :wtf: i suppose it's a rite of passage everyone has to go through. but whew :) me, i avoid regex like the plague. finite state machines are so much easier to understand and maintain.

            image processing toolkits | batch image processing

              Marco Bertschi
              wrote on last edited by
              #6

              I had a bit of help from an experienced colleague :laugh: The challenge is that it validates a Syslog-Timestamp. Problem? Yes. Here are some examples of valid timestamps:

              2014-2-5T21:36:14.315Z-1.5
              2014-2-5T21:36:14Z-1
              2004-2-28T21:36:14.315Z+1.75
              2004-2-29T21:36:14.315315Z+0

              You see the problem X|

              Clean-up crew needed, grammar spill... - Nagy Vilmos

                Marco Bertschi
                wrote on last edited by
                #7

                It's RFC 5424.

                Clean-up crew needed, grammar spill... - Nagy Vilmos

                  PIEBALDconsult
                  wrote on last edited by
                  #8

                  Marco Bertschi wrote:

                  2014-2-5T21:36:14.315Z+1.5

                  Almost, but not quite, entirely unlike ISO 8601.

                  This space intentionally left blank.

                    Mike Hankey
                    wrote on last edited by
                    #9

                    I second Expresso; it's an awesome tool!

                    My site: Everything Embedded Relax...We're all crazy it's not a competition!

                      PIEBALDconsult
                      wrote on last edited by
                      #10

                      Marco Bertschi wrote:

                      it validates a Syslog-Timestamp

                      If it was written to some sort of log file by some application, why would you doubt it?

                      Edit: Now that I have perused the timestamp part of the RFC, I can state, "those are not valid timestamps".

                      This space intentionally left blank.

                        SoMad
                        wrote on last edited by
                        #11

                        :thumbsup: Good one. Now that you are all warmed up and stretched out, go and take a swing at POSIX time zone format[^] :-\

                        Quote:

                        The following examples represent some of the customized POSIX formats:

                        HAST10HADT,M4.2.0/03:0:0,M10.2.0/03:0:00
                        AST9ADT,M3.2.0,M11.1.0
                        AST9ADT,M3.2.0/03:0:0,M11.1.0/03:0:0
                        EST5EDT,M3.2.0/02:00:00,M11.1.0/02:00:00
                        GRNLNDST3GRNLNDDT,M10.3.0/00:00:00,M2.4.0/00:00:00
                        EST5EDT,M3.2.0/02:00:00,M11.1.0
                        EST5EDT,M3.2.0,M11.1.0/02:00:00
                        CST6CDT,M3.2.0/2:00:00,M11.1.0/2:00:00
                        MST7MDT,M3.2.0/2:00:00,M11.1.0/2:00:00
                        PST8PDT,M3.2.0/2:00:00,M11.1.0/2:00:00

                        Soren Madsen

                        "When you don't know what you're doing it's best to do it quickly" - Jase #DuckDynasty

                          PIEBALDconsult
                          wrote on last edited by
                          #12

                          Reading the RFC leads me to think that it is supposed to be ISO 8601-compliant, but the values you show are not:

                          0) Missing leading zeroes on single-digit values
                          1) Time zone should be Z or offset; not both
                          2) The offset should not have a decimal -- it's hours and minutes

                          This space intentionally left blank.
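
The three points above can be turned into a stricter pattern. What follows is only a sketch, in Python for illustration, based on my reading of the TIMESTAMP grammar in RFC 5424 (two-digit fields, an optional fraction of one to six digits, then either `Z` or a `+hh:mm` / `-hh:mm` offset). `RFC5424_TIMESTAMP` and `is_strict_timestamp` are made-up names, and the pattern does not check that the day is valid for the given month.

```python
import re

# Assumed reading of the RFC 5424 TIMESTAMP grammar; not taken from the thread.
RFC5424_TIMESTAMP = re.compile(
    r"[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"  # date, leading zeroes required
    r"T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]"          # hh:mm:ss
    r"(\.[0-9]{1,6})?"                                    # optional fraction, 1 to 6 digits
    r"(Z|[+-]([01][0-9]|2[0-3]):[0-5][0-9])"              # Z or numeric offset, never both
)


def is_strict_timestamp(text):
    """True if the whole string matches the stricter pattern above."""
    return RFC5424_TIMESTAMP.fullmatch(text) is not None


print(is_strict_timestamp("2003-10-11T22:14:15.003Z"))    # strict form
print(is_strict_timestamp("2014-2-5T21:36:14.315Z+1.5"))  # Marco's sample
```

Marco's sample is rejected at the very first hurdle, the single-digit month, before the `Z+1.5` tail is even reached.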

                            TnTinMn
                            wrote on last edited by
                            #13

                            I am probably misinterpreting the RFC 5424 spec that you mentioned below, and/or this is a case of complete RegEx ignorance, but this site[^] states that a timestamp can be a NILVALUE (where: NILVALUE = "-"). Do you account for this in your code?

                              Paulo Zemek
                              wrote on last edited by
                              #14

                              I can't agree more with your last statement!

                                JimmyRopes
                                wrote on last edited by
                                #15

                                Welcome to the dark side. :suss:

                                The report of my death was an exaggeration - Mark Twain
                                Simply Elegant Designs JimmyRopes Designs
                                Think inside the box! ProActive Secure Systems
                                I'm on-line therefore I am. JimmyRopes

                                  BillWoodruff
                                  wrote on last edited by
                                  #16

                                  Congratulations, Marco; I believe learning, and mastering, something new is one of the very best things in life!

                                  It would be interesting to debate (perhaps on the C# forum?) the short- and long-range cost/benefits of implementing a complex mini-parser like this using RegEx vs. "brute-force" string parsing, where "cost/benefits" would be looked at from different perspectives: say, from the perspective of a manager of programmers vs. a front-line programmer's perspective.

                                  Of course the question of "constraints" immediately arises: what makes parsing the range of inputs you show more difficult is:

                                  1. possible ambiguity of the "-" glyph: it is a separator for the Date component, and a sign-indicator for the time-offset.
                                  2. possible ambiguity of the "." glyph: it is a presence/absence indicator for milliseconds, and a decimal-point for the time-offset.

                                  If you, the creator, have control over all possible inputs, and can ensure there will always be something like ".0Z" indicating no milliseconds, and there will always be some character other than "-" separating year, month, and day, then, obviously, parsing becomes much simpler. However, maybe "control" is an academic issue here because the standard you are coding to allows such latitude in input format; I don't know anything about the RFC you are using.

                                  As an experiment, I timed how long it took me to create a non-RegEx solution to parsing your sample data: about thirty minutes (code on request). Since this was done early AM my time (GMT +07), and I was not fully caffeinated, perhaps I could have done this in twenty minutes, or less, later in the day, or evening.

                                  Anyone up for debate?

                                  “But I don't want to go among mad people,” Alice remarked. “Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.” “How do you know I'm mad?” said Alice. “You must be," said the Cat, or you wouldn't have come here.” Lewis Carroll
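
To make the "brute-force" side of that debate concrete, here is one way such a parser might look. This is not BillWoodruff's code (his was only linked, not shown); it is a minimal Python sketch, with `parse_sample` as a made-up name, that handles Marco's sample format and nothing more. Splitting on `T` and then on `Z` before touching anything else sidesteps both ambiguities: by the time `-` and `.` are examined, it is already known which part of the string they belong to. It does no range checking.

```python
def parse_sample(ts):
    """Split Marco's sample format by hand; raises ValueError on malformed input."""
    date_part, rest = ts.split("T", 1)           # the date never contains "T"
    time_part, offset_part = rest.split("Z", 1)  # everything after "Z" is the offset
    year, month, day = (int(x) for x in date_part.split("-"))
    clock, _, fraction = time_part.partition(".")
    hour, minute, second = (int(x) for x in clock.split(":"))
    # Pad the fraction to six digits so "315" becomes 315000 microseconds.
    microsecond = int(fraction.ljust(6, "0")) if fraction else 0
    # The offset is a signed decimal number of hours in the sample data.
    offset_hours = float(offset_part) if offset_part else 0.0
    return (year, month, day, hour, minute, second, microsecond, offset_hours)


print(parse_sample("2014-2-5T21:36:14.315Z+1.5"))
print(parse_sample("2014-2-5T21:36:14Z-1"))
```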

                                    Steve Wellens
                                    wrote on last edited by
                                    #17

                                    ([0-9]{4})-([1-9]|[1][0-2])-([0-2]?[0-9]|[3][0-1])[T]([0-1]?[0-9]|[2][0-3])[:]([0-5]?[0-9])[:]([0-5]?[0-9])?.?([0-9]{1,6})[Z]([+-][0-9][\.|,]?[0-9]?|[0-9]{2}?|[+-]?[0][1][0-2][\.]?[0-9]?|[0-9]{2}?)

                                    X| X| X|

                                    Steve Wellens

                                      JimmyRopes
                                      wrote on last edited by
                                      #18

                                      BillWoodruff wrote:

                                      (code on request)

                                      Codes Plz :-D so I can do my homework assignment.

                                      The report of my death was an exaggeration - Mark Twain
                                      Simply Elegant Designs JimmyRopes Designs
                                      Think inside the box! ProActive Secure Systems
                                      I'm on-line therefore I am. JimmyRopes

                                        BillWoodruff
                                        wrote on last edited by
                                        #19

                                        JimmyRopes wrote:

                                        Codes Plz

                                        To hear is to obey, Master: [^] * * plain-vanilla text file

                                        “But I don't want to go among mad people,” Alice remarked. “Oh, you can't help that,” said the Cat: “we're all mad here. I'm mad. You're mad.” “How do you know I'm mad?” said Alice. “You must be," said the Cat, or you wouldn't have come here.” Lewis Carroll

                                          Marco Bertschi
                                          wrote on last edited by
                                          #20

                                          You are right - the timestamp can be a NILVALUE! The RegEx is used within the class SyslogTimestamp (see the discussion[^] on why I don't use System.DateTime). There is another class, called SyslogMessageHeader, which has a field of type SyslogTimestamp. I plan to handle a NILVALUE as null, and also to treat null objects as if they represent a NILVALUE.

                                          Clean-up crew needed, grammar spill... - Nagy Vilmos
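
The NILVALUE plan described above amounts to a two-way mapping between `"-"` on the wire and "no value" in memory. A rough sketch of that idea, in Python for illustration: the class below is a hypothetical stand-in for Marco's `SyslogMessageHeader` (his actual code is not shown in the thread), the timestamp is kept as plain text to keep the example short, and the two helper names are invented.

```python
from dataclasses import dataclass
from typing import Optional

NILVALUE = "-"  # RFC 5424 placeholder for an absent field


@dataclass
class SyslogMessageHeader:
    # Hypothetical stand-in: None plays the role of null in Marco's design.
    timestamp: Optional[str] = None


def read_timestamp_field(raw: str) -> Optional[str]:
    """Wire value to in-memory value: "-" becomes None."""
    return None if raw == NILVALUE else raw


def write_timestamp_field(value: Optional[str]) -> str:
    """In-memory value back to wire value: None becomes "-"."""
    return NILVALUE if value is None else value


header = SyslogMessageHeader(timestamp=read_timestamp_field("-"))
print(header.timestamp, write_timestamp_field(header.timestamp))
```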
