Code Project

C# Embedded Regex

The Lounge
Tags: csharp, performance, help, linq, com
11 Posts, 5 Posters
  • jschell

    An odd new feature for C# is that it can compile a Regex right into the code. Steven Giesel writes:

    "The advantage over the traditional approach is that we can get the same performance as new Regex("...", RegexOptions.Compiled) and the startup benefit of Regex.CompileToAssembly, but without the complexity of CompileToAssembly. As the code is generated it can be viewed and debugged."

    So, excluding perhaps the "startup" part, all I can think of is the similar situation with LINQ expressions (no idea what they are called). The problem with those shows up in production systems where one wants to log a stack trace from an exception. And now, with the above feature, one is likely to see 20 lines of internal code (as with LINQ) which make no sense to anyone.

    I also doubt the "viewed and debugged" claim. The article provides a sample, I believe, of what one sees, but in my experience most developers who attempt to touch regexes are walking in a minefield with no idea a minefield even exists. So showing them a failed regex isn't going to help much.

    But I have no doubt that developers will use this because they think it is 'better', while ignoring the only real advantage listed above, which is the "startup" speed. Anyone want to provide an alternative take?
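
For readers who have not seen the feature, here is a minimal sketch of what it looks like in code. It assumes the feature under discussion is the regex source generator that shipped with .NET 7 and its `GeneratedRegex` attribute; the class name `Patterns`, the method name `Digits`, and the pattern are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumes .NET 7 or later. The containing type must be declared partial
// so the source generator can supply the method body at build time.
public static partial class Patterns
{
    // Hypothetical example pattern: one or more digits, anchored at both ends.
    [GeneratedRegex(@"^\d+$")]
    public static partial Regex Digits();
}

public static class Demo
{
    // The generated Regex is used like any other Regex instance.
    public static bool IsAllDigits(string input) => Patterns.Digits().IsMatch(input);
}
```

The generated implementation lands in the project's generated sources, which is where the "viewed and debugged" claim in the quote comes from.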

    PIEBALDconsult
    #2

    That may help in a very few use cases, and I may need to try it in one of them. But generally, I doubt it will be beneficial to many users. As you say, I don't think having stack trace or debug information will be very beneficial.

    I use Regular Expressions a lot -- and I nearly always specify Compiled, even when I know the expression will be used only once; I never use CompileToAssembly. When I have a Regular Expression which executes once in a utility which runs for only a few seconds, the "startup" cost is unimportant. When I have a Regular Expression which executes millions of times in a utility which runs for an extended period (several minutes, an hour, etc.), the "startup" cost is unimportant.

    The one place I can think of immediately where this might be of benefit is when I use a Regular Expression in an SQL Server CLR function -- which is quite a bit, actually. If it can speed that up, then I would use it in that one use case. But it would still affect only static Regular Expressions, not dynamic ones.
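
The traditional approach mentioned above could be sketched roughly as follows. This is only an illustration: the class name `LogPatterns`, the field name, and the date pattern are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LogPatterns
{
    // Constructed once per process. RegexOptions.Compiled pays a higher
    // construction ("startup") cost in exchange for faster matching later.
    // The pattern is a hypothetical example (an ISO-style date).
    private static readonly Regex Timestamp =
        new Regex(@"\d{4}-\d{2}-\d{2}", RegexOptions.Compiled);

    public static bool HasDate(string line) => Timestamp.IsMatch(line);
}
```

Holding the instance in a static field means the construction cost is paid once, however many times the expression is then executed.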

      Dan Neely
      #3, in reply to jschell

      I want someone who's playing with the beta compiler to test it on the regex that can validate almost all of the disgusting edge cases that the email spec actually allows, not just the basic `user+suffix@domain.tld` matching pattern that most of us call good enough, and share the explanation output. Consider it a stress test. Can the explain function actually output a full explanation without truncating it, and will CP allow a message that long? :laugh:

      (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:

        lmoelleb
        #4, in reply to jschell

        One advantage: invalid regex syntax is a compile-time warning instead of a runtime error. I am not arguing this outweighs the disadvantages.
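
To make the runtime half of that comparison concrete: a pattern handed to the `Regex` constructor is only parsed when the code runs, and a malformed one throws `ArgumentException` at that point. The sketch below shows this; the helper name `IsValidPattern` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RuntimeCheck
{
    // Returns true if the pattern parses. With the constructor-based
    // approach a syntax error surfaces only here, at run time.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown for malformed patterns such as an unclosed group.
            return false;
        }
    }
}
```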

          honey the codewitch
          #5, in reply to jschell

          I think your criticisms are valid, but personally, I love regex and state machines, and feel right at home dealing with that sort of code. I'd love to have it opened up for me the way my own regex code generators work in .NET. But in general, I think you're right. I would consider this feature for my own projects, but I might think twice before using it in a team development environment, for the reasons you mention.

          To err is human. Fortune favors the monsters.

            jschell
            #6, in reply to Dan Neely

            I have seen at least one and perhaps several ways of defining that regex which make it easier to understand. They involve breaking it into pieces and then commenting on each one. For actual usage the pieces are then put together.
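
A rough sketch of that piece-by-piece style is below. It deliberately uses a much simplified email pattern, not the full one quoted earlier, and every name in it (`SimpleEmail`, `LocalPart`, `Label`, `Domain`) is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SimpleEmail
{
    // Each piece is defined and commented on its own.
    private const string LocalPart = @"[A-Za-z0-9._%+-]+";              // text before the @
    private const string Label = @"[A-Za-z0-9-]+";                      // one domain label
    private const string Domain = Label + @"(?:\." + Label + @")+";     // two or more labels joined by dots

    // For actual usage the pieces are put together into a single pattern.
    private static readonly Regex Pattern =
        new Regex("^" + LocalPart + "@" + Domain + "$");

    public static bool LooksLikeEmail(string input) => Pattern.IsMatch(input);
}
```

With this simplified pattern, `user+suffix@domain.tld` is accepted, while an address with no dot in the domain is rejected.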

              jschell
              #7, in reply to honey the codewitch

              honey the codewitch wrote:

              I love regex and state machines

              I have been using Perl since the early 90s and have never had a co-worker who was as comfortable with regexes as I am. And I have definitely seen some who didn't know what they were doing.

              honey the codewitch wrote:

              I'd love to have it opened up for me

              I wrote my own basic regex engine (character classes and such) and modified another. So at least for the basics I know how they work. But I also know the complexities, and I would not want to program those. I am pretty sure that the source for the current .NET implementation is available, and I know that the Java source code is available. The code needed for both is going to be conceptually the same because both use the same regex. The same would be true of JavaScript.

                honey the codewitch
                #8, in reply to jschell

                Right, but then I'm stuck reimplementing their regex engine if I want this feature. I have no interest in doing that. I've already done it. It's boring.

                  lmoelleb
                  #9, in reply to honey the codewitch

                  Not quite sure what you are after? Nothing stops you from implementing your own regex attributes and source generators - so what is it you are missing? Hooking your own engine into Microsoft's existing attributes?

                    honey the codewitch
                    #10, in reply to lmoelleb

                    Nothing. I'm missing nothing, since Microsoft implemented the feature in the OP.

                      PIEBALDconsult
                      #11, in reply to lmoelleb

                      On the other hand, by the time the application gets to production, it will have been tested. It's only in development that a compile-time error would be beneficial. Frequently, a Regular Expression will have been developed and tested in Expresso or similar, so I wouldn't say that a compiler error is that much of a benefit.
