C# Embedded Regex
-
Odd new feature for C# (the .NET 7 Regex source generator): it can compile a Regex right into the code. Steven Giesel[^]: "The advantage over the traditional approach is that we can get the same performance as new Regex("...", RegexOptions.Compiled) and the startup benefit of Regex.CompileToAssembly, but without the complexity of CompileToAssembly. As the code is generated it can be viewed and debugged." So, excluding perhaps the "startup" part, all I can think about is the simpler thing with LINQ expressions (not sure what they are properly called). The problem with those shows up in production systems where one wants to log a stack trace from an exception. With the above feature, one is likely to see 20 lines of internal code (as with LINQ) that make no sense to anyone. I also doubt the "viewed and debugged" claim. The article provides a sample, I believe, of what one sees, but in my experience most developers who attempt to touch regexes are walking in a minefield with no idea a minefield even exists. So showing them a failed regex isn't going to help much. Still, I have no doubt that developers will use this because they think it is 'better', while ignoring that the only real advantage cited above is the "startup" speed. Anyone want to provide an alternative take?
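For anyone who hasn't seen the feature, a minimal sketch of the two approaches side by side, assuming a .NET 7 or later project (where `[GeneratedRegex]` is available). The class name, member names and the phone-number-style pattern are hypothetical. The generator's implementation of the partial method is the code the quote says can be "viewed and debugged".

```csharp
using System.Text.RegularExpressions;

// Assumes .NET 7 or later: [GeneratedRegex] and its source generator
// ship with the .NET 7 SDK. Names and pattern are illustrative only.
public static partial class Patterns
{
    // Traditional approach: the pattern is parsed and compiled to IL
    // at run time, when this static field is first initialized.
    public static readonly Regex CompiledAtRuntime =
        new Regex(@"^\d{3}-\d{4}$", RegexOptions.Compiled);

    // Source-generated approach: the generator emits C# for this partial
    // method during the build, so no parsing or IL emission happens at
    // startup, and the generated code can be browsed in the IDE.
    [GeneratedRegex(@"^\d{3}-\d{4}$")]
    public static partial Regex GeneratedAtBuildTime();
}
```

Both instances match the same inputs; the difference is when the work of turning the pattern into executable code happens.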
That may help in a very few use cases, and I may need to try it in one such case. But generally, I doubt it will benefit many users. As you say, I don't think having stack trace or debug information will be very beneficial. I use Regular Expressions a lot, and I nearly always specify Compiled, even when I know the expression will be used only once; I never use CompileToAssembly. When I have a Regular Expression that executes once in a utility that runs for only a few seconds, the "startup" cost is unimportant. When I have a Regular Expression that executes millions of times in a utility that runs for an extended period (several minutes, an hour, etc.), the "startup" cost is also unimportant. The one place I can think of immediately where this might help is Regular Expressions in SQL Server CLR functions, which I use quite a bit. If it can speed those up, I would use it in that one case. But it would still affect only static Regular Expressions, not dynamic ones.
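The static-versus-dynamic point can be made concrete: the generator takes its pattern as an attribute argument, which C# requires to be a compile-time constant, so anything assembled at run time still goes through the `Regex` constructor. A sketch; `DynamicPatterns` and `WholeWord` are hypothetical names, and the one-second match timeout is an arbitrary choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DynamicPatterns
{
    // Hypothetical helper: a pattern built from a caller-supplied word
    // cannot use [GeneratedRegex], because attribute arguments must be
    // compile-time constants. It has to be constructed at run time.
    public static Regex WholeWord(string word)
    {
        // Regex.Escape neutralizes metacharacters such as '.', '+', '('.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";

        // RegexOptions.Compiled trades extra construction time for faster
        // matching; worthwhile when the instance is reused many times.
        // The timeout guards against pathological backtracking.
        return new Regex(pattern,
                         RegexOptions.Compiled | RegexOptions.IgnoreCase,
                         TimeSpan.FromSeconds(1));
    }
}
```

Because of the escaping, `WholeWord("a.b")` matches the literal text "A.B" but not "axb".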
-
I want someone who's playing with the beta compiler to test it on the regex that validates almost all of the disgusting edge cases the email spec actually allows (not just the basic `user+suffix@domain.tld` pattern that most of us call good enough) and share the explanation output. Consider it a stress test. Can the explain function actually output a full explanation without truncating it, and will CP allow a message that long? :laugh:
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:
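For contrast, a sketch of the "good enough" kind of check the post mentions, written with the generator (.NET 7 or later assumed). The class name and the pattern are illustrative assumptions, not an RFC 5322 validator. As far as I know, the "explanation" being asked about is emitted as comments in the generated source (visible in the IDE), not produced at run time.

```csharp
using System.Text.RegularExpressions;

public static partial class SimpleEmail
{
    // Deliberately loose check: a local part, '@', and a dotted domain,
    // none containing whitespace or another '@'. It rejects many addresses
    // RFC 5322 allows (quoted local parts with spaces, comments, folding
    // whitespace) and accepts some it forbids (e.g. consecutive dots).
    [GeneratedRegex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$")]
    public static partial Regex Pattern();
}
```

`user+suffix@domain.tld` passes, while a spec-legal address such as `"john doe"@example.com` is rejected, which is exactly the gap the full RFC-style pattern above tries to close.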
-
I think your criticisms are valid, but personally, I love regex and state machines, and feel right at home dealing with that sort of code. I'd love to have it opened up for me the way my own regex code generators do it in .NET. But in general, I think you're right. I would consider this feature for my own projects, but I might think twice before using it in a team development environment, for the reasons you mention.
To err is human. Fortune favors the monsters.
-
honey the codewitch wrote:
I love regex and state machines
I have been using Perl since the early '90s and have never had a co-worker who was as comfortable with regexes as I am. And I have definitely seen some who didn't know what they were doing.
honey the codewitch wrote:
I'd love to have it opened up for me
I wrote my own basic regex engine (character classes and such) and modified another, so at least for the basics I know how they work. But I also know the complexities, and I would not want to program those. I'm pretty sure the .NET source that implements it right now is available, and I know the Java source code for it is available. The code needed for both is going to be conceptually the same, because both implement largely the same Perl-derived regex syntax. The same would be true of JavaScript.
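For a sense of what a "basic" engine involves, here is a sketch of the classic tiny backtracking matcher Rob Pike wrote for Kernighan and Pike's *The Practice of Programming* (literal characters, `.`, `*`, `^`, `$`), ported to C#. It is neither the poster's engine nor the .NET implementation, and it has no character classes, alternation or grouping, which is roughly where the complexity mentioned above begins.

```csharp
public static class TinyRegex
{
    // Search for 'pattern' anywhere in 'text'.
    public static bool Match(string pattern, string text)
    {
        if (pattern.Length > 0 && pattern[0] == '^')
            return MatchHere(pattern, 1, text, 0);
        // Try every start position, including the empty suffix.
        for (int t = 0; t <= text.Length; t++)
            if (MatchHere(pattern, 0, text, t))
                return true;
        return false;
    }

    // Does pattern[p..] match at the start of text[t..]?
    private static bool MatchHere(string pattern, int p, string text, int t)
    {
        if (p == pattern.Length)
            return true;
        if (p + 1 < pattern.Length && pattern[p + 1] == '*')
            return MatchStar(pattern[p], pattern, p + 2, text, t);
        if (pattern[p] == '$' && p + 1 == pattern.Length)
            return t == text.Length;
        if (t < text.Length && (pattern[p] == '.' || pattern[p] == text[t]))
            return MatchHere(pattern, p + 1, text, t + 1);
        return false;
    }

    // c* followed by pattern[p..]: try zero, one, two... copies of c
    // (shortest first), backtracking until the rest of the pattern matches.
    private static bool MatchStar(char c, string pattern, int p, string text, int t)
    {
        do
        {
            if (MatchHere(pattern, p, text, t))
                return true;
        } while (t < text.Length && (text[t++] == c || c == '.'));
        return false;
    }
}
```

For example, `TinyRegex.Match("^a.c$", "abc")` succeeds and `TinyRegex.Match("^a.c$", "abcd")` fails on the `$` anchor.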
-
Right, but then I'm stuck reimplementing their regex engine if I want this feature. I have no interest in doing that. I've already done it. It's boring.
-
Not quite sure what you are after. Nothing stops you from implementing your own regex attributes and source generators, so what is it you are missing? Hooking your own engine into Microsoft's existing attributes?
Nothing. I'm missing nothing, since Microsoft implemented the feature in the OP.
-
One advantage: invalid regex syntax becomes a compile-time warning instead of a runtime error. I am not arguing that this outweighs the disadvantages.
On the other hand, by the time the application gets to production, it will have been tested; it's only during development that a compile-time error would be beneficial. Frequently, a Regular Expression will have been developed and tested in Expresso or similar, so I wouldn't say that a compiler error is that much of a benefit.
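Without the generator, the usual way to get that early warning is to construct every pattern once in a unit test or at startup, since a malformed pattern throws an `ArgumentException` from the `Regex` constructor. A sketch; `PatternCheck.IsValid` is a hypothetical helper, not part of .NET.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Hypothetical helper: without the source generator, a malformed
    // pattern is only reported when the Regex is constructed, as an
    // ArgumentException. Calling this for every pattern in a unit test
    // surfaces the problem before the code ships.
    public static bool IsValid(string pattern, out string? error)
    {
        try
        {
            _ = new Regex(pattern);
            error = null;
            return true;
        }
        catch (ArgumentException ex)
        {
            // The message describes the parse failure (e.g. unbalanced parentheses).
            error = ex.Message;
            return false;
        }
    }
}
```

Catching `ArgumentException` rather than a more specific type keeps the check working on older runtimes as well.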