Microsoft Regex Weirdness

Brisingr Aerowing

I just tested with .NET 8, and the :Whatever: syntax works perfectly fine.

What do you get when you cross a joke with a rhetorical question? The metaphorical solid rear-end expulsions have impacted the metaphorical motorized bladed rotating air movement mechanism. Do questions with multiple question marks annoy you???

PIEBALDconsult

OK, I've never seen it and I don't see it documented.

honey the codewitch

They are Unicode character classes. They match the static methods on char in C#. They're supported on all major Unicode regex engines I'm aware of, as is [:characterclass:] on all POSIX engines.

Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

Peter_in_2780

In the days of 7/8-bit chars, those class tests were often implemented as bitmaps (e.g. 8 classes in an array of 256 bytes). A similar trick in, say, UTF-16 wouldn't be outrageous these days.
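
For anyone who hasn't seen the bitmap trick, here is a rough sketch in C. It assumes ASCII and one byte of flags per character, so up to 8 classes fit in a 256-byte table. All the names (cc_table, cc_init, cc_is, the CC_* flags) are made up for illustration and are not taken from any real library or regex engine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical class flags: one bit per class, so 8 classes fit in a byte. */
enum {
    CC_UPPER = 1 << 0,
    CC_LOWER = 1 << 1,
    CC_DIGIT = 1 << 2,
    CC_SPACE = 1 << 3,
    CC_PUNCT = 1 << 4
};
#define CC_ALPHA (CC_UPPER | CC_LOWER)
#define CC_ALNUM (CC_ALPHA | CC_DIGIT)

/* One byte of class bits for each of the 256 possible char values. */
static uint8_t cc_table[256];

/* Set the given class bits for every character in [lo, hi]. */
static void cc_set_range(unsigned lo, unsigned hi, uint8_t bits) {
    assert(lo <= hi && hi < 256);
    for (unsigned c = lo; c <= hi; ++c) {
        cc_table[c] |= bits;
    }
}

/* Fill the table once at startup (ASCII assumed; bytes >= 0x80 stay 0). */
static void cc_init(void) {
    cc_set_range('A', 'Z', CC_UPPER);
    cc_set_range('a', 'z', CC_LOWER);
    cc_set_range('0', '9', CC_DIGIT);
    cc_set_range('\t', '\r', CC_SPACE);   /* tab, LF, VT, FF, CR */
    cc_set_range(' ', ' ', CC_SPACE);
    cc_set_range('!', '/', CC_PUNCT);
    cc_set_range(':', '@', CC_PUNCT);
    cc_set_range('[', '`', CC_PUNCT);
    cc_set_range('{', '~', CC_PUNCT);
}

/* A class test is one table lookup and one AND. */
static int cc_is(unsigned char c, uint8_t mask) {
    return (cc_table[c] & mask) != 0;
}
```

After calling cc_init() once, cc_is('x', CC_ALNUM) is true and cc_is('7', CC_ALPHA) is false. Combined classes like CC_ALNUM cost nothing extra, since the mask just has more bits set.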

Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012

PIEBALDconsult

honey the codewitch wrote:

They match the static methods on char in C#.

Then I hope they don't call those static methods on each character as they go. P.S. Maybe try changing it to the [\p{name}] form and compare?

honey the codewitch

Probably not. I have a 688KB C# file with all of the supported character classes and codepoint ranges. Unicode is big. I imagine they have something similar. As for [\p{name}], I'm vague on that form of expression, but isn't it Unicode? The Unicode one is already faster. The curious bit is ASCII. I suppose I could try [:alnum:].

PIEBALDconsult

Right, as far as I can tell [\p{name}] == [:name:]. But do they perform the same? They should.

honey the codewitch

I'll find out when I get a chance. Just based on the way I parse this stuff I'm assuming it will be the same.

Richard Deeming

Are you using the source generators? That should let you dig into the actual regex code to see the difference. :)

"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

Brisingr Aerowing

OK. That's actually pretty cool.
