Either I'm missing something or .NET is

honey the codewitch

Pretty much! :thumbsup:

Real programmers use butterflies

honey the codewitch

I've encountered a weird one, but I don't remember whether it was a surrogate or not. I just remember that it didn't print to the console properly (it cooked it), or didn't save to source as a literal value, or something. It came up as whitespace, but only when I used these huge "not ranges" like [^a-z] (anything but a lowercase letter). That "anything" part created ranges all throughout the 16-bit Unicode spectrum, and that's when I ran into issues with one whitespace character.
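To illustrate why a "not range" reaches so far, here is a minimal sketch of inverting a set of ranges over the 16-bit code unit space. It assumes ranges are stored as sorted, non-overlapping inclusive pairs; the class and method names (`NotRangeDemo`, `Invert`) are hypothetical and not taken from the code discussed in this thread.

```csharp
using System;
using System.Collections.Generic;

public static class NotRangeDemo
{
    // Hypothetical helper: returns the ranges covering every UTF-16 code unit
    // (0x0000..0xFFFF) that is NOT inside the given ranges.
    // Assumes the input is sorted and non-overlapping, with inclusive bounds.
    public static List<(int First, int Last)> Invert(IEnumerable<(int First, int Last)> ranges)
    {
        var result = new List<(int First, int Last)>();
        int next = 0; // lowest code unit not yet accounted for

        foreach (var (first, last) in ranges)
        {
            // Everything between the previous range and this one is in the complement.
            if (first > next) result.Add((next, first - 1));
            next = Math.Max(next, last + 1);
        }

        // The tail of the 16-bit space after the last range.
        if (next <= 0xFFFF) result.Add((next, 0xFFFF));
        return result;
    }
}
```

Inverting the single range 'a'..'z' this way yields 0x0000..0x0060 and 0x007B..0xFFFF, so the second range sweeps up the surrogate block (0xD800..0xDFFF) and every non-ASCII whitespace character along with everything else.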

Real programmers use butterflies

Greg Utas

And sixbit on DEC's PDP systems!

Dr Walt Fair PE

Actually, Baudot is sufficient at 5 bits. CQ de W5ALT

Walt Fair, Jr.PhD P. E. Comport Computing Specializing in Technical Engineering Software

BillWoodruff

Idea: research using Unicode categories in your RegEx?

«One day it will have to be officially admitted that what we have christened reality is an even greater illusion than the world of dreams.» Salvador Dali

honey the codewitch

I am using those. The category for surrogates is surrogate. Not helpful. Combining a hi and lo surrogate you get a 2 char string. The 2 char string cannot be queried for its unicode category in .NET AFAIK

Real programmers use butterflies

BillWoodruff

honey the codewitch wrote:

The 2 char string cannot be queried for its unicode category in .NET AFAIK

It is a mess, but check this against what you expect:

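using System;
using System.Globalization;
using System.Windows.Forms; // for the Keys enum

public void PrintUniCodeRange(int sc, int ec)
{
    for (int i = sc; i <= ec; i++)
    {
        // ConvertFromUtf32 throws for lone surrogate code points, so skip that block.
        if (i >= 0xD800 && i <= 0xDFFF) continue;

        // One char for code points up to U+FFFF, a surrogate pair (two chars) above that.
        string ucString = char.ConvertFromUtf32(i);

        // Only values below 256 are looked up in the Keys enum.
        bool isKey = i < 256;
        string key = isKey ? ((Keys)Enum.Parse(typeof(Keys), i.ToString())).ToString() : "";

        // The (string, int) overload looks at the whole surrogate pair starting at index 0.
        UnicodeCategory cat = char.GetUnicodeCategory(ucString, 0);

        if (cat != UnicodeCategory.OtherNotAssigned)
        {
            Console.WriteLine($"#{i} | Unicode Category: {cat} {(isKey ? "! Keys Enum: " + key : "")}");
        }
    }
}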

Calling the above with parameters 8192 to 8233:

honey the codewitch

Hmm, I wonder what my test was doing wrong, because I thought GetUnicodeCategory(string, int) was returning only single-char values for me. Maybe I had a bug.

Real programmers use butterflies

honey the codewitch

Thank you! Turns out there was a bug in my code where I wasn't passing double-char strings in. They ended up single char.
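For anyone hitting the same thing, a minimal sketch of the difference between the two cases. The class and method names are hypothetical; only `char.ConvertFromUtf32` and the two `char.GetUnicodeCategory` overloads are real .NET calls.

```csharp
using System;
using System.Globalization;

public static class SurrogateCategoryDemo
{
    // Queries the category through the full string, so a surrogate pair
    // at index 0 is treated as one code point.
    public static UnicodeCategory CategoryOfCodePoint(int codePoint)
    {
        string s = char.ConvertFromUtf32(codePoint); // 1 or 2 UTF-16 chars
        return char.GetUnicodeCategory(s, 0);
    }

    // Queries only the first UTF-16 code unit. This mimics the accidental
    // single-char case: for code points above U+FFFF it sees a lone high surrogate.
    public static UnicodeCategory CategoryOfFirstCodeUnit(int codePoint)
    {
        string s = char.ConvertFromUtf32(codePoint);
        return char.GetUnicodeCategory(s[0]);
    }
}
```

For a code point above U+FFFF such as 0x10380, `CategoryOfFirstCodeUnit` reports `Surrogate`, while `CategoryOfCodePoint` reports the category of the actual character. For code points up to U+FFFF the two agree.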

Real programmers use butterflies

User 2893688

Will someone please think about the children. https://tenor.com/FJmS.gif