I have begun the long slow journey to redemption.
-
My Visual FA project was a bust, with broken benchmarks and too slow code. I know in theory a DFA regex can beat one that backtracks, but Microsoft has done a damned good job of optimizing their engine in recent years. My issue is they're using ReadOnlySpan over the input string so if I want to beat it I have to use that as well. That won't stream, but I can generate separate code for that. Here's my result so far Microsoft Regex compiled "Lexer": Found 170000 matches in 13ms FAStringRunner: Found 170000 matches in 10ms I can probably make it a little faster still. The FAStringRunner was created by hand, by me, not generated by a regular expression. It looks like this
q1:
if (((((ch >= 9)
&& (ch <= 10))
|| (ch == 13))
|| (ch == 32)))
{
++len;
if (position < span.Length - 1)
{
ch = span[unchecked((int)++position)];
}
else
{
ch = '\0';
}
goto q1;
}
return FAMatch.Create(2, span.Slice(unchecked((int)p), len).ToString(), p, l, c);
q2:
if (((ch >= 49)
&& (ch <= 57)))
{
++len;
if (position < span.Length - 1)
{
ch = span[unchecked((int)++position)];
}
else
{
ch = '\0';
}
goto q3;
}
goto errorout;I wrote this by hand because I needed to test to see if I could match faster than Microsoft before I wrote any generator code to do so. I've got a lot of code to write. Edit: Okay they're really challenging me here:
Microsoft Regex compiled "Lexer": Found 1440000 matches in 91ms
FAStringRunner: Found 1430000 matches in 81ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 89ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 80ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 70ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 81ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 73ms
FAStringRunner: Found 1430000 matches in 79ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 72ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 m -
My Visual FA project was a bust, with broken benchmarks and too slow code. I know in theory a DFA regex can beat one that backtracks, but Microsoft has done a damned good job of optimizing their engine in recent years. My issue is they're using ReadOnlySpan over the input string so if I want to beat it I have to use that as well. That won't stream, but I can generate separate code for that. Here's my result so far Microsoft Regex compiled "Lexer": Found 170000 matches in 13ms FAStringRunner: Found 170000 matches in 10ms I can probably make it a little faster still. The FAStringRunner was created by hand, by me, not generated by a regular expression. It looks like this
q1:
if (((((ch >= 9)
&& (ch <= 10))
|| (ch == 13))
|| (ch == 32)))
{
++len;
if (position < span.Length - 1)
{
ch = span[unchecked((int)++position)];
}
else
{
ch = '\0';
}
goto q1;
}
return FAMatch.Create(2, span.Slice(unchecked((int)p), len).ToString(), p, l, c);
q2:
if (((ch >= 49)
&& (ch <= 57)))
{
++len;
if (position < span.Length - 1)
{
ch = span[unchecked((int)++position)];
}
else
{
ch = '\0';
}
goto q3;
}
goto errorout;I wrote this by hand because I needed to test to see if I could match faster than Microsoft before I wrote any generator code to do so. I've got a lot of code to write. Edit: Okay they're really challenging me here:
Microsoft Regex compiled "Lexer": Found 1440000 matches in 91ms
FAStringRunner: Found 1430000 matches in 81ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 89ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 80ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 70ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 81ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 73ms
FAStringRunner: Found 1430000 matches in 79ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 72ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 m -
is the difference between 1440000 and 1430000 significant? Or is that part of the reason for early prototyping? Almost good enough? “Perfect is the enemy of good enough”
no. i noticed it after the fact. it's a one off error in my benchmark code, iterated 10,000 times. :laugh:
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
My Visual FA project was a bust, with broken benchmarks and too slow code. I know in theory a DFA regex can beat one that backtracks, but Microsoft has done a damned good job of optimizing their engine in recent years. My issue is they're using ReadOnlySpan over the input string so if I want to beat it I have to use that as well. That won't stream, but I can generate separate code for that. Here's my result so far Microsoft Regex compiled "Lexer": Found 170000 matches in 13ms FAStringRunner: Found 170000 matches in 10ms I can probably make it a little faster still. The FAStringRunner was created by hand, by me, not generated by a regular expression. It looks like this
q1:
if (((((ch >= 9)
&& (ch <= 10))
|| (ch == 13))
|| (ch == 32)))
{
++len;
if (position < span.Length - 1)
{
ch = span[unchecked((int)++position)];
}
else
{
ch = '\0';
}
goto q1;
}
return FAMatch.Create(2, span.Slice(unchecked((int)p), len).ToString(), p, l, c);
q2:
if (((ch >= 49)
&& (ch <= 57)))
{
++len;
if (position < span.Length - 1)
{
ch = span[unchecked((int)++position)];
}
else
{
ch = '\0';
}
goto q3;
}
goto errorout;I wrote this by hand because I needed to test to see if I could match faster than Microsoft before I wrote any generator code to do so. I've got a lot of code to write. Edit: Okay they're really challenging me here:
Microsoft Regex compiled "Lexer": Found 1440000 matches in 91ms
FAStringRunner: Found 1430000 matches in 81ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 89ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 80ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 70ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 81ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 73ms
FAStringRunner: Found 1430000 matches in 79ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 72ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 m -
Doesn’t this piece of code have one set of brackets too many? Admittedly C type languages are not my forte so I’m probably wrong if (((ch >= 49) && (ch <= 57)))
The jump tables were copied from code I generated using an AST. It is valid code, even if there are spurious parentheses. They have no effect on the compilation or execution.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
My Visual FA project was a bust, with broken benchmarks and too slow code. I know in theory a DFA regex can beat one that backtracks, but Microsoft has done a damned good job of optimizing their engine in recent years. My issue is they're using ReadOnlySpan over the input string so if I want to beat it I have to use that as well. That won't stream, but I can generate separate code for that. Here's my result so far Microsoft Regex compiled "Lexer": Found 170000 matches in 13ms FAStringRunner: Found 170000 matches in 10ms I can probably make it a little faster still. The FAStringRunner was created by hand, by me, not generated by a regular expression. It looks like this
q1:
if (((((ch >= 9)
&& (ch <= 10))
|| (ch == 13))
|| (ch == 32)))
{
++len;
if (position < span.Length - 1)
{
ch = span[unchecked((int)++position)];
}
else
{
ch = '\0';
}
goto q1;
}
return FAMatch.Create(2, span.Slice(unchecked((int)p), len).ToString(), p, l, c);
q2:
if (((ch >= 49)
&& (ch <= 57)))
{
++len;
if (position < span.Length - 1)
{
ch = span[unchecked((int)++position)];
}
else
{
ch = '\0';
}
goto q3;
}
goto errorout;I wrote this by hand because I needed to test to see if I could match faster than Microsoft before I wrote any generator code to do so. I've got a lot of code to write. Edit: Okay they're really challenging me here:
Microsoft Regex compiled "Lexer": Found 1440000 matches in 91ms
FAStringRunner: Found 1430000 matches in 81ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 89ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 80ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 70ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 81ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 71ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 73ms
FAStringRunner: Found 1430000 matches in 79ms
Microsoft Regex compiled "Lexer": Found 1440000 matches in 72ms
FAStringRunner: Found 1430000 matches in 80ms
Microsoft Regex compiled "Lexer": Found 1440000 mGreetings and Kind Regards Am I mistaken or is the 1st
if
logically identical toif (ch == 9 || ch == 10 || ch == 13 || ch == 32)
? Thank You Kindly -
Greetings and Kind Regards Am I mistaken or is the 1st
if
logically identical toif (ch == 9 || ch == 10 || ch == 13 || ch == 32)
? Thank You Kindlyyes. The if conditions were generated by a tool, and it doesn't bother breaking up the range 9-10 into two single equality comparisons.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Doesn’t this piece of code have one set of brackets too many? Admittedly C type languages are not my forte so I’m probably wrong if (((ch >= 49) && (ch <= 57)))
Greetings and Kind Regards It can also be
if (c >= 49 && ch <= 57)
as relational operators have higher precedence than logical.