There are many gotos, but these ones are mine

glennPattonPub

Try writing Assembly without them (the fabled JMP!). They are a tool that gets misused (kinda like the powered screwdriver).

Payton Byrd 2023

Code runs in LinqPad. This should be significantly faster than your original code because it speeds up the conditionals by using pattern matching instead of overloadable operators. Also, the local functions can be inlined, meaning they will be executed in place, which is even more efficient than the `goto` statements. And now it's not pure spaghetti.

string json = """
{
  "test": 0,
  "data": "value"
}
""";

JsonStringRunner runner = new();

List<FAMatch> matches = new();
FAMatch current = default;
Stopwatch sw = new();
sw.Start();
do{
    current = runner.GetMatch(json);
    matches.Add(current);
} while(!runner.isDone);
sw.Stop();
matches.Dump();
sw.Dump();

internal record struct FAMatch(int token, string match, int position, int length, int column)
{
    internal static FAMatch Create(int token, string match, int position, int length, int column)
        => new(token, match, position, length, column);
}

internal abstract class FAStringRunner
{
    protected int position = -1, line = 0, column = 0;
    internal bool isDone = false;
}

internal sealed partial class JsonStringRunner : FAStringRunner
{
    private void Advance(string s, ref int ch, ref int len, bool flag)
    {
        // Assuming Advance takes consecutive characters in the string.
        ch = s[position];
        position++;
        len++;
        isDone = !(position < s.Length);
    }
    private FAMatch NextMatchImpl(string s)
    {
        int ch;
        int len;
        int l;
        int c;
        ch = -1;
        len = 0;
        if ((this.position is -1))
        {
            this.position = 0;
        }
        int p = this.position;
        l = this.line;
        c = this.column;
        this.Advance(s, ref ch, ref len, true);
        // q0:
        switch (ch)
        {
            // [\t-\n\r ]
            case 9 or 10 or 13 or 32:
                if(ch is 10 or 13){
                    l = line++;
                }
                return q1();
            // [\"]
            case 34:
                return q2();
            // [,]
            case 44:
                return q9();
            // [\-]
            case

honey the codewitch

I hate function pointer dispatch code in general. Because at some point you'll have to debug and maintain it, and you end up with impossible to follow pointer arrays hiding the flow of your app.

Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

honey the codewitch

I'll have to try a variation of this, but what you produced won't function due to the returns. How are you going to loop?

Payton Byrd 2023

Without the full code I didn't know what the logic inside the various labelled locations did, so I simply returned the current substring as an FAMatch. Your method dumps out as an FAMatch, so I defaulted to that behavior. The point is that inlined local methods are going to be just as fast as gotos and the pattern matching is much more efficient.
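
One way the returns and the looping could coexist is sketched below, assuming a toy lexer with only whitespace and digit states. The token ids, the `NextToken` name, and the `Q0`/`Q1`/`Q2` local functions are hypothetical and not taken from the generated code discussed in this thread. A state that loops on itself does so with an ordinary `while` inside its local function, and the caller loops over tokens. Whether the JIT actually inlines local functions that capture variables is not guaranteed, so the speed claim would need to be measured.

```csharp
using System;

// Hypothetical token ids, chosen for this sketch only.
const int TokenError = -1, TokenWhitespace = 0, TokenInteger = 1;

// Lexes one token of `s` beginning at `start`; returns its id and length.
(int Token, int Length) NextToken(string s, int start)
{
    int pos = start;
    return Q0();

    // q0: start state. Dispatches on the first character.
    (int, int) Q0()
    {
        if (pos >= s.Length) return (TokenError, 0);
        char ch = s[pos++];
        return ch switch
        {
            '\t' or '\n' or '\r' or ' ' => Q1(),
            >= '0' and <= '9' => Q2(),
            _ => (TokenError, pos - start),
        };
    }

    // q1: accepting state for whitespace. The while loop stands in for
    // the self-transition that "goto q1" would express.
    (int, int) Q1()
    {
        while (pos < s.Length && s[pos] is ('\t' or '\n' or '\r' or ' ')) pos++;
        return (TokenWhitespace, pos - start);
    }

    // q2: accepting state for digits, looping the same way.
    (int, int) Q2()
    {
        while (pos < s.Length && s[pos] is (>= '0' and <= '9')) pos++;
        return (TokenInteger, pos - start);
    }
}

// The outer loop over tokens lives in the caller.
string input = "12  345";
for (int offset = 0; offset < input.Length; )
{
    var (token, length) = NextToken(input, offset);
    if (length == 0) break; // guard against a zero-length result
    Console.WriteLine($"token={token} offset={offset} length={length}");
    offset += length;
}
```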

jochance

I'm not sure why it wouldn't be pretty straightforward to write a [TestCase()] for each of the branches? I don't think this code is very cyclomatically complex? But yeah, when you say table-driven state machine I'm pretty sure that's where my head is too, if you're basically talking about a direct map of the case statements to data.
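
A direct map of the case statements to data might look like the sketch below. The three states, the character classes, and the token ids are hypothetical and chosen only to keep the table small; they are not taken from the generated lexer discussed in this thread. Each row of the table is one state, which is also what would make one test case per transition straightforward.

```csharp
using System;

// Hypothetical 3-state DFA: q0 = start, q1 = whitespace run, q2 = digit run.
// Columns are character classes: 0 = whitespace, 1 = digit, 2 = anything else.
// An entry of -1 means "no transition".
int[,] transitions =
{
    //  ws  digit other
    {   1,    2,   -1 }, // q0
    {   1,   -1,   -1 }, // q1
    {  -1,    2,   -1 }, // q2
};

// Token id reported when the machine stops in each state (-1 = not accepting).
int[] acceptToken = { -1, 0, 1 };

// Maps a character to its column in the table.
int Classify(char ch) => ch switch
{
    '\t' or '\n' or '\r' or ' ' => 0,
    >= '0' and <= '9' => 1,
    _ => 2,
};

// Runs the table from `start` until no transition applies.
(int Token, int Length) NextToken(string s, int start)
{
    int state = 0, pos = start;
    while (pos < s.Length)
    {
        int next = transitions[state, Classify(s[pos])];
        if (next < 0) break;
        state = next;
        pos++;
    }
    return (acceptToken[state], pos - start);
}

var first = NextToken("12 x", 0);
Console.WriteLine($"token={first.Token} length={first.Length}");
```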

giulicard

honey the codewitch wrote:

with impossible to follow pointer arrays

There are no pointer arrays in my code.

honey the codewitch

Sorry, I was speaking generally about dispatch function pointers. Your statement just reminded me of it. Sorry I wasn't clear. I had just woken up when I wrote that. :)

giulicard

No problem. I'm not a native English speaker so I always fear being misunderstood.

honey the codewitch

Sure, I understand. I did say it was a DFA state machine implementation but unless you're a total FA nerd like I am that probably doesn't mean anything. :) I'm very curious about the inlined local method and pattern matching approach, particularly the IL it generates, because I don't understand how it would be faster than the IL my code produces - particularly my direct compiler which can short circuit the if tests because the comparisons are in sorted order.

honey the codewitch

There is one issue with that. The compiled ones can be augmented in a way that the table driven ones cannot. For example, I wrote an embedded JSON pull parser in C++. I used compiled DFA code, and then I parsed floats, ints, and bools out of the stream *as* I was lexing, making the API both easier to use and marginally more performant because you didn't have to get the string back and then reexamine it in order to call atoi() or whatever. It was a simple little surgery on the generated code, with excellent results. I admit this isn't the most common case out there, but I have used this technique several times.

Edited to add: It's also easier in practice to debug and step through a generated lexer than it is a table driven lexer. And with my Visual FA project, it produces images of directed graphs that map one to one to the labels/jump points in the code. q0: maps to the state q0 in the graph. It makes it really easy to see what it's doing, in terms of documenting it.
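
To illustrate the kind of augmentation described here, the following sketch lexes an optionally signed integer with a goto-based state and accumulates its value in the same pass, so no second scan of the matched text is needed. It is written in C# rather than C++ to match the other code in this thread, the name `LexInt` is hypothetical, and overflow handling is left out.

```csharp
using System;

// Hypothetical sketch: recognizes an optional '-' followed by one or more
// digits and computes the integer value while lexing.
(bool Matched, int Value, int Length) LexInt(string s, int start)
{
    int pos = start;
    int value = 0;
    bool negative = false;

    // Start state: optional sign.
    if (pos < s.Length && s[pos] == '-')
    {
        negative = true;
        pos++;
    }

    // At least one digit is required to enter the digit state.
    if (pos >= s.Length || s[pos] is not (>= '0' and <= '9'))
        return (false, 0, 0);

q1:
    // Digit state. This line is the "surgery": the value is built here
    // instead of re-parsing the matched text afterwards.
    // Overflow is not checked in this sketch.
    value = value * 10 + (s[pos] - '0');
    pos++;
    if (pos < s.Length && s[pos] is (>= '0' and <= '9')) goto q1;

    return (true, negative ? -value : value, pos - start);
}

var result = LexInt("-42,", 0);
Console.WriteLine($"matched={result.Matched} value={result.Value} length={result.Length}");
```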

Daniel Will

Break, continue, and return are basically goto when translated to low-level machine code :thumbsup: The same goes for for-loops, if-else, while-do, switch, etc. Gotos are frowned upon because some people used them badly. Maybe they caused an infinite loop or something. Maybe they forgot to free the allocated memory. Also, goto shouldn't be used when your high-level language provides one of the more explanatory keywords above. The reason, obviously, is maintainability and readability.
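
As a small illustration of that equivalence, the two functions below compute the same sum, one with `continue` and `break` and one with explicit gotos. The function names are made up for this sketch.

```csharp
using System;

// Structured version: skips zeros, stops at the first negative number.
int SumWithKeywords(int[] xs)
{
    int sum = 0;
    foreach (int x in xs)
    {
        if (x == 0) continue;
        if (x < 0) break;
        sum += x;
    }
    return sum;
}

// The same control flow spelled out with labels and gotos.
int SumWithGotos(int[] xs)
{
    int sum = 0, i = 0, x;
top:
    if (i >= xs.Length) goto done; // loop condition
    x = xs[i++];
    if (x == 0) goto top;          // continue
    if (x < 0) goto done;          // break
    sum += x;
    goto top;                      // next iteration
done:
    return sum;
}

int[] sample = { 1, 0, 2, -1, 5 };
Console.WriteLine(SumWithKeywords(sample) == SumWithGotos(sample));
```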

trønderen

honey the codewitch wrote:

I hate function pointer dispatch code in general.

Do you refuse to use delegates at all, or don't you consider those to be function pointers? (In other words: Are function pointers OK as long as they are called delegates?)

No, when you have generated your code, you do not "at some time have to debug and maintain" the generated code. You debug and maintain your source, not the compilation result. Not even if you can, sort of, read it. Executable binaries can also be disassembled into "readable" code - the readability is no argument for random peek and poke.

You send your code through a generator/compiler, and want to patch up the compiled result ("The compiled ones can be augmented in a way that the table driven ones cannot"), or complain about the instructions generated by the compiler - I haven't heard anyone saying any such thing in earnest for a decade or two. Some people still believe that they can do smarter heap management than the standard heap manager, rejecting automated garbage collection and smart pointers, but for the most part, compilers became smarter than human coders in the last millennium.

You will see a lot of function pointer dispatch code in the generated code from a plain C++ compiler. Do you hate that as well? If you accept it from a C++ compiler, why do you have problems accepting it from other compilers? (The first C++ compiler I used didn't produce binary code - it was a machine independent compiler producing K&R C to be fed into a machine specific compiler. So we had full access to the C code for patching it up before passing it on to cc. We did not. I would not do it with any generated code, whether the compiler is called C++ or Visual FA.)

Religious freedom is the freedom to say that two plus two make five.

honey the codewitch

I was going to respond, but I think I answered all this in the post you responded to:

Because at some point you'll have to debug and maintain it, and you end up with impossible to follow pointer arrays hiding the flow of your app.

trønderen

The real issue is:

honey the codewitch wrote:

at some point you'll have to debug and maintain it

Does that apply to the code generated by your C, C++ or C# compiler as well? When are you going to start trusting your tools to do at least as good a job as the one you are doing yourself? I think: If you don't trust your tools to do a good enough job, throw them away and do the job yourself!

honey the codewitch

It does not typically apply to generated code because the maintenance of that is moved to the generated code's input specification - in other words, whatever document or resource it uses to generate the code from. THAT is what needs to be maintained. It does not apply to compiled code either, for exactly the same reason (the compiler being yet another code generator).

jschell

Seems likely that would be faster with an array lookup versus those sequential ifs.

if (match[ch])
...

honey the codewitch wrote:

if (((((ch >= 9) && (ch <= 10)) || (ch == 13)) || (ch == 32))) {

Seems unlikely that that would be better than

((ch == 9) || (ch == 10) || (ch == 13) || (ch == 32))
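
The three variants under discussion can be put side by side as follows. The 128-entry table size and the function names are assumptions for this sketch; which version is actually fastest depends on the JIT and the input, so it would need measuring rather than guessing.

```csharp
using System;

// Lookup table indexed by character code: true for tab (9), line feed (10),
// carriage return (13), and space (32). The size of 128 is an assumption.
bool[] match = new bool[128];
match[9] = match[10] = match[13] = match[32] = true;

// Array lookup; the unsigned compare rejects negative and out-of-range codes.
bool IsMatchLookup(int ch) => (uint)ch < (uint)match.Length && match[ch];

// Range-based test, as in the quoted generated code.
bool IsMatchRange(int ch) => ((ch >= 9) && (ch <= 10)) || (ch == 13) || (ch == 32);

// One equality comparison per character.
bool IsMatchChain(int ch) => (ch == 9) || (ch == 10) || (ch == 13) || (ch == 32);

// All three should agree on every input, including -1 (end of input).
bool allAgree = true;
for (int code = -1; code < 200; code++)
{
    bool expected = IsMatchChain(code);
    if (IsMatchLookup(code) != expected || IsMatchRange(code) != expected)
        allAgree = false;
}
Console.WriteLine(allAgree);
```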

honey the codewitch

What's funny is my table driven code does exactly that. Sometimes I get different results depending on the lexer complexity, but for simple lexers at least the compiled versions run slightly faster. With large lexers the table method starts to outstrip it. I should note, the lexer size has nothing to do with the number of comparisons in those ifs - but rather in essence the number of ifs - really the number of goto labels.

trønderen

But if the code is generated by Visual FA rather than cc, then you will do peek and poke on the generated code. Well, that is a choice. I think you are on the wrong track. In the 1980s, I worked in a company distributing OS patches as Poke instructions. I wouldn't condone that practice today.

honey the codewitch

then you will do peek and poke on the generated code.

I will? That's news to me. Hell, with VisualFA.SourceGenerator you don't even see the generated code. It's hidden by Visual Studio.
