oh what a mess I've gotten myself into
-
So this is the kind of code I have to completely remake. It's part of a convoluted parser that's not even recursive descent, so it's next to impossible to debug. It's generated and table-driven, and it calls helper methods like this ugly thing below:
internal static RuleDesc MkDummyRuleDesc( LexCategory cat, AAST aast ) {
    RuleDesc result = new RuleDesc();
    result.pSpan = null;
    result.aSpan = aast.AtStart;
    result.isBarAction = false;
    result.isPredDummyRule = true;
    result.pattern = String.Format( CultureInfo.InvariantCulture, "{{{0}}}", cat.Name );
    result.list = new List();
    result.ParseRE( aast );
    result.list.Add( aast.StartStateValue( cat.PredDummyName ) );
    return result;
}

I've included all the comments. I have no idea what this does, or why dummy rules would have to be made in the first place. I'm not even clear on what a rule is, although I have a vague idea. This code has no documentation. At best, it's mostly a reimplementation of lex in C#, so I kind of know what inputs it accepts from the lex man pages. It has flex extensions too, though not all of them. Who knows what it supports? I'm not even sure the original author does, and the primary engine hasn't been updated in six years. :sigh:

Even if I get this working how I want, it will probably always be C#-only, unless I want to debug Slang enough to get it to work with it, or retool all of the code generation to use the CodeDOM by hand. Ugh. And that's not even the worst bit. I have half a mind to leave the parser in place, preparse my desired document format, write out a document in this parser-spec format in memory, and then feed it to the parser that way. But what a nasty mess!
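One small piece of it I can decode: the `"{{{0}}}"` format string is less cryptic than it looks. In a .NET composite format string, `{{` and `}}` are escaped literal braces, so the call just wraps the category name in braces, which suggests the dummy rule's pattern is a lex-style `{name}` reference to a named definition. A quick standalone check:

```csharp
using System;
using System.Globalization;

class BraceDemo {
    static void Main() {
        // In a .NET composite format string, "{{" and "}}" are escaped
        // literal braces, so "{{{0}}}" means: "{" + arg0 + "}".
        string pattern = String.Format(
            CultureInfo.InvariantCulture, "{{{0}}}", "Ident");
        Console.WriteLine(pattern);  // prints {Ident}
    }
}
```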
Real programmers use butterflies
-
That's why I wrote a parser and lexer from scratch. Sure, code generation helps, but it just shifts the problem to one of getting the grammar right and fixing it for unusual cases. Ever looked at the C++ quasi-grammar sprinkled throughout cppreference.com[^]? I'd hardly know where to begin with such shite. Of course, it's not the fault of that site, which I consult frequently. It's probably inevitable when a language continually evolves while being reluctant to deprecate anything.
-
I mean, my previous version of Rolex used my own hand-rolled parser, but this version is using the GPLEX engine, and I want to keep the regex syntax the same as GPLEX's. That, and it's near impossible to build up the regex trees for GPLEX on my own - the trees are so convoluted that I'm basically stuck using the parser it gave me. I found that the regex-parsing part - that subset - is hand-rolled recursive descent, so that helps at least, but it's still ugly.
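The hand-rolled recursive descent shape for regex is at least the classic one: alternation calls concatenation, which calls the postfix-operator level, which calls atoms. A minimal sketch of that shape - the node names and prefix-string output here are mine for illustration, not GPLEX's actual AST:

```csharp
using System;

// Minimal recursive-descent regex parser over the classic grammar:
//   expr   -> term ('|' term)*
//   term   -> factor+
//   factor -> atom '*'?
//   atom   -> char | '(' expr ')'
// Emits a prefix-notation string instead of a real AST; assumes
// well-formed input (no error handling). Illustrative only.
class RegexParser {
    readonly string src;
    int pos;
    RegexParser(string s) { src = s; }

    public static string ToPrefix(string s) => new RegexParser(s).Expr();

    string Expr() {
        string left = Term();
        while (pos < src.Length && src[pos] == '|') {
            pos++;  // consume '|'
            left = $"(alt {left} {Term()})";
        }
        return left;
    }
    string Term() {
        string left = Factor();
        while (pos < src.Length && src[pos] != '|' && src[pos] != ')')
            left = $"(cat {left} {Factor()})";
        return left;
    }
    string Factor() {
        string a = Atom();
        while (pos < src.Length && src[pos] == '*') {
            pos++;  // consume '*'
            a = $"(star {a})";
        }
        return a;
    }
    string Atom() {
        if (src[pos] == '(') {
            pos++;               // consume '('
            string e = Expr();
            pos++;               // consume ')'
            return e;
        }
        return src[pos++].ToString();
    }

    static void Main() {
        Console.WriteLine(ToPrefix("a(b|c)*d"));
        // prints (cat (cat a (star (alt b c))) d)
    }
}
```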
Real programmers use butterflies
-
Have you considered using Vaughan Pratt’s top-down operator precedence parsing?
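For anyone following along, the core of Pratt's technique is that each operator token carries a binding power, and a single loop decides when to recurse. A toy evaluator sketch - the token set (single digits, `+`, `*`) and names are made up for illustration:

```csharp
using System;

// Minimal Pratt (top-down operator precedence) evaluator for
// single-digit operands with left-associative + and *. Each operator
// has a left binding power; Parse recurses while the next operator
// binds tighter than the current right binding power. Illustrative only.
class Pratt {
    static string toks;
    static int pos;

    // Left binding power: '*' binds tighter than '+'.
    static int Lbp(char op) => op == '+' ? 10 : op == '*' ? 20 : 0;

    static int Parse(int rbp) {
        int left = toks[pos++] - '0';          // "nud": a single digit
        while (pos < toks.Length && Lbp(toks[pos]) > rbp) {
            char op = toks[pos++];             // "led": binary operator
            int right = Parse(Lbp(op));
            left = op == '+' ? left + right : left * right;
        }
        return left;
    }

    public static int Eval(string expr) {
        toks = expr;
        pos = 0;
        return Parse(0);
    }

    static void Main() {
        Console.WriteLine(Eval("1+2*3"));  // 7: '*' binds tighter than '+'
    }
}
```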
-
I'm lexing, not parsing. The key here is speed, and that means a stackless DFA. Top-down is easy; I've already built a zillion top-down parsers, including Parsley: A Recursive Descent Parser Generator in C#[^]. But I'm just looking for an efficient DFA lexer engine that handles Unicode. GPLEX does it, but the output is ugly and multi-file, and the input document is fugly and looks like a lex spec, so I'm gutting GPLEX and changing the inputs and outputs while keeping the engine, if that makes sense. Right now I've got the output where I like it, but I'm working on changing the input spec.
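Concretely, the win of a table-driven DFA is that recognition is one array lookup per input character - no stack, no backtracking. A toy sketch (the table and the regex it encodes, `a+b`, are mine, not GPLEX's encoding):

```csharp
using System;

// Minimal table-driven DFA recognizing the regex a+b
// (one or more 'a's followed by a single 'b').
// States: 0 = start, 1 = seen a+, 2 = accept, 3 = dead.
// Columns: 0 = 'a', 1 = 'b'. Illustrative only.
class DfaDemo {
    static readonly int[,] Trans = {
        { 1, 3 },  // state 0: 'a' -> 1, 'b' -> dead
        { 1, 2 },  // state 1: 'a' -> 1, 'b' -> accept
        { 3, 3 },  // state 2: any further input -> dead
        { 3, 3 },  // state 3: dead stays dead
    };

    public static bool Matches(string input) {
        int state = 0;
        foreach (char ch in input) {
            if (ch != 'a' && ch != 'b') return false;  // outside alphabet
            state = Trans[state, ch == 'a' ? 0 : 1];   // one lookup per char
        }
        return state == 2;
    }

    static void Main() {
        Console.WriteLine(Matches("aaab"));  // True
        Console.WriteLine(Matches("b"));     // False
    }
}
```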
Real programmers use butterflies
-
Why do you think a stackless DFA is faster?
-
Simple: less work to do. A DFA makes exactly one state transition per input character - no stack pushes, no backtracking.
Real programmers use butterflies