Ruminations on parsing.
-
All this talk of Context-Free Grammars has me wondering if there's such a thing as a context-dependent grammar?
The difficult we do right away... ...the impossible takes slightly longer.
There is. Chomsky type 1 (context-sensitive) and type 0 (unrestricted) languages are context dependent; they're what you'd use to model human language. You can represent them with an Earley grammar, but an Earley grammar is not practical to parse on a real system - it's strictly theoretical. It's also possible to parse context-sensitive languages with a context-free grammar using a GLR parser. Without the context such a grammar is ambiguous, so you will get multiple trees back, and your job is then to decide which tree is valid.
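To make the "multiple trees" point concrete, here's a naive sketch in C# (not a real GLR implementation, just an exhaustive enumerator I wrote for illustration) that produces every parse of an ambiguous expression grammar - the same set a GLR parser would hand back as a forest:

using System;
using System.Collections.Generic;

class AmbiguousParseDemo
{
    // All parses of E ::= E "-" E | ident, rendered as parenthesized strings.
    // A GLR parser would return this same set as a shared parse forest.
    static List<string> Parses(string[] toks, int lo, int hi)
    {
        var results = new List<string>();
        if (lo == hi) { results.Add(toks[lo]); return results; }
        // Try every "-" (odd index) as the top-level operator.
        for (int i = lo + 1; i < hi; i += 2)
            foreach (var l in Parses(toks, lo, i - 1))
                foreach (var r in Parses(toks, i + 1, hi))
                    results.Add($"({l} - {r})");
        return results;
    }

    static void Main()
    {
        var toks = new[] { "a", "-", "b", "-", "c" };
        foreach (var tree in Parses(toks, 0, toks.Length - 1))
            Console.WriteLine(tree); // ((a - b) - c) and (a - (b - c))
    }
}

With real context ("-" is left-associative, say) you'd keep ((a - b) - c) and throw the other tree away.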
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
All this talk of Context-Free Grammars has me wondering if there's such a thing as a context-dependent grammar?
The difficult we do right away... ...the impossible takes slightly longer.
Sure, English is a prime example. Note the "prime" ambiguity that can only be solved in a context-dependent manner. Some might argue that English doesn't really have a grammar per se, but mostly a collection of use cases and exceptions. :laugh:
Mircea
-
Probably a lot of you did this in school: you take a context-free grammar in some variant of BNF or EBNF and use it to generate parse tables you can use to parse structured text, like programming languages. After studying it off and on for years, teaching myself the concepts, building code generators, and using parser generators already out there, I've come to the following conclusions:
1. For any non-LL(1) language of non-trivial complexity - say, a programming language - it is virtually always worth it to hand roll your own parser. The code can be as much as an order of magnitude smaller and more flexible, and that flexibility is pretty much required for anything real-world, since even languages with simplistic syntax like C need symbols introduced into the parse table dynamically, while parsing. (A minimal sketch of the hand-rolled approach follows below.)
2. Even given #1, it may be worth it to write a context-free grammar for the language anyway and use generated code to test the hand-rolled code. A parser built by a generator can check your hand-rolled parser for correctness, and the CFG doubles as documentation.
3. Despite its power, bottom-up parsing is not as elegant as top-down parsing; #1 still applies, and you cannot realistically build a bottom-up parser without generating it.
4. I spent a long time coming up with the above three little points. It was expensive, even as experience goes. I'm still trying to decide whether it was worth it, all told, when I factor in how much computer science I learned in the process. I didn't get saddled financially for it though, so yay for that.
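To give point #1 some shape, here's a minimal recursive descent sketch - a toy grammar I picked for illustration (expr ::= num (('+'|'-') num)*), not any particular real language. A real parser is the same shape, just with many more methods:

using System;

// Minimal recursive descent parser for: expr ::= num (('+'|'-') num)*
// Each grammar rule becomes one method; no tables, no generator.
class Rd
{
    readonly string s; int pos;
    Rd(string text) { s = text; }

    char Peek => pos < s.Length ? s[pos] : '\0';

    int Num()
    {
        int start = pos;
        while (char.IsDigit(Peek)) pos++;
        if (pos == start) throw new Exception($"digit expected at {pos}");
        return int.Parse(s.Substring(start, pos - start));
    }

    int Expr()
    {
        int left = Num();
        while (Peek == '+' || Peek == '-')
        {
            char op = s[pos++];
            int right = Num();
            left = op == '+' ? left + right : left - right;
        }
        return left;
    }

    static void Main() => Console.WriteLine(new Rd("1+2-3").Expr()); // 0
}

Note how adding a special case (context!) is just an if statement in the relevant method, which is exactly what's so painful to do in a generated table.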
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
:thumbsup: To parse C++, I used recursive descent and even wrote the lexical analysis routines from scratch. Fixing bugs is fairly straightforward, but I wouldn't have a clue how to fix them in a bottom-up parser. It's about 13K lines of code, including comments and blanks, and anyone familiar with C++ can look at the code and probably figure it out with relative ease. The "compiler" part, however, which has to understand what the parsed code is doing in order to perform static analysis, is much larger, probably 3x that size. robust-services-core/src/ct/Parser.cpp · GitHub[^]
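A hand-written lexer really is the approachable part. A minimal sketch of the idea in C# (a toy token set of my choosing, nothing like the actual robust-services-core code):

using System;
using System.Collections.Generic;

enum Tok { Ident, Number, Punct, End }

// A tiny hand-rolled lexer: one pass, one branch per character class.
// Real lexers add keywords, comments, and string literals the same way.
class Lexer
{
    readonly string src; int pos;
    public Lexer(string s) { src = s; }

    public (Tok kind, string text) Next()
    {
        while (pos < src.Length && char.IsWhiteSpace(src[pos])) pos++;
        if (pos >= src.Length) return (Tok.End, "");
        int start = pos;
        char c = src[pos];
        if (char.IsLetter(c) || c == '_')
        {
            while (pos < src.Length && (char.IsLetterOrDigit(src[pos]) || src[pos] == '_')) pos++;
            return (Tok.Ident, src.Substring(start, pos - start));
        }
        if (char.IsDigit(c))
        {
            while (pos < src.Length && char.IsDigit(src[pos])) pos++;
            return (Tok.Number, src.Substring(start, pos - start));
        }
        pos++;
        return (Tok.Punct, c.ToString());
    }
}

Calling Next() in a loop until Tok.End turns "x1 + 42" into Ident("x1"), Punct("+"), Number("42").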
Robust Services Core | Software Techniques for Lemmings | Articles
The fox knows many things, but the hedgehog knows one big thing.
-
:thumbsup: To parse C++, I used recursive descent and even wrote the lexical analysis routines from scratch. Fixing bugs is fairly straightforward, but I wouldn't have a clue how to fix them in a bottom-up parser. It's about 13K lines of code, including comments and blanks, and anyone familiar with C++ can look at the code and probably figure it out with relative ease. The "compiler" part, however, which has to understand what the parsed code is doing in order to perform static analysis, is much larger, probably 3x that size. robust-services-core/src/ct/Parser.cpp · GitHub[^]
Robust Services Core | Software Techniques for Lemmings | Articles
The fox knows many things, but the hedgehog knows one big thing.
Yeah, in my Slang parser the parsing was the easy part. Establishing a CodeDOM-based AST after the fact was not. For example:
class Foo {
    static public int Bar;
}
If I do Foo.Bar, do I create a CodeFieldReferenceExpression or a CodePropertyReferenceExpression, or even a CodeMethodReferenceExpression? I have to crawl the existing code I've already resolved to an AST to determine what Bar is, so I can then make the appropriate object. It's not compiling. It's doing something completely different, but it's very compiler-like.
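A minimal sketch of that decision, using the System.CodeDom types in question. The MemberKind dictionary here is hypothetical, standing in for the real crawl over the already-resolved AST:

using System.CodeDom;
using System.Collections.Generic;

enum MemberKind { Field, Property, Method }

class MemberResolver
{
    // Filled in by crawling the AST built so far, e.g. "Foo.Bar" -> Field.
    readonly Dictionary<string, MemberKind> known = new Dictionary<string, MemberKind>();

    public void Declare(string qualifiedName, MemberKind kind) => known[qualifiedName] = kind;

    // Pick the CodeDOM node that matches what the member actually is.
    public CodeExpression Reference(CodeExpression target, string typeName, string member)
    {
        switch (known[typeName + "." + member])
        {
            case MemberKind.Field: return new CodeFieldReferenceExpression(target, member);
            case MemberKind.Property: return new CodePropertyReferenceExpression(target, member);
            default: return new CodeMethodReferenceExpression(target, member);
        }
    }
}

After Declare("Foo.Bar", MemberKind.Field), a Reference(new CodeTypeReferenceExpression("Foo"), "Foo", "Bar") call yields the field-reference node rather than one of the other two.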
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
jschell wrote:
If you are going to call it a language then you probably really must do that.
Umm, I do? Lexicon:
Context-Free-Grammar/CFG - the document describing the structure of the language.
Language - a Chomsky type 2 language, describable with a CFG.
Parser - a stack-based FA machine (a pushdown automaton) that can, given an input grammar, parse the corresponding language.
Maybe you just didn't understand me or something.
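To make "stack-based FA machine" concrete, here's a minimal table-driven sketch, assuming a toy grammar S ::= '(' S ')' | 'x' with the table hand-encoded rather than generated:

using System;
using System.Collections.Generic;

// Table-driven LL(1) parse of S ::= '(' S ')' | 'x'.
// The "FA plus a stack" is literally a Stack<char> driven by a lookup.
class PdaSketch
{
    static bool Parse(string input)
    {
        var stack = new Stack<char>();
        stack.Push('S');                       // start symbol
        int i = 0;
        while (stack.Count > 0)
        {
            char top = stack.Pop();
            char look = i < input.Length ? input[i] : '$';
            if (top == 'S')                    // nonterminal: expand via table
            {
                if (look == '(') { stack.Push(')'); stack.Push('S'); stack.Push('('); }
                else if (look == 'x') stack.Push('x');
                else return false;             // no table entry: syntax error
            }
            else                               // terminal: must match input
            {
                if (top != look) return false;
                i++;
            }
        }
        return i == input.Length;              // all input consumed
    }

    static void Main()
    {
        Console.WriteLine(Parse("((x))"));     // True
        Console.WriteLine(Parse("((x)"));      // False
    }
}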
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
All this talk of Context-Free Grammars has me wondering if there's such a thing as a context-dependent grammar?
The difficult we do right away... ...the impossible takes slightly longer.
Interesting question. So I went looking. The following programming language claims that at least some of it is context dependent. Chapter 4 - Expressions[^] "Words such as sell, at, and read have different meanings in different contexts. The words are relative expressions -- their meaning is context dependent."
-
honey the codewitch wrote:
Umm, I do?
I meant that as a general comment for all of those out there who create their own language. I suspect that at least some of them don't use a BNF.
Oh I see. I thought you meant I was being inconsistent about calling a language a language. BNF and EBNF are just "file formats" for lack of a better term. They are like:
Foo := Bar "+" Baz;
Bar := "Bar";
Baz := { "Baz" }+;
(That's an inexact representation, since the specs are imprecise, kind of like regex.) It's just a format for a context-free grammar specification. EBNF and BNF are the most well-known formats, which is why I mentioned them.
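For comparison, here's my rendering of the same toy grammar in something closer to ISO-style EBNF - and note that, as I said, the specs really do vary:

foo = bar, "+", baz ;
bar = "Bar" ;
baz = "Baz", { "Baz" } ;   (* one-or-more spelled as one followed by zero-or-more *)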
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Interesting question. So I went looking. The following programming language claims that at least some of it is context dependent. Chapter 4 - Expressions[^] "Words such as sell, at, and read have different meanings in different contexts. The words are relative expressions -- their meaning is context dependent."
It's indeed possible to apply context to a narrow parse path. In fact, even the C grammar requires this, because introducing a new struct or typedef introduces a new non-terminal into the grammar. It can be had by "hacking" the parser in one particular area so that it can apply a specific and narrow kind of context. That's how the context is represented in that particular case.
However, a generalized mechanism for context is not really feasible. Chomsky type 1 and type 0 languages require context throughout in order to parse. They need something like an Earley grammar. Implementations of Earley grammars that have been proposed write new context-free grammars on the fly during the parse. The problem with that is that it takes a long time to turn a CFG into an actual parser - generating the tables takes a lot of time - so it's simply not practical. There are better ways of doing language processing that don't use this tech at all; see AI speech recognition.
Edit: Looks like someone attacked it a different way as well: https://www.sciencedirect.com/science/article/pii/S2590118422000697[^] (paywalled)
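Here's a minimal sketch of that C typedef hack (hypothetical token names of my choosing; real front ends thread this through the lexer): the parser records each typedef name, and identifier classification consults that set.

using System;
using System.Collections.Generic;

// The classic C "lexer hack": identifier vs type-name is decided by
// consulting a symbol table the parser fills in as it sees typedefs.
class TypedefHack
{
    static readonly HashSet<string> TypeNames = new HashSet<string> { "int" };

    static string Classify(string word) =>
        TypeNames.Contains(word) ? "TYPE_NAME" : "IDENTIFIER";

    static void Main()
    {
        Console.WriteLine(Classify("uint32"));   // IDENTIFIER
        TypeNames.Add("uint32");                 // parser saw: typedef unsigned int uint32;
        Console.WriteLine(Classify("uint32"));   // TYPE_NAME - "uint32 x;" now parses as a declaration
    }
}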
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
It's indeed possible to apply context to a narrow parse path. In fact, even the C grammar requires this, because introducing a new struct or typedef introduces a new non-terminal into the grammar. It can be had by "hacking" the parser in one particular area so that it can apply a specific and narrow kind of context. That's how the context is represented in that particular case.
However, a generalized mechanism for context is not really feasible. Chomsky type 1 and type 0 languages require context throughout in order to parse. They need something like an Earley grammar. Implementations of Earley grammars that have been proposed write new context-free grammars on the fly during the parse. The problem with that is that it takes a long time to turn a CFG into an actual parser - generating the tables takes a lot of time - so it's simply not practical. There are better ways of doing language processing that don't use this tech at all; see AI speech recognition.
Edit: Looks like someone attacked it a different way as well: https://www.sciencedirect.com/science/article/pii/S2590118422000697[^] (paywalled)
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
honey the codewitch wrote:
Looks like someone attacked it a different way as well
Interesting. It goes beyond what you said, pointing out that many languages are a mix. Although perhaps that is not surprising. From the link you posted: "Furthermore, despite the strength of CFGs, some aspects of modern programming languages cannot be modeled with context-free grammars alone as some language constructs depend on the wider context they appear in [12]. Most such cases must be dealt with during or after parsing using more or less formalized techniques, e.g., name resolution, type checking, etc., which are far less formalized than the parser itself."