Code Project

Ruminations on parsing.

The Lounge
Tags: design, com, graphics, iot, algorithms
20 Posts 7 Posters 0 Views 1 Watching
  • Richard Andrew x64

    All this talk of Context-Free Grammars has me wondering if there's such a thing as a context-dependent grammar?

    The difficult we do right away... ...the impossible takes slightly longer.

    honey the codewitch
    wrote on last edited by
    #11

    There are. Chomsky type 1 (context-sensitive) and type 0 (unrestricted) languages are context dependent; they're the types used to model human language. You can represent them with an Earley grammar, but an Earley grammar is not practical to parse on a real system - it's strictly theoretical. It's also possible to parse context-sensitive languages with a context-free grammar using a GLR parser. You will get multiple trees back, due to the ambiguity of such languages without context; your job is then to decide which tree is valid.
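    The multiple-trees outcome is easy to demonstrate with a toy. Below is a brute-force enumerator in Python (a minimal sketch, not a real GLR implementation) for the ambiguous grammar E -> E "+" E | NUM; like a GLR parser, it returns every parse tree rather than committing to one.

```python
def parses(tokens):
    """Enumerate every parse tree for the ambiguous toy grammar
    E -> E '+' E | NUM (a stand-in for a GLR parser's forest)."""
    if len(tokens) == 1:
        return [tokens[0]]  # a lone number parses exactly one way
    trees = []
    # try each '+' as the top-level operator, like a GLR fork
    for i, tok in enumerate(tokens):
        if tok == "+":
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append(("+", left, right))
    return trees

# "1 + 2 + 3" is ambiguous: (1+2)+3 and 1+(2+3)
print(parses(["1", "+", "2", "+", "3"]))
```

    Two trees come back for three operands (and five for four operands, and so on, Catalan-style); picking, say, the left-associative one is exactly the "decide which tree is valid" step.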

    Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

    • Richard Andrew x64

      All this talk of Context-Free Grammars has me wondering if there's such a thing as a context-dependent grammar?


      Mircea Neacsu
      wrote on last edited by
      #12

      Sure, English is a prime example. Note the "prime" ambiguity that can only be solved in a context-dependent manner. Some might argue that English doesn't really have a grammar per se, but mostly a collection of use cases and exceptions. :laugh:

      Mircea

      • H honey the codewitch

        Probably a lot of you did this in school: you take a context-free grammar in some variant of BNF or EBNF and use it to generate parse tables, which you can then use to parse structured text, like programming languages. After studying it off and on for years - teaching myself the concepts, building code generators, and using the parser generators already out there - I've come to the following conclusions:

        1. For any non-LL(1) language of non-trivial complexity - say, a programming language - it is virtually always worth it to hand roll your own parser. The code can be as much as an order of magnitude smaller and more flexible, and that flexibility is pretty much required for anything real world: even languages with simplistic syntax, like C, require dynamic introduction of symbols into the parse table while parsing.

        2. Even given #1, it may be worth using generated code to test the hand-rolled code, and creating a context-free grammar to describe the language anyway. A parser emitted by a generator can test your hand-rolled parser for correctness, and the CFG can serve as documentation.

        3. Despite its power, bottom-up parsing is not as elegant as top-down parsing; #1 still applies, and you cannot realistically build a bottom-up parser without generating it.

        4. I spent a long time coming up with the above three little points. It was expensive, even as experience goes. I'm still trying to decide whether it was worth it, all told, when I factor in how much computer science I learned in the process. I didn't get saddled financially for it, though, so yay for that.


        Greg Utas
        wrote on last edited by
        #13

        :thumbsup: To parse C++, I used recursive descent and even wrote the lexical analysis routines from scratch. Fixing bugs is fairly straightforward, but I wouldn't have a clue how to fix them in a bottom-up parser. It's about 13K lines of code, including comments and blanks, and anyone familiar with C++ can look at the code and probably figure it out with relative ease. The "compiler" part, however, which has to understand what the parsed code is doing in order to perform static analysis, is much larger, probably 3x that size. robust-services-core/src/ct/Parser.cpp · GitHub[^]
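        A recursive-descent parser like the one described can be illustrated at toy scale. Here is a minimal Python sketch for a hypothetical expression grammar - nothing like a 13K-line C++ parser, just the shape of the technique: one function per nonterminal, each consuming tokens left to right.

```python
import re

def tokenize(src):
    """Hand-rolled lexer: numbers, operators, and parentheses."""
    return re.findall(r"\d+|[-+*/()]", src)

class Parser:
    """Recursive descent for the toy grammar:
         expr   := term (('+' | '-') term)*
         term   := factor (('*' | '/') factor)*
         factor := NUMBER | '(' expr ')'
    """
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            value = value + self.term() if op == "+" else value - self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.eat()
            value = value * self.factor() if op == "*" else value / self.factor()
        return value

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        return int(self.eat())

def evaluate(src):
    return Parser(tokenize(src)).expr()
```

        Each grammar rule maps to one method, which is why fixing a bug in a hand-rolled top-down parser tends to be straightforward: the call stack mirrors the grammar.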

        Robust Services Core | Software Techniques for Lemmings | Articles
        The fox knows many things, but the hedgehog knows one big thing.


        • Greg Utas

          :thumbsup: To parse C++, I used recursive descent and even wrote the lexical analysis routines from scratch. Fixing bugs is fairly straightforward, but I wouldn't have a clue how to fix them in a bottom-up parser. It's about 13K lines of code, including comments and blanks, and anyone familiar with C++ can look at the code and probably figure it out with relative ease. The "compiler" part, however, which has to understand what the parsed code is doing in order to perform static analysis, is much larger, probably 3x that size. robust-services-core/src/ct/Parser.cpp · GitHub[^]


          honey the codewitch
          wrote on last edited by
          #14

          Yeah, in my Slang parser the parsing was the easy part. Establishing a CodeDOM based AST after the fact was not. For example

           class Foo {
               public static int Bar;
           }

          if I do Foo.Bar, do I create a CodeFieldReferenceExpression or a CodePropertyReferenceExpression, or even a CodeMethodReferenceExpression? I have to crawl the existing code I've already resolved to an AST to determine what Bar is, so I can then make the appropriate object. It's not compiling. It's doing something completely different, but it's very compiler-like.
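           That resolution step might sketch as follows (hypothetical Python stand-ins; the real Slang code builds .NET CodeDOM node types with these names):

```python
# Hypothetical symbol table, filled in by an earlier resolution pass
# over the already-parsed code.
SYMBOLS = {
    ("Foo", "Bar"): "field",
    ("Foo", "Baz"): "property",
}

def member_reference(type_name, member):
    """Decide which CodeDOM-style reference node "Type.Member" should
    become, by consulting symbols already resolved from parsed code."""
    kind = SYMBOLS.get((type_name, member))
    node = {
        "field": "CodeFieldReferenceExpression",
        "property": "CodePropertyReferenceExpression",
        "method": "CodeMethodReferenceExpression",
    }.get(kind)
    if node is None:
        raise LookupError(f"{type_name}.{member} has not been resolved yet")
    return (node, type_name, member)
```

           The parser alone can't make this choice; the node kind depends on what an earlier pass learned about `Bar`, which is exactly the "compiler-like" part.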


          • H honey the codewitch

            jschell wrote:

            If you are going to call it a language then you probably really must do that.

             Umm, I do? Lexicon:

             Context-Free Grammar (CFG) - the document describing the structure of the language
             Language - a Chomsky type 2 language, describable with a CFG
             Parser - a stack-based automaton that can, given an input grammar, parse the corresponding language

             Maybe you just didn't understand me or something.


            jschell
            wrote on last edited by
            #15

            honey the codewitch wrote:

            Umm, I do?

            I meant that as a general comment for all of those out there that create their own language. I suspect that at least some of them don't use a BNF.

            • Richard Andrew x64

              All this talk of Context-Free Grammars has me wondering if there's such a thing as a context-dependent grammar?


              jschell
              wrote on last edited by
              #16

              Interesting question. So I went looking. The following programming language claims that at least some of it is context dependent. Chapter 4 - Expressions[^] "Words such as sell, at, and read have different meanings in different contexts. The words are relative expressions -- their meaning is context dependent."

              • J jschell

                honey the codewitch wrote:

                Umm, I do?

                I meant that as a general comment for all of those out there that create their own language. I suspect that at least some of them don't use a BNF.

                honey the codewitch
                wrote on last edited by
                #17

                Oh I see. I thought you meant I was being inconsistent about calling a language a language. BNF and EBNF are just "file formats" for lack of a better term. They are like

                 Foo := Bar "+" Baz ;
                 Bar := "Bar" ;
                 Baz := { "Baz" }+ ;

                 (an inexact representation, since the specs are imprecise - kind of like regex). It's just a format for writing down a context-free grammar. EBNF and BNF are the best-known formats, which is why I mentioned them.
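                 For illustration, a toy grammar like that translates almost line-for-line into code. A hedged Python sketch (the rule and token names are just the hypothetical ones from the post):

```python
def matches_foo(tokens):
    """Recognize the toy grammar:
         Foo := Bar "+" Baz ;
         Bar := "Bar" ;
         Baz := { "Baz" }+ ;   (i.e., one or more "Baz")
    """
    if len(tokens) < 3:
        return False            # need Bar, "+", and at least one Baz
    if tokens[0] != "Bar" or tokens[1] != "+":
        return False            # the Bar and "+" productions
    return all(t == "Baz" for t in tokens[2:])  # the { "Baz" }+ repetition
```

                 Each production becomes one check, which is the sense in which a (E)BNF file is just a notation for the same structure a parser encodes in code.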


                • J jschell

                  Interesting question. So I went looking. The following programming language claims that at least some of it is context dependent. Chapter 4 - Expressions[^] "Words such as sell, at, and read have different meanings in different contexts. The words are relative expressions -- their meaning is context dependent."

                  honey the codewitch
                  wrote on last edited by
                  #18

                   It's indeed possible to apply context to a narrow parse path. In fact, even the C grammar requires this, because introducing a new struct or typedef introduces a new non-terminal into the grammar. It can be had by "hacking" the parser in one particular area so that it can apply a specific and narrow kind of context. That's how the context is represented in that particular case. However, a generalized mechanism for context is not really feasible. Chomsky type 1 and type 0 languages require context throughout in order to parse; they need something like an Earley grammar. The Earley-grammar implementations that have been proposed write new context-free grammars on the fly during the parse. The problem is that turning a CFG into an actual parser takes a long time - generating the tables is expensive - so it's simply not practical. There are better ways of doing language processing that don't use this tech at all; see AI speech recognition. Edit: Looks like someone attacked it a different way as well: https://www.sciencedirect.com/science/article/pii/S2590118422000697[^] (paywalled)
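                   That narrow C-grammar trick is often called the "lexer hack": the tokenizer consults a typedef table that the parser grows mid-parse, so the same token can lex differently before and after a declaration. A greatly simplified Python sketch (hypothetical structure, not real C tooling):

```python
# Typedef table, seeded with built-in type names and grown by the
# parser as it accepts typedef declarations.
typedef_names = {"int", "char"}

def classify(token):
    """The lexer's token kind depends on context accumulated so far."""
    return "TYPE_NAME" if token in typedef_names else "IDENTIFIER"

def parse_typedef(tokens):
    """When the parser accepts `typedef int T ;`, it feeds the new
    name back to the lexer, changing how later occurrences lex."""
    if tokens[0] == "typedef" and tokens[-1] == ";":
        typedef_names.add(tokens[-2])  # the declared name before ';'
```

                   Before `parse_typedef(["typedef", "int", "T", ";"])` runs, `classify("T")` is an identifier; afterwards it's a type name - a specific, narrow kind of context, not a general mechanism.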


                  • H honey the codewitch

                     It's indeed possible to apply context to a narrow parse path. In fact, even the C grammar requires this, because introducing a new struct or typedef introduces a new non-terminal into the grammar. It can be had by "hacking" the parser in one particular area so that it can apply a specific and narrow kind of context. That's how the context is represented in that particular case. However, a generalized mechanism for context is not really feasible. Chomsky type 1 and type 0 languages require context throughout in order to parse; they need something like an Earley grammar. The Earley-grammar implementations that have been proposed write new context-free grammars on the fly during the parse. The problem is that turning a CFG into an actual parser takes a long time - generating the tables is expensive - so it's simply not practical. There are better ways of doing language processing that don't use this tech at all; see AI speech recognition. Edit: Looks like someone attacked it a different way as well: https://www.sciencedirect.com/science/article/pii/S2590118422000697[^] (paywalled)

                    Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                    jschell
                    wrote on last edited by
                    #19

                    honey the codewitch wrote:

                     Looks like someone attacked it a different way as well

                    Interesting. Goes beyond what you said to point out that multiple languages are a mix. Although perhaps that is not surprising. From the link you posted. "Furthermore, despite the strength of CFGs, some aspects of modern programming languages cannot be modeled with context-free grammars alone as some language constructs depend on the wider context they appear in [12]. Most such cases must be dealt with during or after parsing using more or less formalized techniques, e.g., name resolution, type checking, etc., which are far less formalized than the parser itself. "

                    • H honey the codewitch

                      Oh I see. I thought you meant I was being inconsistent about calling a language a language. BNF and EBNF are just "file formats" for lack of a better term. They are like

                       Foo := Bar "+" Baz ;
                       Bar := "Bar" ;
                       Baz := { "Baz" }+ ;

                       (an inexact representation, since the specs are imprecise - kind of like regex). It's just a format for writing down a context-free grammar. EBNF and BNF are the best-known formats, which is why I mentioned them.


                      jschell
                      wrote on last edited by
                      #20

                      honey the codewitch wrote:

                      BNF and EBNF are just

                       Yes. Like you, I have created my own languages in the past - once formally, but most times just ad hoc.
