Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
CODE PROJECT For Those Who Code
  • Home
  • Articles
  • FAQ
Community
  1. Home
  2. The Lounge
  3. Bottom Up Parsing is Frustrating

Bottom Up Parsing is Frustrating

Scheduled Pinned Locked Moved The Lounge
helpregexjsonalgorithmsdata-structures
5 Posts 2 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • honey the codewitchH Offline
    honey the codewitchH Offline
    honey the codewitch
    wrote on last edited by
    #1

    The way it works is it takes a series of tokens/lexemes from an input stream and uses pattern matching to discern whether they match various rules like: Foobar -> "foo" "bar" Such that when it sees "foo" followed by "bar" in the input stream it knows the result is a Foobar The upside of this is it's got a whole lot more recognizing power than starting from the point of Foobar and then checking the input stream for "foo" followed by "bar" (basically the reverse of the aforementioned process) The downside is it means the tree is built from leaves to the root, meaning bottom up, meaning your recursive tree building is effectively "inside out", building the children first. That's not normally a problem but it has some important ramifications, like if there's an error token in the tree it can't readily resolve the rest of the tree once encountered. The other issue is it's more confusing from the standpoint of the end user of the parser. Sure once it's in the tree, it's all the same, but if you're using a pull parser rather than loading everything into an in memory tree, the pull parser for a bottom up parser is just weird to use. And then the primary issue I have with them is that the algorithms to construct the parse tables are incomprehensible. I've written them before and I still hesitate to say I understand them. I have code that does it that I won't touch because of it. It's not the fault of the code - it's the algorithm that's ugly. The math is atrocious - it feels as though designed while drunk except that it's so bloody clever - too clever to understand by mere mortals. :sigh: I don't like it. Unfortunately it's a very practical way to parse, so it can't really be overlooked.

    Real programmers use butterflies

    Greg UtasG 1 Reply Last reply
    0
    • honey the codewitchH honey the codewitch

      The way it works is it takes a series of tokens/lexemes from an input stream and uses pattern matching to discern whether they match various rules like: Foobar -> "foo" "bar" Such that when it sees "foo" followed by "bar" in the input stream it knows the result is a Foobar The upside of this is it's got a whole lot more recognizing power than starting from the point of Foobar and then checking the input stream for "foo" followed by "bar" (basically the reverse of the aforementioned process) The downside is it means the tree is built from leaves to the root, meaning bottom up, meaning your recursive tree building is effectively "inside out", building the children first. That's not normally a problem but it has some important ramifications, like if there's an error token in the tree it can't readily resolve the rest of the tree once encountered. The other issue is it's more confusing from the standpoint of the end user of the parser. Sure once it's in the tree, it's all the same, but if you're using a pull parser rather than loading everything into an in memory tree, the pull parser for a bottom up parser is just weird to use. And then the primary issue I have with them is that the algorithms to construct the parse tables are incomprehensible. I've written them before and I still hesitate to say I understand them. I have code that does it that I won't touch because of it. It's not the fault of the code - it's the algorithm that's ugly. The math is atrocious - it feels as though designed while drunk except that it's so bloody clever - too clever to understand by mere mortals. :sigh: I don't like it. Unfortunately it's a very practical way to parse, so it can't really be overlooked.

      Real programmers use butterflies

      Greg UtasG Offline
      Greg UtasG Offline
      Greg Utas
      wrote on last edited by
      #2

      That's why this[^] uses a recursive descent parser. It's not as efficient as bottom-up, but it's decent enough if the possibilities are tried in probabilistic order, especially in this day and age of code that uses CPU time and memory frivolously. The code is the grammar and is usually easy to fix when a bug is found. No need to get the grammar close to right from the outset, and no need for an inscrutable algorithm to generate parse tables. Skipping over errors can still be difficult, but it depends where they occur.

      Robust Services Core | Software Techniques for Lemmings | Articles
      The fox knows many things, but the hedgehog knows one big thing.

      <p><a href="https://github.com/GregUtas/robust-services-core/blob/master/README.md">Robust Services Core</a>
      <em>The fox knows many things, but the hedgehog knows one big thing.</em></p>

      honey the codewitchH 1 Reply Last reply
      0
      • Greg UtasG Greg Utas

        That's why this[^] uses a recursive descent parser. It's not as efficient as bottom-up, but it's decent enough if the possibilities are tried in probabilistic order, especially in this day and age of code that uses CPU time and memory frivolously. The code is the grammar and is usually easy to fix when a bug is found. No need to get the grammar close to right from the outset, and no need for an inscrutable algorithm to generate parse tables. Skipping over errors can still be difficult, but it depends where they occur.

        Robust Services Core | Software Techniques for Lemmings | Articles
        The fox knows many things, but the hedgehog knows one big thing.

        honey the codewitchH Offline
        honey the codewitchH Offline
        honey the codewitch
        wrote on last edited by
        #3

        You get most of those same advantages using LL(1). Besides, if your grammar language is easy enough, grammars aren't so bad. Here's one for JSON:

        Json= Object | Array;
        Object= "{" [ Field { "," Field } ] "}";
        Field= String ":" Value;
        Array= "[" [ Value { "," Value } ] "]";
        Value= String |
        Number |
        Object |
        Array |
        Boolean |
        Null ;

        Number= '\-?(0|[1-9][0-9]*)(\.[0-9]+)?([Ee][\+\-]?[0-9]+)?';
        String = '"([^\0-\x1F"\\]|\\([\\/bfnrt]|u[a-fA-F0-9]{4}))*"';
        Boolean = 'true|false';
        Null="null";
        Lbracket="[";
        Rbracket="]";
        Lbrace="{";
        Rbrace="}";
        Colon=":";
        Comma=",";
        Whitespace='[ \t\r\n\f\v]+';

        See? no thing. All the attributes in there are just to prune the parse tree. "collapsed" removes items from the parse tree once they are parsed so you don't have to deal with them in your code.

        Real programmers use butterflies

        Greg UtasG 1 Reply Last reply
        0
        • honey the codewitchH honey the codewitch

          You get most of those same advantages using LL(1). Besides, if your grammar language is easy enough, grammars aren't so bad. Here's one for JSON:

          Json= Object | Array;
          Object= "{" [ Field { "," Field } ] "}";
          Field= String ":" Value;
          Array= "[" [ Value { "," Value } ] "]";
          Value= String |
          Number |
          Object |
          Array |
          Boolean |
          Null ;

          Number= '\-?(0|[1-9][0-9]*)(\.[0-9]+)?([Ee][\+\-]?[0-9]+)?';
          String = '"([^\0-\x1F"\\]|\\([\\/bfnrt]|u[a-fA-F0-9]{4}))*"';
          Boolean = 'true|false';
          Null="null";
          Lbracket="[";
          Rbracket="]";
          Lbrace="{";
          Rbrace="}";
          Colon=":";
          Comma=",";
          Whitespace='[ \t\r\n\f\v]+';

          See? no thing. All the attributes in there are just to prune the parse tree. "collapsed" removes items from the parse tree once they are parsed so you don't have to deal with them in your code.

          Real programmers use butterflies

          Greg UtasG Offline
          Greg UtasG Offline
          Greg Utas
          wrote on last edited by
          #4

          I'd never seen a JSON grammar, so thanks for that. :thumbsup: But the one I'm dealing with is C++, which is surely a dog's breakfast. :)

          Robust Services Core | Software Techniques for Lemmings | Articles
          The fox knows many things, but the hedgehog knows one big thing.

          <p><a href="https://github.com/GregUtas/robust-services-core/blob/master/README.md">Robust Services Core</a>
          <em>The fox knows many things, but the hedgehog knows one big thing.</em></p>

          honey the codewitchH 1 Reply Last reply
          0
          • Greg UtasG Greg Utas

            I'd never seen a JSON grammar, so thanks for that. :thumbsup: But the one I'm dealing with is C++, which is surely a dog's breakfast. :)

            Robust Services Core | Software Techniques for Lemmings | Articles
            The fox knows many things, but the hedgehog knows one big thing.

            honey the codewitchH Offline
            honey the codewitchH Offline
            honey the codewitch
            wrote on last edited by
            #5

            Yeah, for C++ for a number of reasons it's best to use a hand written parser.

            Real programmers use butterflies

            1 Reply Last reply
            0
            Reply
            • Reply as topic
            Log in to reply
            • Oldest to Newest
            • Newest to Oldest
            • Most Votes


            • Login

            • Don't have an account? Register

            • Login or register to search.
            • First post
              Last post
            0
            • Categories
            • Recent
            • Tags
            • Popular
            • World
            • Users
            • Groups