Woohoo, peephole parsing big content

    honey the codewitch wrote (#1):

    I am rewriting my entire SVG parser so that it can peephole parse the whole document using a 64-byte capture buffer (or more, but 64 bytes is the minimum). This creates an interesting problem with really long attributes, like the "d" attribute on the "path" element in SVG.

    The trick is that the peephole parser returns about 64 bytes of that "d" attribute's value at a time, so reading the entire "d" attribute typically requires multiple calls to read(). Well, I did it. With judicious use of state machines I can parse a float, skip whitespace, and parse path commands even when they land partly across the 64-byte capture boundary. Previously, in my old code, I would gather all of the captures into one big string buffer and parse that. This new approach wasn't easy code, but the result is very memory efficient, and robust in that it can handle content of any length with a constant (and very small) amount of memory. Bless state machines. I have 7 states in my float parser alone. I feel like I had my Wheaties this morning. Hooah!

    Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

      BernardIE5317 wrote (#2):

      I would be interested in learning of the solution technique.

        Matthew Dennis wrote (#3):

        I agree. I’ve written some amazingly complex code on extremely resource-limited microprocessors with state machines. They are compact and very fast.

        "Mistakes are prevented by Experience. Experience is gained by making mistakes."

          honey the codewitch wrote (#4):

          Here's my float routine. It uses my ml_reader markup peephole parser. Basically, I keep a running cursor over the current buffer (**current) as well as the rdr for when I need to fetch the next string. The rest is just state machine stuff.

          result_t parse_float(ml_reader_base& rdr, const char** current, float* result) {
              char* end = NULL;
              double res = 0.0, sign = 1.0;
              long long intPart = 0, fracPart = 0;
              int fracCount = 0;
              long expPart = 0;
              char expNeg = 0;
              char hasIntPart = 0, hasFracPart = 0, hasExpPart = 0;
              int state = 0;
              // Parse optional sign
              if (**current == '+') {
                  (*current)++;
              } else if (**current == '-') {
                  sign = -1;
                  (*current)++;
              }

              while (state<7) {
                  if (**current) {
                      switch (state) {
                          case 0: // int part
                              if (!isdigit(**current)) {
                                  state = 1;
                                  break;
                              }
                              hasIntPart=1;
                              intPart = (intPart*10)+(**current-'0');
                              ++(*current);
                              break;
                          case 1:
                              *result = (float)intPart;
                              if(**current!='.') {
                                  state = 3;
                                  break;
                              }
                              ++(*current);
                              state = 2;
                              break;
                          case 2: // frac part
                              if (!isdigit(**current)) {
                                  state = 3;
                                  break;
                              }
                              ++fracCount;
                              hasFracPart=1;
                              fracPart = (fracPart*10)+(**current-'0');
                              ++(*current);
                              break;
                          case 3:
                              if(hasFracPart) {
                                  *result += (double)fracPart/pow(10.0,(double)fracCount);
                              }
                              if(**current=='E' || **current=='e') {
                                  ++(*current);
                                  state = 4;
                              } else {
                                  state = 6;
                              }
                              break;
                          case 4:
                              if(**current=='+') {
                                  ++(*current);
                              }
                              if(**current=='-') {
                                  expNeg = 1;
                                  ++(*current);
                              }
          