I'm feeling "diverse" today

Tags: regex, json, csharp, c++, iot
  • honey the codewitch wrote:

    I've written a JSON parser that can work in 4k. The flash size is too big. On the other hand, see this: My code[^] When you say smaller it sounds like you're talking about memory requirements. I'm only using 20% of my heap. I'm using almost all my flash space.

    To err is human. Fortune favors the monsters.

    klinkenbecker wrote (#7, in reply to the post above):

    Yes, that is way too big :) Specifically, I'm not sure of the exact size of the parser, but we routinely order map entries by size and JSON never gets anywhere near the top. Top flash hogs are radio (3k), print engine (2k), events (2.5k), class engine (2.6k), object engine (3.8k) (8051 numbers). Together they are ~60% of the ~20k flash for the OS. My 'guesstimate' size delta was based on looking at your regex code which, at first blush, looked much more complex than our JSON parser (excluding the binary JSON part). We parse JSON 'in place' in the buffer it came in on, we don't use a heap (anywhere) and we don't create a 'document', generally jumping straight to methods. Since the radios we use typically have a 128-byte max frame size, JSON is typically very constrained. Everything else is managed on the (2k) stack. Having seen your other work, I know you have thought about the problem very carefully. Just saying our mileage is different, mostly because we have bounded the problem in very specific ways that are generic to our (IoT) domain. It is often not possible to do that when attempting to solve the 'unbounded' problem for 'everyone'. Effectively managing embedded constraints is one of the reasons why embedded resists unbounded solutions and their inevitable inclusion of unnecessary code (for any given specific instance). :)
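
To make the "in place, no heap, no document" idea concrete, here is a minimal sketch of what such a parser could look like. It is not klinkenbecker's code: the names parse_in_place, pair_handler, Thermostat and apply_setting are made up for illustration, and the sketch assumes a flat, well-formed object with no escapes, nesting or arrays.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical handler type: called once per key/value pair found in the frame.
typedef void (*pair_handler)(const char* key, const char* value, void* ctx);

// Walks a flat JSON object in place. Keys and values are NUL-terminated inside
// buf itself and handed to on_pair, so nothing is copied or allocated.
// Returns the number of pairs delivered.
int parse_in_place(char* buf, pair_handler on_pair, void* ctx) {
    assert(buf != nullptr && on_pair != nullptr);
    int pairs = 0;
    char* p = std::strchr(buf, '{');
    if (p == nullptr) return 0;
    ++p;
    for (;;) {
        char* key = std::strchr(p, '"');            // opening quote of the key
        if (key == nullptr) break;
        ++key;
        char* key_end = std::strchr(key, '"');      // closing quote of the key
        if (key_end == nullptr) break;
        char* val = std::strchr(key_end + 1, ':');
        if (val == nullptr) break;
        ++val;
        while (*val == ' ') ++val;
        char* val_end;
        if (*val == '"') {                          // string value
            ++val;
            val_end = std::strchr(val, '"');
            if (val_end == nullptr) break;
        } else {                                    // number, true, false, null
            val_end = val + std::strcspn(val, ",} ");
            if (*val_end == '\0') break;            // truncated frame
        }
        const char delim = *val_end;
        *key_end = '\0';
        *val_end = '\0';
        on_pair(key, val, ctx);
        ++pairs;
        if (delim == '}') break;
        p = val_end + 1;
    }
    return pairs;
}

// Example handler that goes straight to the action for each known key.
struct Thermostat { int setpoint; bool fan; };

void apply_setting(const char* key, const char* value, void* ctx) {
    Thermostat* t = static_cast<Thermostat*>(ctx);
    if (std::strcmp(key, "setpoint") == 0) t->setpoint = std::atoi(value);
    else if (std::strcmp(key, "fan") == 0) t->fan = (std::strcmp(value, "true") == 0);
}
```

The buffer is modified while it is parsed, which is only acceptable because the frame is not needed afterwards. That is the kind of domain-specific bound the post is talking about.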

    • honey the codewitch wrote:

      I wanted to load some JSON from a couple of web-based APIs to get the local time and weather. The trouble is I didn't actually want to use a JSON parser, because that's a nasty dependency. What about regex? There's no regex engine readily available to my IoT widget. Well, traversing a DFA state table that represents a regular expression in C++ is almost trivial. Generating that table is not. But I have a regex engine in C#, and it's capable of generating that table. So I whip up a little C# program to generate a C++ array representing the "DFA table" for a regular expression - basically the opcodes it needs to match the expression. And then some C++ code to traverse it.

      3 languages:
      Regex
      C++
      C#
      To "parse" a fourth, a JSON subset, in a really compact way. And I'm not even doing front-end web development - just a REST/JSON client.

      megaadam wrote (#8, in reply to the post above):

      I dunno but without knowing all your requirements I would consider the most "simple-dumb" way:

      Search for the json-key including quotes
      Search for ":" is an optional bonus, not strictly needed
      Extract the string between the subsequent pair of double-quotes.
      Wrap in a function

      Of course it fails for corrupt json, but the regex-based state machine would also fail.
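
The steps above can be sketched in a few lines, assuming the whole response already sits in a std::string. The function name extract_string_value is made up for illustration. As the post says, nothing is validated. It also only works when the value is a quoted string: for a numeric value it would return the next quoted token instead.

```cpp
#include <cassert>
#include <string>

// Returns the string value that follows "key" in json, or "" if the key or a
// following quoted value is not found. Escapes and nesting are ignored.
std::string extract_string_value(const std::string& json, const std::string& key) {
    assert(!key.empty());
    const std::string quoted = "\"" + key + "\"";       // step 1: key including quotes
    std::size_t pos = json.find(quoted);
    if (pos == std::string::npos) return "";
    pos = json.find(':', pos + quoted.size());           // step 2: the optional colon check
    if (pos == std::string::npos) return "";
    const std::size_t open = json.find('"', pos + 1);    // step 3: next pair of double quotes
    if (open == std::string::npos) return "";
    const std::size_t close = json.find('"', open + 1);
    if (close == std::string::npos) return "";
    return json.substr(open + 1, close - open - 1);
}
```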

      "If we don't change direction, we'll end up where we're going"

        honey the codewitch wrote (#9, in reply to #8):

        "search for the json key" - That's where you hid your complexity behind a few words. That's what the DFA state machine takes care of. Short of that, I'd need to loop, and then within that loop fetch each character of the key I'm hunting until I fail, at which point I continue the outer loop. That's what the DFA code does. That's exactly what it does.

        ETA: All of this was for naught because I found out the Arduino Stream implementation has a find() method. *headdesk*

          honey the codewitch wrote (#10, in reply to #7):

          To be clear, my flash space is all being used by other libraries - not this state machine. I was just posting it to give you an idea of where I'm at in terms of what I've used so far.

            megaadam wrote (#11, in reply to #9):

            I assumed you have access to std::string find(). Not so much complexity IMO...

            "If we don't change direction, we'll end up where we're going"

              Daniel Pfeffer wrote (#12, in reply to honey the codewitch's post quoted before #8):

              honey the codewitch wrote:

              3 languages Regex C++ C# To "parse" a fourth, a JSON subset

              This brings to mind the children's song about the old lady who swallowed a fly.

              The two last verses are:

              I know an old lady
              Who swallowed a cow.
              I don't know how
              She swallowed a cow
              She swallowed the cow to catch the dog
              What a hog to swallow a dog!
              She swallowed the dog to catch the cat
              Fancy that! To swallow a cat!
              She swallowed the cat to catch the bird
              How absurd, to swallow a bird!
              She swallowed the bird to catch the spider
              That wriggled and tickled inside her.
              She swallowed the spider to catch the fly.
              I don't know why she swallowed a fly.
              Perhaps she'll die...

              I know an old lady who swallowed a horse!
              She's dead, of course


              Freedom is the freedom to say that two plus two make four. If that is granted, all else follows. -- 6079 Smith W.

                megaadam wrote (#13, in reply to #9):

                And for standard C there is of course always strstr(), which I assume you know.

                "If we don't change direction, we'll end up where we're going"

                  honey the codewitch wrote (#14, in reply to #13):

                  strstr() only works for in-memory strings, not streams.
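
For comparison, here is a minimal sketch of the loop described in #9, written against a character-at-a-time reader so nothing has to be buffered. The names read_fn, stream_find and read_cstr are made up for illustration. This is not the Arduino Stream::find() implementation, and it does no real backtracking: after a failed partial match it only retries the offending character against the start of the target.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical reader callback: returns the next byte, or -1 at end of input.
typedef int (*read_fn)(void* state);

// Consumes the stream until target has been seen; returns true on success.
// The stream is left positioned right after the match.
bool stream_find(read_fn read, void* state, const char* target) {
    assert(target != nullptr && target[0] != '\0');
    const std::size_t len = std::strlen(target);
    std::size_t matched = 0;
    for (int ch = read(state); ch != -1; ch = read(state)) {
        if (ch == (unsigned char)target[matched]) {
            if (++matched == len) return true;
        } else {
            // Drop the partial match, but let this character start a new one.
            matched = (ch == (unsigned char)target[0]) ? 1 : 0;
        }
    }
    return false;
}

// Example reader over a NUL-terminated string, standing in for a network stream.
int read_cstr(void* state) {
    const char** p = static_cast<const char**>(state);
    if (**p == '\0') return -1;
    return (unsigned char)*(*p)++;
}
```

Because the search stops right after the target, the caller can carry on reading the value from the same stream.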

                    honey the codewitch wrote (#15, in reply to #11):

                    That (std::string find()) only works for in-memory strings.

                      klinkenbecker wrote (#16, in reply to #10):

                      It is an interesting approach and, as always, I will be very keen to see how it compares when it's finished.

                        klinkenbecker wrote (#17, in reply to #8):

                        One of the nice things about embedded is that generally, if you have adequate implementation, unit testing and system testing, you can safely assume your input will not be corrupt. I.e. embedded implementations can be fully and specifically bounded and 'gated' in such a way as to avoid input errors - frames can be error checked, etc, etc. Moving data errors into places they can be easily managed is a key piece of making 'engines' more efficient. The exact same paradigm is the way a car is built - or better example - a boat. You would never build an engine for a boat to be able to take water in the fuel. You 'move' the error handling (water in the fuel) to an input qualifying filter. Fuel filters for boats and cars are very different beasts, the engines are (fundamentally) the same.
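
As an illustration of such an "input qualifying filter", here is a minimal sketch that rejects a damaged frame before any parser sees it. The frame layout (payload followed by a single XOR checksum byte) and the name qualify_frame are assumptions made for this example, not something described in the thread. The 128-byte limit is borrowed from the frame size mentioned in #7.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Hypothetical frame layout: payload bytes followed by one XOR checksum byte.
// Returns the payload length if the frame passes the check, or -1 if the frame
// should be dropped.
int qualify_frame(const uint8_t* frame, size_t len) {
    assert(frame != nullptr);
    if (len < 2 || len > 128) return -1;        // too short, or larger than a radio frame
    uint8_t x = 0;
    for (size_t i = 0; i + 1 < len; ++i) x ^= frame[i];
    return (x == frame[len - 1]) ? static_cast<int>(len - 1) : -1;
}
```

With the check in front, the parser behind it can stay small because it never has to defend against corrupted bytes.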

                          honey the codewitch wrote (#18, in reply to #16):

                          I ditched it altogether! I found out the Arduino Stream class has a find() method which will allow you to find a string within a stream (without having to load it into a string first and use strstr()). So much for all this effort, although I will need to use something like the DFA machine to grab JSON from a weather service. The issue with that is fields can be in any order, so I either load the fields into memory or I use a DFA lexer. I'd rather use the lexer.
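
A rough sketch of the "fields in any order" problem: a single find() can only look for one key, so if the keys arrive in a different order the stream has already moved past them. One lightweight stand-in for a lexer is to track several keys at once and report which one completed first. This is not the DFA lexer from the post. The names read_fn, stream_find_any, read_cstr and kMaxKeys are made up for illustration, keys must be non-empty, and there is no backtracking beyond retrying the current character.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reader callback: returns the next byte, or -1 at end of input.
typedef int (*read_fn)(void* state);

const int kMaxKeys = 8;   // arbitrary limit for this sketch

// Consumes the stream until one of keys has been seen.
// Returns the index of that key, or -1 if the input ended first.
int stream_find_any(read_fn read, void* state, const char* const* keys, int nkeys) {
    assert(keys != nullptr && nkeys > 0 && nkeys <= kMaxKeys);
    std::size_t matched[kMaxKeys] = {0};
    for (int ch = read(state); ch != -1; ch = read(state)) {
        for (int k = 0; k < nkeys; ++k) {
            const char* key = keys[k];
            if (ch == (unsigned char)key[matched[k]]) {
                if (key[++matched[k]] == '\0') return k;   // whole key seen
            } else {
                matched[k] = (ch == (unsigned char)key[0]) ? 1 : 0;
            }
        }
    }
    return -1;
}

// Example reader over a NUL-terminated string, standing in for a network stream.
int read_cstr(void* state) {
    const char** p = static_cast<const char**>(state);
    if (**p == '\0') return -1;
    return (unsigned char)*(*p)++;
}
```

Calling it in a loop yields the keys in whatever order the service sends them, with the stream positioned at the start of each value.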

                            klinkenbecker wrote (#19, in reply to #18):

                            I find that coding effort is very rarely, if ever, wasted. It goes into a black hole and comes out as ultra-energetic gamma radiation at some later date. 15 years ago, I wrote a (micro) JS server to run applications via browser on any platform, with plug-ins to pull data from misc devices/websites. I got sidetracked and just had cause to go back to it. I wrote that in C and it will be much simpler in C# now (and much more cross-platform), but it is still a good architectural reference point. Never wasted, the neurons are just better configured for next time...

                              Member 9167057 wrote (#20, in reply to honey the codewitch's post quoted before #8):

                              Why not parse JSON in C# directly? That's a dependency on .NET's standard runtime library, which ain't too bad.

                                honey the codewitch wrote (#21, in reply to #20):

                                Because first it would mean upgrading the SRAM on my device to something more than 512kB. Then it would involve upgrading the processor to something in the GHz range. And heck, it would involve adding a PC in there somewhere to actually run .NET. This is not a .NET device[^]

                                  Member 9167057 wrote (#22, in reply to #21):

                                  Things are getting interesting! What do you compile C# down to, and how? Can I imagine what you're doing to be similar to what the Unity developers are doing (compiling C# to C++ which then gets compiled to native code)?

                                    honey the codewitch wrote (#23, in reply to #22):

                                    I'm just basically using C# to generate an array for my C++ code to traverse. The C# code is a console application. I feed it a regular expression on the command line and it produces a small amount of C++ code to declare an array as its output - for example:

                                    int16_t dfa_table[] = {
                                    -1, 1, 6, 1, 34, 34, -1, 1, 12, 1, 117, 117, -1, 1, 18,
                                    1, 110, 110, -1, 1, 24, 1, 105, 105, -1, 1, 30, 1, 120, 120,
                                    -1, 1, 36, 1, 116, 116, -1, 1, 42, 1, 105, 105, -1, 1, 48,
                                    1, 109, 109, -1, 1, 54, 1, 101, 101, -1, 1, 60, 1, 34, 34,
                                    -1, 1, 66, 1, 58, 58, 0, 0
                                    };

                                    That's a DFA table. What it is is a state machine encoded into an array. I have C++ code that can walk it in order to run the regular expression. The walking code is easy and efficient. Generating the array is not easy. That C# console application uses a regular expression engine I wrote (in C#) in order to generate that C++ array. The code to run the regular expression is simple and is in C++:

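                                    #include <stdint.h>

                                    // Scans the input for the first match of the regular expression encoded in dfa.
                                    // Each state is stored as: accept id (-1 if not accepting), transition count,
                                    // then for each transition: target state index, range count, [min, max] pairs.
                                    // read_cb returns the next character, or -1 at the end of the input.
                                    bool match(const int16_t* dfa, int16_t (*read_cb)(void*), void* cb_state = nullptr) {
                                        int tlen, tto, prlen, pmin, pmax, i, j;
                                        int state = 0;
                                        int acc = -1;
                                        bool done;
                                        bool found = false;   // true once the current attempt has consumed a character
                                        int ch = read_cb(cb_state);
                                        while (ch != -1) {
                                            acc = -1;
                                            done = false;
                                            while (!done) {
                                            start_dfa:
                                                done = true;
                                                acc = dfa[state++];
                                                tlen = dfa[state++];
                                                for (i = 0; i < tlen; ++i) {
                                                    tto = dfa[state++];
                                                    prlen = dfa[state++];
                                                    for (j = 0; j < prlen; ++j) {
                                                        pmin = dfa[state++];
                                                        pmax = dfa[state++];
                                                        if (ch < pmin) {
                                                            // The early exit assumes sorted ranges. Skip the rest of this
                                                            // transition so the next one is read from the right offset.
                                                            state += (prlen - j - 1) * 2;
                                                            break;
                                                        }
                                                        if (ch <= pmax) {
                                                            // Character accepted: consume it and move to the target state.
                                                            found = true;
                                                            ch = read_cb(cb_state);
                                                            state = tto;
                                                            done = false;
                                                            goto start_dfa;
                                                        }
                                                    }
                                                }
                                            }
                                            if (acc != -1) {
                                                return found;   // stopped in an accepting state
                                            }
                                            // Dead end. After a partial match, retry the current character from the
                                            // initial state; otherwise move on to the next character.
                                            if (!found) {
                                                ch = read_cb(cb_state);
                                            }
                                            found = false;
                                            state = 0;
                                        }
                                        return false;
                                    }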
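
To see the table and the walker working together, here is a self-contained sketch. The table is the one from the post, which encodes the pattern "unixtime": (quotes and colon included). The walker is a compact restatement using the same table layout, written without goto, not the author's exact code. The reader read_cstr is made up for illustration and stands in for a network stream.

```cpp
#include <cassert>
#include <stdint.h>

// Table from the post: '"' (34), then u n i x t i m e, '"', ':' (58).
// The final state at index 66 has accept id 0 and no transitions.
static const int16_t dfa_table[] = {
    -1, 1, 6, 1, 34, 34,    -1, 1, 12, 1, 117, 117, -1, 1, 18, 1, 110, 110,
    -1, 1, 24, 1, 105, 105, -1, 1, 30, 1, 120, 120, -1, 1, 36, 1, 116, 116,
    -1, 1, 42, 1, 105, 105, -1, 1, 48, 1, 109, 109, -1, 1, 54, 1, 101, 101,
    -1, 1, 60, 1, 34, 34,   -1, 1, 66, 1, 58, 58,   0, 0
};

// Returns true once the pattern has been seen in the stream.
bool match(const int16_t* dfa, int16_t (*read_cb)(void*), void* cb_state = nullptr) {
    assert(dfa != nullptr && read_cb != nullptr);
    int ch = read_cb(cb_state);
    while (ch != -1) {
        int state = 0;
        bool advanced = false;              // true once this attempt consumed a character
        for (;;) {
            const int acc = dfa[state++];
            const int tlen = dfa[state++];
            int next = -1;
            for (int i = 0; i < tlen; ++i) {
                const int tto = dfa[state++];
                const int prlen = dfa[state++];
                for (int j = 0; j < prlen; ++j) {
                    const int pmin = dfa[state++];
                    const int pmax = dfa[state++];
                    if (next == -1 && ch >= pmin && ch <= pmax) next = tto;
                }
            }
            if (next == -1) {               // no transition accepts ch
                if (acc != -1) return advanced;
                break;
            }
            state = next;
            ch = read_cb(cb_state);
            advanced = true;
        }
        if (!advanced) ch = read_cb(cb_state);   // otherwise retry ch from state 0
    }
    return false;
}

// Hypothetical reader: walks a NUL-terminated string, returns -1 at the end.
int16_t read_cstr(void* state) {
    const char** p = static_cast<const char**>(state);
    if (**p == '\0') return -1;
    return static_cast<unsigned char>(*(*p)++);
}
```

One thing to watch for: the walker only learns that it has reached the accepting state after reading one more character, so the first character of the value has already been consumed when match() returns.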

                                      Member 9167057 wrote (#24, in reply to #23):

                                      Ah, I get it. Thank you for the thorough explanation :)

                                        honey the codewitch wrote (#25, in reply to #24):

                                        No problem. I don't target .NET these days, as I spend most of my time with tiny little gadgets. To me .NET is a means to an end. If I can use it to offload some of the heavy lifting my code would otherwise have to do, I'll do that, of course. Otherwise I haven't really used it recently. I do have a lot of code I've written in C#, and used to use it professionally - I still will every once in a while, but it's not my bread and butter anymore. The regular expression thing is a great example of being able to use it to offload work though - where an actual regular expression engine on the device would take up precious RAM and flash space, I was able to "outsource it" to an external C# app I only need to run once.

                                          Member 9167057 wrote (#26, in reply to #25):

                                          I'd go as far as to claim that everything is a means to an end. Programming something embedded in C++ is still a means to an end :p
