Woohoo, peephole parsing big content
-
I am rewriting my entire SVG parser so that it can peephole parse the whole document using a 64-byte capture buffer. (It can be larger, but 64 bytes is the minimum.) This creates an interesting problem when it comes to really long attributes, like the "d" attribute on the "path" element in SVG.
The trick is that the peephole parser returns about 64 bytes of that "d" attribute's value at a time. Reading the entire "d" attribute will typically require multiple calls to read().
Well, I did it. With judicious use of state machines I can parse a float, skip whitespace, and parse path commands even when they land partly across the 64-byte capture boundary. Previously, in my old code, I would gather all of the captures into one big string buffer and parse that. The new code wasn't easy to write, but the result is very memory efficient, and robust in that it can handle content of any length with a constant (and very small) amount of memory. Bless state machines. I have 7 states in my float parser alone. I feel like I had my Wheaties this morning. Hooah!
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
I would be interested in learning of the solution technique.
-
I agree. I’ve done some amazingly complex code on extremely resource limited microprocessors with state machines. They are compact and very fast.
"Mistakes are prevented by Experience. Experience is gained by making mistakes."
-
I would be interested in learning of the solution technique.
Here's my float routine. It uses my ml_reader markup peephole parser. Basically, I keep a running cursor over the current buffer (**current), as well as the rdr for when I need to fetch the next string. The rest is just state machine stuff.
result_t parse_float(ml_reader_base& rdr, const char** current, float* result) {
    char* end = NULL;
    double res = 0.0, sign = 1.0;
    long long intPart = 0, fracPart = 0;
    int fracCount = 0;
    long expPart = 0;
    char expNeg = 0;
    char hasIntPart = 0, hasFracPart = 0, hasExpPart = 0;
    int state = 0;
    // Parse optional sign
    if (**current == '+') {
        (*current)++;
    } else if (**current == '-') {
        sign = -1;
        (*current)++;
    }
    while (state < 7) {
        if (**current) {
            switch (state) {
                case 0:  // int part
                    if (!isdigit(**current)) {
                        state = 1;
                        break;
                    }
                    hasIntPart = 1;
                    intPart = (intPart * 10) + (**current - '0');
                    ++(*current);
                    break;
                case 1:
                    *result = (float)intPart;
                    if (**current != '.') {
                        state = 3;
                        break;
                    }
                    ++(*current);
                    state = 2;
                    break;
                case 2:  // frac part
                    if (!isdigit(**current)) {
                        state = 3;
                        break;
                    }
                    ++fracCount;
                    hasFracPart = 1;
                    fracPart = (fracPart * 10) + (**current - '0');
                    ++(*current);
                    break;
                case 3:
                    if (hasFracPart) {
                        *result += (double)fracPart / pow(10.0, (double)fracCount);
                    }
                    if (**current == 'E' || **current == 'e') {
                        ++(*current);
                        state = 4;
                    } else {
                        state = 6;
                    }
                    break;
                case 4:
                    if (**current == '+') {
                        ++(*current);
                    }
                    if (**current == '-') {
                        expNeg = 1;
                        ++(*current);
                    }