Code Project / The Lounge

It makes me wonder why I've never seen it before

Tags: xml, help, html, database, iot

  • honey the codewitch wrote:

    On IoT you don't have a lot of luxuries. You simply learn to do without. Well, one area where I have to do without is HTML and XML well-formedness checking and validation. That might be an issue where data interchange is concerned, but not so much where rendering HTML or XHTML content is concerned. What do you do on an error? You fail. You can either stop, or continue to render, possibly displaying some bad content as a result, but that is still better than failing outright halfway through the parse because the document forgot a </b>. In fact, continuing to render is what commercial browsers do.

    Here's the thing: if this is what you're doing, you don't need a DTD. You don't need an XSD schema. You don't even need a heckin stack! The result is much faster and lighter, with a smaller binary footprint. So why haven't I seen a pull reader with minimal validation/well-formedness checking in the open source pool? You'd think such a beast would be incredibly useful for building web browsers, even tiny ones. Especially tiny ones! *cracks knuckles* I shouldn't have to be writing this. It's one of those things that leaves me wondering why it doesn't exist already.

    Real programmers use butterflies
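
For illustration, here is a minimal C++ sketch of the kind of reader described in the opening post. It is not the author's library, and the names tiny_pull_reader and node_kind are hypothetical. It pulls one node at a time from an in-memory buffer, keeps no tag stack, and never checks that tags balance; attributes, comments and entities are skipped to keep it short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical node kinds for this sketch (not the author's ml_node_type).
enum class node_kind { element_start, element_end, text, eof };

// Stackless, non-validating pull reader over an in-memory buffer. Tags are
// reported in the order they appear; nothing checks that they balance.
// Attributes, comments and entities are skipped to keep the sketch short.
class tiny_pull_reader {
public:
    explicit tiny_pull_reader(std::string_view src) : src_(src) {}

    // Advance to the next node. Returns false at end of input.
    bool read() {
        if (pos_ >= src_.size()) {
            kind_ = node_kind::eof;
            value_ = {};
            return false;
        }
        if (src_[pos_] != '<') {
            // Text runs until the next '<' or the end of input.
            std::size_t end = src_.find('<', pos_);
            if (end == std::string_view::npos) end = src_.size();
            kind_ = node_kind::text;
            value_ = src_.substr(pos_, end - pos_);
            pos_ = end;
            return true;
        }
        ++pos_;  // consume '<'
        kind_ = node_kind::element_start;
        if (pos_ < src_.size() && src_[pos_] == '/') {
            kind_ = node_kind::element_end;
            ++pos_;
        }
        std::size_t start = pos_;
        while (pos_ < src_.size() && is_name_char(src_[pos_])) ++pos_;
        value_ = src_.substr(start, pos_ - start);
        // Skip whatever else is in the tag (attributes, "/>", junk) up to '>'.
        std::size_t gt = src_.find('>', pos_);
        pos_ = (gt == std::string_view::npos) ? src_.size() : gt + 1;
        return true;
    }

    node_kind node_type() const { return kind_; }
    std::string_view value() const { return value_; }

private:
    static bool is_name_char(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '-' || c == ':';
    }

    std::string_view src_;
    std::size_t pos_ = 0;
    node_kind kind_ = node_kind::eof;
    std::string_view value_;
};
```

Fed <b>bold<i>both</b>, it reports start b, text "bold", start i, text "both" and end b, then returns false; the unclosed <i> is never flagged. One known gap: a '>' inside a quoted attribute value would end the tag early, which a real reader has to handle.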

  • Rage wrote (#3):

    Why? Because it is probably more complicated than it seems.

    Do not escape reality: improve reality!

    • MarkTJohnson wrote (#4), replying to Rage:

      Remember who you are talking to. 90% of the stuff the witch does is over my head.

      I’ve given up trying to be calm. However, I am open to feeling slightly less agitated.

      • Rage wrote (#5), replying to MarkTJohnson:

        Edit: Never mind, wrong thread. Yes, you are probably very right :-)

        • honey the codewitch wrote (#6), replying to Rage:

          It's not really. I'm almost done with it.

          • Rage wrote (#7), replying to honey the codewitch:

            Yes, I was merely talking about normal mortals like us, not wizards :)

            • honey the codewitch wrote (#8), replying to Rage:

              That'd be a witch. Wizards are a different thing altogether. :-D

              • Lost User wrote (#9), replying to honey the codewitch:

                I preprocess the HTML: I strip it all out and insert my own markup directives. I don't need a "paragraph" keyword to tell me where a paragraph should start or end, etc. But as you say, it's assumed to be valid HTML in the first place (or made to be).

                It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food

                • honey the codewitch wrote (#10), replying to Lost User:

                  As far as preprocessing goes, I don't have the memory or NVS storage to do that on my device, and it wouldn't buy me much even if I did. In my scenario, at least, it's much better to just read with a pull parser in a loop, get tag names back, and set values on a context structure I use while rendering. The context structure holds the current position and flags like "bold" or "italic", styles, font faces, that sort of thing. If I knew at compile time what my HTML was going to be, I'd just generate C++ code from it that renders it, and that sort of "preprocessing" would be a huge win, but in my scenario I have to read arbitrary HTML.
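
A sketch of the context-structure approach described above, with hypothetical names (render_context, apply_tag); it is not the author's code. One assumption worth calling out: the style flags are kept as counters rather than booleans, so nested <b> tags work without a tag stack, and a stray closing tag is clamped at zero instead of being treated as an error.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical render context: pen position plus style state. Styles are
// counters, not booleans, so nested <b><b>..</b></b> works without a tag
// stack, and a stray closing tag cannot push a counter below zero.
struct render_context {
    int x = 0;            // pen x, in pixels
    int y = 0;            // pen y (top of the current line), in pixels
    int line_height = 16; // assumed fixed line height for this sketch
    int bold = 0;         // > 0 while inside <b>/<strong>
    int italic = 0;       // > 0 while inside <i>/<em>

    bool is_bold() const { return bold > 0; }
    bool is_italic() const { return italic > 0; }
};

// Open increments; close decrements but clamps at zero.
static void adjust(int& counter, bool opening) {
    if (opening) ++counter;
    else if (counter > 0) --counter;
}

// Update the context for one start (opening == true) or end tag reported by
// a pull reader. Unrecognised tags are ignored rather than treated as errors.
void apply_tag(render_context& ctx, std::string_view name, bool opening) {
    if (name == "b" || name == "strong") {
        adjust(ctx.bold, opening);
    } else if (name == "i" || name == "em") {
        adjust(ctx.italic, opening);
    } else if (name == "br" && opening) {
        ctx.x = 0;                // line break: back to the left margin
        ctx.y += ctx.line_height; // and down one line
    }
}
```

In a loop like while(reader.read()), each start or end tag would go through apply_tag, and text would be measured and drawn at (x, y) with the current styles, advancing x as it goes.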

                  • enhzflep wrote (#11), replying to honey the codewitch:

                    honey the codewitch wrote:

                    *cracks knuckles*

                    Ye-aaaaah! Rock on Witchy Poo! Looking forward to the article(s) :thumbsup:

                    • honey the codewitch wrote (#12), replying to enhzflep:

                      Thanks for the vote of confidence. I'm getting there, but it's a bit of a bear. For starters, everything has to stream, because you don't have a ton of RAM. 1kB is a big deal, so I let you specify a buffer as small as 128 bytes. I can't stream attribute and element names, but I can stream attribute values and element content N bytes at a time (depending on what you set N to). So if you have a long attribute value, then while you're looping with while(reader.read()) you'll get multiple reader.node_type()==ml_node_type::attribute_content results back before you get reader.node_type()==ml_node_type::attribute_end. Not only are there a zillion HTML entities like &copy; (©), but I had to make a state machine to decode all of them efficiently off a Unicode stream. Also this:

                      <span class="foo">this is valid
                      <span class='foo'>and this
                      <input disabled>
                      <div class=you_thought_this_would_be easy id=but_no_because_html_hates_you_this_is_also_valid>

                      So this is kinda rough sometimes, but I'm making progress. Fortunately I don't have to care about custom entity references, namespace declarations, or even well-formedness (balanced tags, etc.), which makes some of it pretty easy.
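
The four attribute forms in the snippet above (double-quoted, single-quoted, unquoted, and valueless) can be handled with a small hand-written scanner. This is a sketch under simplifying assumptions, not the author's code: parse_attributes is a hypothetical name, it works on an in-memory string rather than a stream, and it collects whole values instead of streaming them.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// (name, value) pair; valueless attributes such as "disabled" get "".
using attribute = std::pair<std::string, std::string>;

static bool is_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Parse the attribute part of a start tag (the text after the element name,
// up to '>' or "/>"). Accepts name="v", name='v', name=v and bare name.
// Nothing is validated: odd input just yields whatever pairs it can.
std::vector<attribute> parse_attributes(std::string_view s) {
    std::vector<attribute> out;
    std::size_t i = 0;
    for (;;) {
        while (i < s.size() && is_space(s[i])) ++i;
        if (i >= s.size() || s[i] == '>' || s[i] == '/') break;
        // Name: runs to whitespace, '=', '>' or '/'.
        std::size_t start = i;
        while (i < s.size() && !is_space(s[i]) && s[i] != '=' && s[i] != '>' && s[i] != '/') ++i;
        std::string name(s.substr(start, i - start));
        while (i < s.size() && is_space(s[i])) ++i;
        std::string value;
        if (i < s.size() && s[i] == '=') {
            ++i;
            while (i < s.size() && is_space(s[i])) ++i;
            if (i < s.size() && (s[i] == '"' || s[i] == '\'')) {
                const char quote = s[i++];
                start = i;
                while (i < s.size() && s[i] != quote) ++i;
                value.assign(s.substr(start, i - start));
                if (i < s.size()) ++i;  // closing quote, if present
            } else {
                // Unquoted: ends at the first whitespace or '>'.
                start = i;
                while (i < s.size() && !is_space(s[i]) && s[i] != '>') ++i;
                value.assign(s.substr(start, i - start));
            }
        }
        out.emplace_back(std::move(name), std::move(value));
    }
    return out;
}
```

Given the div's attribute text from the snippet, it yields three pairs: class=you_thought_this_would_be, easy with an empty value, and id=but_no_because_html_hates_you_this_is_also_valid. That is the joke: an unquoted value stops at the first space, so easy becomes a separate, valueless attribute.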

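
A rough C++ sketch of the fixed-buffer streaming described above, assuming the value is read from a string_view standing in for a real byte stream. value_streamer and chunk_kind are hypothetical names that only mimic the attribute_content / attribute_end pattern from the post; they are not the author's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical node kinds mirroring the pattern described above: a long
// value comes back as several content pieces followed by one end marker.
enum class chunk_kind { attribute_content, attribute_end };

// Streams a quoted attribute value through a fixed buffer of N bytes, so
// memory use does not depend on the value's length. The source is a
// string_view read one char at a time, standing in for a byte stream.
template <std::size_t N>
class value_streamer {
    static_assert(N > 0, "buffer must hold at least one byte");
public:
    // 'src' starts just after the opening quote character 'quote'.
    value_streamer(std::string_view src, char quote) : src_(src), quote_(quote) {}

    // Returns false once attribute_end has already been reported.
    bool read() {
        if (done_) return false;
        len_ = 0;
        while (len_ < N && pos_ < src_.size() && src_[pos_] != quote_)
            buf_[len_++] = src_[pos_++];
        if (len_ > 0) {
            kind_ = chunk_kind::attribute_content;
            return true;
        }
        // Nothing buffered: the value (or the input) has ended.
        if (pos_ < src_.size()) ++pos_;  // consume the closing quote
        kind_ = chunk_kind::attribute_end;
        done_ = true;
        return true;
    }

    chunk_kind node_type() const { return kind_; }
    // Valid until the next read(); the buffer is reused.
    std::string_view value() const { return std::string_view(buf_, len_); }

private:
    std::string_view src_;
    char quote_;
    std::size_t pos_ = 0;
    char buf_[N] = {};
    std::size_t len_ = 0;
    chunk_kind kind_ = chunk_kind::attribute_end;
    bool done_ = false;
};
```

With N = 4 and the value abcdefghij, the caller sees three attribute_content pieces (abcd, efgh, ij) and then one attribute_end. Each piece has to be consumed before the next read(), since the buffer is overwritten.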

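
The author decodes entities with a state machine over the full HTML entity set, off a Unicode stream. The sketch below swaps in something much simpler, a linear lookup over a handful of named entities plus decimal and hexadecimal numeric references, working on an in-memory string; decode_entities and append_utf8 are hypothetical names. It requires the trailing semicolon, and unknown or unterminated references are passed through literally, in the same "keep going" spirit as the rest of the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Append Unicode code point cp to out, encoded as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// A tiny subset of the HTML named entities (the real list has over 2,000).
struct named_entity { std::string_view name; std::uint32_t cp; };
constexpr named_entity kNamed[] = {
    {"amp", 0x26}, {"lt", 0x3C}, {"gt", 0x3E}, {"quot", 0x22},
    {"apos", 0x27}, {"nbsp", 0xA0}, {"copy", 0xA9},
};

// Replace &name; and &#NNN; / &#xHH; references with UTF-8. Anything that
// does not decode is copied through unchanged rather than treated as an error.
std::string decode_entities(std::string_view s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] != '&') { out += s[i++]; continue; }
        // Look at most 32 characters ahead for the terminating ';'.
        std::size_t semi = i + 1;
        while (semi < s.size() && semi - i <= 32 && s[semi] != ';') ++semi;
        if (semi >= s.size() || s[semi] != ';') { out += s[i++]; continue; }
        std::string_view body = s.substr(i + 1, semi - i - 1);
        std::uint32_t cp = 0;
        bool ok = false;
        if (!body.empty() && body[0] == '#') {
            const bool hex = body.size() > 1 && (body[1] == 'x' || body[1] == 'X');
            std::string_view digits = body.substr(hex ? 2 : 1);
            ok = !digits.empty();
            for (char c : digits) {
                std::uint32_t d;
                if (c >= '0' && c <= '9') d = static_cast<std::uint32_t>(c - '0');
                else if (hex && c >= 'a' && c <= 'f') d = static_cast<std::uint32_t>(c - 'a' + 10);
                else if (hex && c >= 'A' && c <= 'F') d = static_cast<std::uint32_t>(c - 'A' + 10);
                else { ok = false; break; }
                cp = cp * (hex ? 16u : 10u) + d;
                if (cp > 0x10FFFF) { ok = false; break; }
            }
            // NUL and lone surrogates become U+FFFD REPLACEMENT CHARACTER.
            if (ok && (cp == 0 || (cp >= 0xD800 && cp <= 0xDFFF))) cp = 0xFFFD;
        } else {
            for (const auto& e : kNamed)
                if (e.name == body) { cp = e.cp; ok = true; break; }
        }
        if (!ok) { out += s[i++]; continue; }
        append_utf8(out, cp);
        i = semi + 1;
    }
    return out;
}
```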
                      • enhzflep wrote (#13), replying to honey the codewitch:

                        You bugger. The bit of my brain I need to use to make an intelligent response is entirely consumed with the act of killing myself laughing at your code-snippet. Fan-friggen-tastic :laugh: :laugh: :thumbsup:

                        • honey the codewitch wrote (#14), replying to enhzflep:

                          I've got it running. I even wrote most of the article, but I should probably test it more.

                          • enhzflep wrote (#15), replying to honey the codewitch:

                            Nice work.. :-D
