Who here knows what a pull parser or pull parsing is?

    honey the codewitch
    wrote on last edited by
    #1

    The term usually comes up in reference to XML parsers, but it's a generic parsing model that can apply to parsing anything. Contrast .NET's XmlTextReader (a pull parser) with a SAX XML parser (a push parser). The reason I ask is that I've been using the term a lot in my articles lately, and I'm trying to figure out whether it's worth writing an article about the concept. I don't want to waste time on it if it's something most people have already heard of. It's hard for me to know, because I deep-dove into parsing for a year and everything is familiar to me now.

    Real programmers use butterflies
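
To make the push/pull contrast concrete, here is a minimal C++ sketch over a toy input (whitespace-separated words) rather than XML. `WordReader`, `pushParse` and `firstWordsPull` are hypothetical names invented for this illustration; they are not part of .NET, SAX, or any library mentioned in the thread.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Pull style: the caller asks for the next word whenever it wants one.
class WordReader {
public:
    explicit WordReader(std::string text) : text_(std::move(text)) {}

    // Advances to the next word; returns false when the input is exhausted.
    bool read() {
        while (pos_ < text_.size() && isSpace(text_[pos_])) ++pos_;
        if (pos_ >= text_.size()) return false;
        std::size_t start = pos_;
        while (pos_ < text_.size() && !isSpace(text_[pos_])) ++pos_;
        value_ = text_.substr(start, pos_ - start);
        return true;
    }

    // The word found by the most recent successful read().
    const std::string& value() const { return value_; }

private:
    static bool isSpace(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    std::string text_;
    std::size_t pos_ = 0;
    std::string value_;
};

// Push style: the parser owns the loop and calls back for every word.
// It is built on the pull reader here only to keep the sketch short.
void pushParse(const std::string& text,
               const std::function<void(const std::string&)>& onWord) {
    WordReader reader(text);
    while (reader.read()) onWord(reader.value());
}

// With a pull reader, stopping early is just a matter of leaving the loop.
std::vector<std::string> firstWordsPull(const std::string& text, std::size_t limit) {
    std::vector<std::string> out;
    WordReader reader(text);
    while (out.size() < limit && reader.read()) out.push_back(reader.value());
    return out;
}
```

In the pull version the caller decides when to ask for more input and when to stop. In the push version the caller only supplies a callback and sees whatever the parser chooses to deliver.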

      RickZeeland
      wrote on last edited by
      #2

      Never heard of it :-\

        OriginalGriff
        wrote on last edited by
        #3

        Is it a pushmi-pullyu[^] that eats parsnips?

        "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt AntiTwitter: @DalekDave is now a follower!

        "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
        "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt

          Lost User
          wrote on last edited by
          #4

          Hmmm, I don't think the terminology you are using is as universal as you think it is. Outside of XML/JSON parsing and maybe compiler construction, nobody uses that label for such a simple algorithm. It looks like Stefan Haustein came up with that name when he was writing kXML. Then it seems Aleksander Slominski wrote a paper using the same nomenclature in 1998, and it's been growing ever since. I can't find any reference to 'Pull parsing' before 1998. Looks like 100% of the patents that mention 'Pull Parsing' are XML related[^]. I'm going to rename it Pull-My-Finger parsing and see if it catches on. :-D

            honey the codewitch
            wrote on last edited by
            #5

            I don't know how universal it is - that's what I'm trying to figure out. I just don't know that there's another term for the model. I've implemented pull parsers for all kinds of sources. Recently, I implemented a crazy efficient querying pull parser that can process bulk JSON even on an 8-bit Arduino with 8kB of RAM (it actually needs a lot less RAM than that in practice), and it blazes on a real computer: Diet JSON and a Coke: An exploration of incredibly efficient JSON processing[^]

            Real programmers use butterflies

              honey the codewitch
              wrote on last edited by
              #6

              Thanks. I'm just trying to determine if it might be worth an article of its own since I've implemented so many of them. They basically work like this: (example for JSON)

              // open the file
              if (!fileLC.open("./data.json")) {
                printf("Json file not found\r\n");
                return;
              }
              JsonReader jsonReader(fileLC);
              long long int nodes = 0; // we don't count the initial node
              milliseconds start = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
              // pull parsers return portions of the parse which you retrieve
              // by calling their parse/read method in a loop.
              bool done = false;
              while (!done && jsonReader.read())
              {
                ++nodes;
                // what kind of JSON element are we on?
                switch (jsonReader.nodeType())
                {
                case JsonReader::Value: // we're on a scalar value
                  printf("Value ");
                  switch (jsonReader.valueType())
                  { // what type of value?
                  case JsonReader::String: // a string!
                    printf("String: ");
                    printf("%s\r\n", jsonReader.value()); // print it
                    break;
                  case JsonReader::Real: // a number!
                    printf("Real: %f\r\n", jsonReader.realValue()); // print it
                    break;
                  case JsonReader::Integer: // a number!
                    printf("Integer: %lli\r\n", jsonReader.integerValue()); // print it
                    break;
                  case JsonReader::Boolean: // a boolean!
                    printf("Boolean: %s\r\n", jsonReader.booleanValue() ? "true" : "false");
                    break;
                  case JsonReader::Null: // a null!
                    printf("Null: (null)\r\n");
                    break;
                  default:
                    printf("Undefined!\r\n");
                    break;
                  }
                  break;
                case JsonReader::Field: // this is a field
                  printf("Field %s\r\n", jsonReader.value());
                  break;
                case JsonReader::Object: // an object start {
                  printf("Object (Start)\r\n");
                  break;
                case JsonReader::EndObject: // an object end }
                  printf("Object (End)\r\n");
                  break;
                case JsonReader::Array: // an array start [
                  printf("Array (Start)\r\n");
                  break;
                case JsonReader::EndArray: // an array end ]
                  printf("Array (End)\r\n");
                  break;
                case JsonReader::Error: // a bad thing
                  // maybe we ran out of memory, or the document was poorly formed
                  printf("Error: (%d) %s\r\n", jsonReader.lastError(), jsonReader.value());
                  break;
                }
              }

                Lost User
                wrote on last edited by
                #7

                honey the codewitch wrote:

                I don't how universal it is - that's what i'm trying to figure out. I just don't know that there's another term for the model.

                It doesn't really matter... occupational nomenclature is invented. If you write an article on 'Pull Parsers' then perhaps 1,000,000 more people will use that name. That's sorta how it works.

                honey the codewitch wrote:

                I've implemented pull parsers for all kinds of sources.

                I can see that you enjoy parsing. And I enjoy reading your journey. Best Wishes, -David Delaune

                  Keith Barrow
                  wrote on last edited by
                  #8

                  I'm pretty much with Randor: you should use it. People should have the wherewithal to look it up, or, if it's an article aimed at beginners, a brief description of the terms (plus links to more in-depth material) might be appropriate. It's a technical article, so technical language is fine, and you'll help spread the terms. To answer your direct question: no, I haven't heard of push/pull parsers, but I mostly worked it out from the context.

                  KeithBarrow.net[^] - It might not be very good, but at least it is free!

                    Jorgen Andersson
                    wrote on last edited by
                    #9

                    I've learned quite a lot from your musings in the lounge, but I've only skimmed through your technical articles on parsing, as they are way too specific for my needs. That means my knowledge of parsing is still fairly superficial, so any reasonably easy-to-read breakdown of the principles (not just push vs pull) would be appreciated.

                    Wrong is evil and must be defeated. - Jeff Ello Never stop dreaming - Freddie Kruger

                      honey the codewitch
                      wrote on last edited by
                      #10

                      Thanks. Here's a comment I just posted to RickZeeland, which should hopefully serve as a quick explanation. I included code in it just so you could see it in all its ugliness. :) The Lounge - Pull Parsing[^]

                      Real programmers use butterflies

                        honey the codewitch
                        wrote on last edited by
                        #11

                        Thank you. That's helpful.

                        Real programmers use butterflies

                          PIEBALDconsult
                          wrote on last edited by
                          #12

                          Write a Wikipedia article. That'll make it true. I can't imagine any kind of reader/parser which doesn't tokenize by pulling.

                            honey the codewitch
                            wrote on last edited by
                            #13

                            That's not exactly what a pull parser is. A pull parser parses one small step at a time before returning control to the caller.

                            while(reader.read()) {...}

                            You call it like that, and inside the loop you check nodeType(), value(), and so on to get information about the node at the current location. Microsoft built one for XML in .NET called XmlReader - you've probably used a derivative of it before, if not directly, then indirectly by way of another XML facility like XPath or the DOM. Newtonsoft has one for JSON, but I don't like it, personally.

                            Real programmers use butterflies
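
For readers who have not seen the inside of such a reader, here is a minimal, self-contained C++ sketch of the read()/nodeType()/value() shape described above. It handles only a toy grammar (nested `[ ... ]` lists of unsigned integers and double-quoted strings without escapes). `TinyReader` and `trace` are hypothetical names made up for this example; this is not the JsonReader from the thread, nor .NET's XmlReader.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

class TinyReader {
public:
    enum NodeType { Initial, ArrayStart, ArrayEnd, Integer, String, Error };

    explicit TinyReader(std::string text) : text_(std::move(text)) {}

    // Parses exactly one node, then hands control back to the caller.
    // Returns false at end of input, or on the call after an Error node.
    bool read() {
        if (type_ == Error) return false;
        // Separators carry no information in this toy grammar.
        while (pos_ < text_.size() && (isSpace(text_[pos_]) || text_[pos_] == ',')) ++pos_;
        if (pos_ >= text_.size()) return false;
        value_.clear();
        char c = text_[pos_];
        if (c == '[') {
            type_ = ArrayStart;
            ++pos_;
        } else if (c == ']') {
            type_ = ArrayEnd;
            ++pos_;
        } else if (c == '"') {
            ++pos_;  // opening quote
            while (pos_ < text_.size() && text_[pos_] != '"') value_ += text_[pos_++];
            if (pos_ >= text_.size()) {
                type_ = Error;
                value_ = "unterminated string";
                return true;  // let the caller observe the Error node
            }
            ++pos_;  // closing quote
            type_ = String;
        } else if (isDigit(c)) {
            while (pos_ < text_.size() && isDigit(text_[pos_])) value_ += text_[pos_++];
            type_ = Integer;
        } else {
            type_ = Error;
            value_ = "unexpected character";
        }
        return true;
    }

    NodeType nodeType() const { return type_; }
    const std::string& value() const { return value_; }

private:
    static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    std::string text_;
    std::size_t pos_ = 0;
    NodeType type_ = Initial;
    std::string value_;
};

// Drives the reader the way the loop above does and records what it saw.
std::string trace(const std::string& text) {
    TinyReader reader(text);
    std::string out;
    while (reader.read()) {
        switch (reader.nodeType()) {
        case TinyReader::ArrayStart: out += "[ "; break;
        case TinyReader::ArrayEnd:   out += "] "; break;
        case TinyReader::Integer:    out += "int:" + reader.value() + " "; break;
        case TinyReader::String:     out += "str:" + reader.value() + " "; break;
        default:                     out += "error "; break;
        }
    }
    return out;
}
```

The reader keeps only its position and the current node between calls, which is what lets the caller stop, skip, or branch at any step.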

                              PIEBALDconsult
                              wrote on last edited by
                              #14

                              Yeah, I do that, but it's at a higher level. So -- for instance -- when my loader finds an array of Widgets, it iterates all the Widgets in that array, loading each into the database.

                                honey the codewitch
                                wrote on last edited by
                                #15

                                Yeah, I build that kind of stuff on top of the pull parser. In my Diet JSON and a Coke article I go into that: constructing queries out of navigation and data extraction elements. You basically build queries and then feed those to the reader, and it drives the reader for you (in fact, it's more efficient than reading by calling read() yourself).

                                Real programmers use butterflies

                                  PIEBALDconsult
                                  wrote on last edited by
                                  #16

                                  I don't query or search, I simply iterate tokens until I reach the start of an array of objects I'm interested in. Then I iterate those objects. That way, I read each file only once. For the most part, each of the files I'm reading is just one array of objects and I load the whole thing into one database table. Only the most recent files I'm working with contain multiple arrays containing different types of objects -- and each type of object gets thrown at a different database table.

                                    honey the codewitch
                                    wrote on last edited by
                                    #17

                                    I made my parser with selective bulk loading of machine-generated JSON in mind, which means that when you search, it does partial parsing and no normalization. That lets it find what you're after FAST, at the expense of some of the well-formedness checking (but like I said, it's geared for machine-generated dumps). Not that it matters in a .NET environment, but my parser also will not use memory to hold anything you didn't explicitly request, which means you need only bytes to scan the file, plus whatever it takes to store your results. I often do queries with about 256 bytes of RAM to work with. It doesn't even compare field names or undecorate strings in memory - it does that right off the input source (usually a disk, a socket or a string). The latest codebase I'm working on will even allow you to stream value elements (field values and array members), so you can read massive BLOB values in the document. Gigabytes.

                                    Real programmers use butterflies
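
One way that comparing a field name "right off the input source" might look is sketched below in C++. This is an illustration under stated assumptions, not the author's code: it reads from a `std::istream`, assumes the stream is positioned just past the opening quote, and assumes field names contain no escape sequences. `matchFieldName` and `matchFieldNameIn` are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Consumes characters through the closing quote of a field name and reports
// whether the name equals `expected`. The name itself is never stored: each
// character is compared as it arrives and then discarded.
bool matchFieldName(std::istream& in, const char* expected) {
    assert(expected != nullptr);
    const char* p = expected;
    bool same = true;
    int c;
    while ((c = in.get()) != std::char_traits<char>::eof() && c != '"') {
        if (same && *p != '\0' && *p == static_cast<char>(c)) {
            ++p;           // still matching; move to the next expected character
        } else {
            same = false;  // mismatch or name too long; keep consuming to the quote
        }
    }
    // Equal only if every expected character was matched and the name was
    // properly terminated by a closing quote.
    return same && *p == '\0' && c == '"';
}

// Convenience wrapper for trying the function on an in-memory string that
// starts right after the opening quote.
bool matchFieldNameIn(const std::string& textAfterQuote, const char* expected) {
    std::istringstream in(textAfterQuote);
    return matchFieldName(in, expected);
}
```

Whether or not the name matches, the stream ends up just past the closing quote, so the caller can carry on scanning without rewinding. Working memory is a pointer, a flag and one character, independent of the length of the name.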

                                      PIEBALDconsult
                                      wrote on last edited by
                                      #18

                                      honey the codewitch wrote:

                                      selective bulk loading

                                      Yup.

                                      honey the codewitch wrote:

                                      machine generated JSON

                                      Yup.

                                      honey the codewitch wrote:

                                      partial parsing

                                      Supported.

                                      honey the codewitch wrote:

                                      no normalization

                                      That's up to a higher level to determine.

                                      honey the codewitch wrote:

                                      at the expense of some of the well formedness checking

                                      Basically none.

                                      honey the codewitch wrote:

                                      It doesn't even compare field names

                                      Why would it? That's up to a higher level to determine.

                                      honey the codewitch wrote:

                                      undecorate strings in memory

                                      Unquote? Unescape? I do that as late as possible, not until I know I want the value. Bear in mind also that the underlying reader/tokenizer (?) is not used only for JSON, but for CSV as well.

                                      _____________________________________
                                                     Loader
                                      _____________________________________
                                       JSONenumerator   |  CSVenumerator
                                      __________________|__________________
                                       JSONtokenizer    |  CSVtokenizer      <- Unquoting and unescaping happen here, as appropriate
                                      __________________|__________________
                                             STREAMtokenizer (base)
                                                   TextReader
                                      =====================================
                                        honey the codewitch
                                        wrote on last edited by
                                        #19

                                        From everything you're describing, and because of your abstraction, I can tell you're loading strings into memory and operating on them there. Because your higher level is what determines these things, it can only operate on the strings after the fact. I am not doing that. Now, for .NET that doesn't matter. For an 8kB Arduino it does. Point is, our parsers are fundamentally different in that respect. Also, when you said normalization is for a higher level to determine, you misunderstood me. I parse no numbers, no strings, nothing, unless you actually request it. That's what I mean by no normalization. Based on what you're telling me about your architecture, I suspect - I'm almost certain - that you are normalizing unconditionally at the parser level. I do not parse every field or value I encounter. I skip over most of them. They never get turned into anything in value space. Literally most of the time I'm advancing like this:

                                        while(m_source.currentChar()!='{some context sensitive stopping point value}') { m_source.advance(); /* moves one char */}

                                        Real programmers use butterflies
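
As a rough illustration of skipping a value without ever turning it into anything in value space, the C++ sketch below advances past a whole JSON object or array by tracking only nesting depth and string boundaries. It works on a `std::string` with an index instead of the `m_source` object above, whose type is not shown in the thread. `skipString` and `skipContainer` are hypothetical names, and the code deliberately does almost no well-formedness checking.

```cpp
#include <cstddef>
#include <string>

// Skips one JSON string. `pos` must index its opening quote; on success it is
// left just past the closing quote. Returns false if the string never ends.
bool skipString(const std::string& s, std::size_t& pos) {
    ++pos;  // opening quote
    while (pos < s.size()) {
        char c = s[pos++];
        if (c == '\\') {
            if (pos < s.size()) ++pos;  // skip the escaped character too
        } else if (c == '"') {
            return true;
        }
    }
    return false;
}

// Skips one object or array. `pos` must index its opening '{' or '['; on
// success it is left just past the matching close. Nothing is converted or
// stored, and bracket kinds are not checked against each other.
bool skipContainer(const std::string& s, std::size_t& pos) {
    int depth = 0;
    while (pos < s.size()) {
        char c = s[pos];
        if (c == '"') {
            // Brackets inside strings must not affect the depth count.
            if (!skipString(s, pos)) return false;
            continue;
        }
        ++pos;
        if (c == '{' || c == '[') {
            ++depth;
        } else if (c == '}' || c == ']') {
            if (--depth == 0) return true;
        }
    }
    return false;  // ran out of input before the container closed
}
```

The only state is an integer depth and the current position, so the cost of ignoring a large subtree is a single pass over its characters and no allocation.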

                                          PIEBALDconsult
                                          wrote on last edited by
                                          #20

                                          honey the codewitch wrote:

                                          our parsers are fundamentally different

                                          Yes. I suppose the biggest conceptual difference between ours is that I needed to write a fairly general loader utility which could read a "script" and perform the tasks, not write several purpose-built utilities -- one for each file to be loaded. The ability to have it support CSV (and XML) as well as JSON was an afterthought.

                                          honey the codewitch wrote:

                                          I parse no numbers, no strings, nothing, unless you actually request it.

                                          Well, mine too. It does have to tokenize so it knows when it finds something you want it to parse, but nothing more than that until it finds a requested array. If the script being run says, "if you find the start of an array named 'Widgets', then do this with it", then the parser has to know "I just found an array named 'Widgets'".
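
As a rough illustration of "tokenize just enough to notice the named array", the sketch below scans JSON text and reports where an array with a given name opens. It is not the loader described in the post; `findNamedArray` is a hypothetical helper, and it assumes well-formed input and compares names as raw text with escapes left undecoded.

```cpp
#include <cstddef>
#include <string>

// Returns the offset of the '[' that opens the array named `name`,
// or std::string::npos if no such array is found. Hypothetical helper.
std::size_t findNamedArray(const std::string& json, const std::string& name) {
    std::string lastString;   // raw text of the most recent string token
    bool sawColon = false;    // true when lastString was followed by ':'
    for (std::size_t i = 0; i < json.size(); ++i) {
        char c = json[i];
        if (c == '"') {
            // Tokenize a string: remember its raw text, escapes kept as-is.
            std::size_t start = ++i;
            while (i < json.size() && json[i] != '"') {
                if (json[i] == '\\') ++i;   // skip the escaped character
                ++i;
            }
            lastString = json.substr(start, i - start);
            sawColon = false;
        } else if (c == ':') {
            sawColon = true;
        } else if (c == '[') {
            if (sawColon && lastString == name) return i;
            sawColon = false;
        } else if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
            sawColon = false;   // any other token breaks the name-colon pattern
        }
    }
    return std::string::npos;
}
```

No numbers or values are converted along the way; the scan only tracks enough structure to say "I just found an array named 'Widgets'".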

                                          honey the codewitch wrote:

                                          you are normalizing unconditionally at the parser level

                                          Well, I suppose so, insofar as I make values (or names) out of every token, but at that point they're just strings -- name/value pairs with a type -- they're not parsed. I throw only the strings we want at the SQL Server, and it handles any conversions to numeric or other types; the loader has no say in that. The loader has no say in data normalization either; it's just passing values as SQL parameters. Again, I want nearly every value in the file to go to the database, so of course I wind up with every value and throw them all at SQL Server. It may be a misunderstanding of terms, but in my opinion, no actual "parsing" is done until the (string) values arrive at SQL Server -- that's where the determinations of which name/value pairs go where, what SQL datatype they should be, etc. happen. The loader utility has no knowledge of any of that.
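
One way to picture "name/value pairs with a type" that stay unconverted is sketched below. The `Kind`, `Field`, and `selectWanted` names are hypothetical and chosen only for illustration; the actual loader is not shown in this thread and presumably hands the selected strings to SQL Server as parameters.

```cpp
#include <string>
#include <vector>

// Token type recorded by the tokenizer; the value itself stays text.
enum class Kind { String, Number, Literal };

// Hypothetical name/value pair in the spirit of the post.
struct Field {
    std::string name;
    std::string value;   // raw text from the file, never converted here
    Kind kind;
};

// Keeps only the fields whose names were requested, values untouched.
std::vector<Field> selectWanted(const std::vector<Field>& all,
                                const std::vector<std::string>& wanted) {
    std::vector<Field> out;
    for (const Field& f : all) {
        for (const std::string& w : wanted) {
            if (f.name == w) {
                out.push_back(f);
                break;
            }
        }
    }
    return out;
}
```

A value such as "007" leaves this stage exactly as it arrived, so any conversion to a numeric type is left to whatever consumes the strings.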
