Code Project
Who here knows what a pull parser or pull parsing is?

The Lounge
Tags: csharp, xml, json, question
23 Posts 7 Posters 1 Views 1 Watching
  • P PIEBALDconsult

    honey the codewitch wrote:

    our parsers are fundamentally different

    Yes. I suppose the biggest conceptual difference between ours is that I needed to write a fairly general loader utility which could read a "script" and perform the tasks, not write several purpose-built utilities -- one for each file to be loaded. The ability to have it support CSV (and XML) as well as JSON was an afterthought.

    honey the codewitch wrote:

    I parse no numbers, no strings, nothing, unless you actually request it.

    Well, mine too. It does have to tokenize so it knows when it finds something you want it to parse, but nothing more than that until it finds a requested array. If the script being run says, "if you find the start of an array named 'Widgets', then do this with it", then the parser has to know "I just found an array named 'Widgets'".

    honey the codewitch wrote:

    you are normalizing unconditionally at the parser level

    Well, I suppose so, insofar as I make values (or names) out of every token, but at that point they're just strings -- name/value pairs with a type -- they're not parsed. I throw only the strings we want at SQL Server, and it handles any conversions to numeric or other types; the loader has no say in that. The loader has no say in data normalization either; it's just passing values as SQL parameters. Again, I want nearly every value in the file to go to the database, so of course I wind up with every value and throw them all at SQL Server. It may be a misunderstanding of terms, but in my opinion, no actual "parsing" is done until the (string) values arrive at SQL Server -- that's where the determinations of which name/value pairs go where, what SQL datatype they should be, etc. happen. The loader utility has no knowledge of any of that.
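A rough sketch of the "just strings with a type" idea described above, written in C++ only for illustration. It is not PIEBALDconsult's loader: `Pair`, `Kind`, and `tokenizeFlatObject` are hypothetical names, and the sketch assumes a single flat, well-formed JSON object with no nesting and no escape sequences. The point is that the tokenizer tags each value but never converts it; conversion is left to whatever consumes the pairs (SQL Server, in the post above).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum class Kind { String, Number, Literal };

struct Pair {
    std::string name;
    std::string value;  // raw text exactly as it appeared; never converted here
    Kind kind;
};

// Tokenizes one flat JSON object into name/value pairs.
// Assumption: well-formed input, no nested objects/arrays, no escapes.
std::vector<Pair> tokenizeFlatObject(const std::string& s) {
    std::vector<Pair> out;
    std::size_t i = 0;
    auto skipWs = [&] {
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    };
    // Called with i on an opening quote; returns the text up to the closing quote.
    auto readQuoted = [&] {
        std::size_t end = s.find('"', ++i);
        if (end == std::string::npos) end = s.size();
        std::string text = s.substr(i, end - i);
        i = (end < s.size()) ? end + 1 : end;
        return text;
    };

    skipWs();
    if (i < s.size() && s[i] == '{') ++i;
    while (true) {
        skipWs();
        if (i >= s.size() || s[i] != '"') break;  // '}' or end of input
        Pair p;
        p.name = readQuoted();
        skipWs();
        if (i < s.size() && s[i] == ':') ++i;
        skipWs();
        if (i < s.size() && s[i] == '"') {
            p.value = readQuoted();
            p.kind = Kind::String;
        } else {
            std::size_t start = i;
            while (i < s.size() && s[i] != ',' && s[i] != '}' &&
                   !std::isspace(static_cast<unsigned char>(s[i]))) ++i;
            p.value = s.substr(start, i - start);
            bool numeric = !p.value.empty() &&
                (p.value[0] == '-' || std::isdigit(static_cast<unsigned char>(p.value[0])));
            p.kind = numeric ? Kind::Number : Kind::Literal;
        }
        out.push_back(p);
        skipWs();
        if (i < s.size() && s[i] == ',') ++i;
    }
    return out;
}
```

Each `Pair` could then be bound as a string parameter, leaving the choice of SQL datatype to the database side.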

    honey the codewitch wrote (#21):

    I'm using parsing in the traditional CS sense of imposing structure on a lexical stream based on patterns in said stream.

    Real programmers use butterflies

      honey the codewitch wrote (#22):

      PIEBALDconsult wrote:

      Well, mine too. It does have to tokenize so it knows when it finds something you want it to parse,

      I have other ways of finding something. I switch to a fast matching algorithm where I basically look for a quote as if the document were a flat stream of characters and not a hierarchical, ordered structure of logical JSON elements. That's what I mean by partial parsing and part of what I mean by denormalized searching/scanning. It ignores swaths of the document until it finds what you want. For example:

      reader.skipToField("name",JsonReader::Forward);

      This performs the type of flat match that I'm talking about.

      reader.skipToField("name",JsonReader::Siblings);

      This performs a partially flat and partially structured match, looking for "name" only on this level of the object hierarchy.

      reader.skipToField("name",JsonReader::Descendants);

      This does a nearly flat match, but basically counts '{' and '}' so it knows when to stop searching. I've simplified the explanation of what I've done, but that's the gist. I also don't load strings into memory at all when comparing them. I compare one character at a time straight off the "disk" so I never know the whole field name unless it's the one I'm after.
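A minimal sketch of the flat and brace-counting scans described above, under stated assumptions. This is not the JsonReader implementation: `Axis`, `matchString`, and the free function `skipToField` are hypothetical names, the Siblings mode is left out, and escape sequences are skipped rather than decoded. For Descendants, the stream is assumed to be positioned just inside the object being searched.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

enum class Axis { Forward, Descendants };

static const int kEof = std::char_traits<char>::eof();

// Opening quote already consumed. Compares the literal to `want` one
// character at a time as it is read, without buffering the literal.
// Always consumes through the closing quote.
static bool matchString(std::istream& in, const std::string& want) {
    std::size_t i = 0;
    bool same = true;
    for (int c = in.get(); c != kEof && c != '"'; c = in.get()) {
        if (c == '\\') { in.get(); same = false; continue; }  // skip escaped char
        if (same && i < want.size() && want[i] == static_cast<char>(c)) ++i;
        else same = false;
    }
    return same && i == want.size();
}

// Advances `in` to just past the ':' of a field called `name`.
// Forward:     flat scan over the rest of the stream.
// Descendants: same scan, but counts '{' and '}' and gives up once the
//              object we started in has closed.
bool skipToField(std::istream& in, const std::string& name, Axis axis) {
    assert(!name.empty());
    int depth = 0;  // nesting relative to the starting position
    for (int c = in.get(); c != kEof; c = in.get()) {
        if (c == '"') {
            if (!matchString(in, name)) continue;
            in >> std::ws;
            // A field name is followed by ':'; a matching string value is not.
            if (in.peek() == ':') { in.get(); return true; }
        } else if (axis == Axis::Descendants) {
            if (c == '{') ++depth;
            else if (c == '}' && --depth < 0) return false;
        }
    }
    return false;
}
```

Because braces inside string literals are consumed by `matchString`, they never disturb the depth count. After a successful call the stream sits on the field's value, so the caller can decide whether to read it at all.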

      Real programmers use butterflies

      • H honey the codewitch

        It's usually in reference to XML parsers, but it's a generic parsing model that can apply to parsing anything. Contrast .NET's XmlTextReader (a pull parser) with a SAX XML parser (a push parser). The reason I ask is because I use the term a lot in my articles lately, and I'm trying to figure out if it might be worth it to write an article about the concept. I don't want to waste time on it if it's something most people have heard of before. It's hard for me to know because I dove deep into parsing for a year and everything is familiar to me now.

        Real programmers use butterflies
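For anyone who hasn't met the terms, here is a toy illustration of the pull/push contrast drawn above. It is only a sketch over whitespace-separated words, not XML or JSON, and `WordReader`, `parseWords`, `pullFind`, and `pushCollect` are made-up names for this example. In the pull model the caller asks for the next item (as with XmlTextReader's read loop); in the push model the parser runs the loop and calls back (as a SAX handler is called).

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Pull model: the caller drives by asking for the next item.
class WordReader {
public:
    explicit WordReader(const std::string& text) : in_(text) {}
    // Advances to the next word; returns false at end of input.
    bool read() { return static_cast<bool>(in_ >> current_); }
    const std::string& value() const { return current_; }
private:
    std::istringstream in_;
    std::string current_;
};

// Push model: the parser drives and calls the handler for every item.
void parseWords(const std::string& text,
                const std::function<void(const std::string&)>& onWord) {
    std::istringstream in(text);
    std::string word;
    while (in >> word) onWord(word);
}

// With pull, stopping early is just leaving the loop.
bool pullFind(const std::string& text, const std::string& target) {
    WordReader reader(text);
    while (reader.read())
        if (reader.value() == target) return true;
    return false;
}

// With push, the handler is handed every item and records what it sees.
std::vector<std::string> pushCollect(const std::string& text) {
    std::vector<std::string> seen;
    parseWords(text, [&](const std::string& w) { seen.push_back(w); });
    return seen;
}
```

The practical difference is who owns the loop: with pull the calling code keeps its own control flow and state, while with push that state has to live in the handler.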

        Lost User wrote (#23):

        I actually didn't have a clue what it was, but I'm a total noob so I don't count anyway ;)
