I'd like to ask a question about JSON to get a feel for priorities of coders here

The Lounge | Tags: question, json, help | 54 posts, 24 posters
  • MGuerrieri wrote:

    I take a function-first approach. You won't be able to parse the JSON if it's not well formed, so I would do that check first. If performance is poor, then I'd do a trace to find the bottlenecks and address them if possible. I wouldn't want to spend my time unnecessarily tracking down import errors.

    honey the codewitch wrote (#42):

    I look at it this way, and keep in mind this is purely hypothetical: let's say you're bulk uploading parts of some JSON out of a huge dataset. Almost always that JSON is machine generated, because who writes huge JSON by hand? Scanning through it quickly is important. If at some point you get a bad data dump, might it be better to roll back that update and run a validator over the bad document the one time out of 1000 that it fails, rather than paying for that validation the other 999 times?

    Real programmers use butterflies
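
The "roll back, then validate only on failure" idea in #42 can be sketched roughly as follows. This is a minimal illustration, not anyone's actual loader: it assumes line-delimited JSON, uses Python's `sqlite3` as a stand-in database, and the table `parts`, the `FIELDS` regex, and all function names are hypothetical.

```python
import json
import re
import sqlite3

# Hypothetical fast path: pulls two fields out with a regex and never
# checks that the rest of the line is well-formed JSON.
FIELDS = re.compile(r'"id"\s*:\s*(\d+)\s*,\s*"name"\s*:\s*"([^"\\]*)"')


def fast_load(conn, lines):
    """Insert every record in one transaction; any failure undoes all of it."""
    with conn:  # commits on success, rolls back if an exception escapes
        for line in lines:
            match = FIELDS.search(line)
            if match is None:
                raise ValueError("fast path could not read a record")
            conn.execute(
                "INSERT INTO parts(id, name) VALUES (?, ?)",
                (int(match.group(1)), match.group(2)),
            )


def diagnose(lines):
    """Slow path, run only after a failed load: list the malformed lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, err.msg))
    return problems


def load_or_diagnose(conn, lines):
    """Try the fast load; pay for full validation only when it fails."""
    try:
        fast_load(conn, lines)
        return []
    except ValueError:
        return diagnose(lines)
```

The trade-off is the one described above: the fast path only notices damage in the fields it actually reads, so a line that is broken elsewhere would still load quietly.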

    • honey the codewitch wrote:

      Let's say you wanted to write a fast JSON parser. You could do a pull parser that does well-formedness checking. Or you could do one that's significantly faster but skips well-formedness checking during search/skip operations, which can lead to later error reporting or missed errors. You can't make an option to choose one or the other, but you can avoid using the skip/search functions that do this in the latter case. Which do you do? Are you a stomp-the-pedal type or a defensive driver? (Seriously, this is more about getting a read of the room than anything. I want a feel for priorities.)

      Real programmers use butterflies
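
To make the trade-off in the question concrete, here is a rough sketch of what a non-checking skip might look like next to a checking one. Both functions are hypothetical illustrations, not the parser being discussed; `checked_skip` simply leans on the standard library's `json.JSONDecoder.raw_decode`, which validates by fully decoding the value.

```python
import json


def fast_skip(text, start):
    """Return the index just past the container that starts at text[start].

    Assumes text[start] is '[' or '{'. Only nesting depth and string
    boundaries are tracked; commas, colons, and literals inside the
    value are never checked.
    """
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # step over the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated value")


def checked_skip(text, start):
    """Same result on valid input, but raises on malformed content."""
    _, end = json.JSONDecoder().raw_decode(text, start)
    return end
```

On a document such as `{"skip": [1, 2,, 3], "want": 42}`, `fast_skip` steps over the broken array without complaint, while `checked_skip` raises `json.JSONDecodeError`. That is the "missed errors" case from the question.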

      Mark Meuer wrote (#43):

      As a general rule, I try to follow these steps in order:
      1. Make the program run right.
      2. Make the program run right.
      3. Make the program run right.
      4. If I really need to, make it faster.

        honey the codewitch wrote (#44), replying to Mark Meuer (#43):

        That works to a point, but certain design decisions for performance must be made up front. For example, deciding to use a pull parser as the primary way of navigation rather than an in-memory tree.

        Real programmers use butterflies
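
The difference between pulling tokens and building an in-memory tree can be sketched like this. It is a toy illustration under stated assumptions: `pull_tokens` and `first_value_for` are hypothetical names, the tokenizer is regex-based and does no grammar checking, and the lookup assumes the wanted value is a scalar.

```python
import json
import re

# One JSON token, with optional leading whitespace.
TOKEN = re.compile(
    r'\s*(?:([{}\[\],:])'
    r'|("(?:[^"\\]|\\.)*")'
    r'|(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)'
    r'|(true|false|null))'
)


def pull_tokens(text):
    """Yield (kind, value) pairs lazily; the caller decides when to stop."""
    pos = 0
    text_end = len(text.rstrip())
    while pos < text_end:
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        punct, string, number, literal = match.groups()
        if punct:
            yield ("punct", punct)
        elif string:
            yield ("string", json.loads(string))  # decode escapes
        elif number:
            yield ("number", float(number))
        else:
            yield ("literal", literal)
        pos = match.end()


def first_value_for(text, key):
    """Return the first scalar stored under `key` without building a tree."""
    tokens = pull_tokens(text)
    for kind, value in tokens:
        if kind == "string" and value == key:
            if next(tokens, None) == ("punct", ":"):
                return next(tokens)[1]
    return None
```

The tree approach would be `json.loads(text)[key]`, which has to read and allocate the whole document first. The pull version stops as soon as it has its answer, which is also why it never sees anything wrong with the rest of the input.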

        • Reelix wrote:

          If you're allowed to upgrade to .NET 5, they effectively implemented Newtonsoft's one natively, with pretty much identical syntax. It works really well, and you're not using third-party add-ins.

          -= Reelix =-

          PIEBALDconsult wrote (#45):

          Yup, looking forward to it. Not holding my breath. It doesn't help that my boss read a blog that said that Microsoft is abandoning .NET ( :sigh: ). Middle managers will believe anything if it's in a blog. I countered with a link to Microsoft's roadmap for the future of .NET, but the damage was already done.

            PIEBALDconsult wrote (#46), replying to Mark Meuer (#43):

            4a. You always need to make it faster.

              davecasdf wrote (#47), replying to honey the codewitch:

              Case 1: input good, output good, answer faster.
              Case 2: input bad, error message, but delayed (is it clear?).
              Case 3: input bad, program sticks its tongue out and dies.
              Case 4: input bad, quiet wrong result.

              So, who has to deal with 3, and can 4 happen (with malicious input)? (They ARE out to get you.)

              • PIEBALDconsult wrote:

                I load 51GB of XML with what SSIS has built-in. It takes about twelve minutes.
                I load 5GB of JSON with my own parser. It takes about eight minutes.
                I load 80GB of JSON with my own parser -- this dataset has tripled in size over the last month. It's now taking about five hours.

                These datasets are in no way comparable; I'm just comparing the size-on-disk of the files. I will, of course, accept that my JSON loader is a likely bottleneck, but I have nothing else to compare it against. It seemed "good enough" two years ago when I had a year-end deadline to meet. I may also be able to configure my JSON loader to use BulkCopy, as I do for the 5GB dataset, but I seem to recall that the data wasn't suited to it.

                At any rate, I'm in need of an alternative, but it can't be third-party. Next year will be different.
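
For readers wondering why bulk copying tends to matter so much at this scale, the general idea is to stream records and hand them to the database in batches rather than one row at a time. The sketch below only illustrates that pattern: it assumes line-delimited JSON, uses Python's `sqlite3` and `executemany` as a stand-in for SQL Server's bulk copy, and the `parts` table and function names are hypothetical. It says nothing about how the loader described above actually works.

```python
import itertools
import json
import sqlite3


def batches(iterable, size):
    """Yield lists of up to `size` items without reading everything first."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_load(conn, lines, batch_size=1000):
    """Parse one JSON record per line and insert them batch by batch."""
    rows = ((rec["id"], rec["name"]) for rec in map(json.loads, lines))
    total = 0
    for chunk in batches(rows, batch_size):
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT INTO parts(id, name) VALUES (?, ?)", chunk
            )
        total += len(chunk)
    return total
```

Because `lines` and `rows` are both lazy, memory use stays bounded by the batch size rather than by the size of the file.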

                Jorgen Andersson wrote (#48):

                PIEBALDconsult wrote:

                I load 51GB of XML with what SSIS has built-in. It takes about twelve minutes.

                How much memory do you have? My early tests ran out of memory. Or have I done something wrong? Mine takes an hour for 85GB of XML, but that uses BulkCopy. Early versions without BulkCopy indicated that it would indeed take 5-6 hours.

                Wrong is evil and must be defeated. - Jeff Ello
                Never stop dreaming - Freddie Kruger

                  honey the codewitch wrote (#49), replying to PIEBALDconsult:

                  If you can run C++ binaries on the server, this might give you better performance, especially if you're only loading part of the data: "JSON on Fire: JSON (C++) is a Blazing JSON Library that can Run on Low Memory Devices".

                  Real programmers use butterflies

                    PIEBALDconsult wrote (#50), replying to Jorgen Andersson (#48):

                    I don't know what SSIS does internally, but I doubt it loads the entire XML document into memory all at once. I don't know how much RAM or how many processors the servers have. I ran the XML load on my laptop, which has 16GB of RAM, and usage increased by only four percent.
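
Whatever SSIS does internally, the general technique for reading XML far larger than RAM is incremental parsing: handle each record as its closing tag arrives, then release it. A minimal sketch using the standard library's `xml.etree.ElementTree.iterparse` follows; the function name and the idea of simply counting records are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag):
    """Count elements named `tag` without holding the whole document."""
    count = 0
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Release the record's children, text, and attributes. The
            # emptied element itself stays attached to its parent.
            elem.clear()
    return count


sample = io.BytesIO(
    b"<parts><part id='1'/><part id='2'><name>nut</name></part></parts>"
)
print(count_records(sample, "part"))
```

In a real load, the body of the `if` would hand the record to the database before clearing it.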

                      Jorgen Andersson wrote (#51), replying to PIEBALDconsult (#50):

                      OK, then I had some other problem. I might take another look at SSIS.

                      Wrong is evil and must be defeated. - Jeff Ello
                      Never stop dreaming - Freddie Kruger

                      • PIEBALDconsult wrote:

                        Yeah, no, we can't deploy any third-party stuff to the servers. It has to be either built-in .NET or stuff we implement.

                        harvyk0 wrote (#52):

                        Whilst I can understand there may be restrictions on simply downloading a library (or worse, simply referencing an external third-party site), surely there is a mechanism to obtain an external library under controlled circumstances? Personally, I see it as no different from choosing a piece of off-the-shelf software. Go through the standard checks you would do before deploying any other third-party software and you'd be fine.

                        I've written parsers in the past (albeit for XML). I spent weeks accounting for our specific use case, and it wasn't even fully featured. All it would have taken was for the external source of the XML to use something my parser hadn't implemented and the system would have fallen over.

                        So I would be going to your higher-ups with the following three arguments:
                        1. A custom JSON parser is likely to take weeks to develop and test.
                        2. A custom JSON parser may not correctly implement the full set of JSON rules, and thus would be a risk to the project.
                        3. Evaluating existing JSON parsers for suitability and security would be quicker, cheaper, and potentially more secure than attempting to build your own parser (since it is possible to embed executable code into JSON).
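
The second argument is easy to underestimate, so here is a small sketch of the kind of edge-case table a hand-rolled or candidate parser could be checked against. The cases and helper names are illustrative assumptions and nowhere near a full conformance suite; Python's own `json.loads` is used only as a convenient reference.

```python
import json

# (input text, should a conforming parser accept it?)
CASES = [
    ('"caf\\u00e9"', True),       # \u escape
    ('"\\ud83d\\ude00"', True),   # surrogate pair written as two escapes
    ('1.5e-3', True),             # exponent notation
    ('[1, 2, ]', False),          # trailing comma
    ("{'a': 1}", False),          # single-quoted strings
    ('01', False),                # leading zero
    ('"line\nbreak"', False),     # raw control character inside a string
]


def accepts(parse, text):
    """True if `parse` handles `text` without raising."""
    try:
        parse(text)
        return True
    except Exception:
        return False


def conformance_failures(parse):
    """Return the inputs where `parse` disagrees with the expected verdict."""
    return [
        text for text, should_accept in CASES
        if accepts(parse, text) != should_accept
    ]


print(conformance_failures(json.loads))
```

A parser that was only ever tested against one supplier's output could pass every real document for years and still fail half of a table like this.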

                          PIEBALDconsult wrote (#53), replying to harvyk0 (#52):

                          In theory, sure, but no, it's even worse than that. For instance, we used to use the .NET provider for MySQL, but then the server and desktop approval teams couldn't agree on which version to approve -- leaving us with no way to develop an application which would work when deployed to the servers.

                            harvyk0 wrote (#54), replying to PIEBALDconsult (#53):

                            Ah, yes, I can relate to that sort of silliness.
