Performance woes. I'm appalled.

The Lounge
Tags: json, performance, visual-studio, help, tutorial

  • honey the codewitch
    #1

    Read 1231388 nodes and 20383269 characters in 1069.479000 ms at 17.765660MB/s
    Skipped 1231388 nodes and 20383269 characters in 534.699000 ms at 35.534011MB/s
    utf8 scanned 20383269 characters in 377.561000 ms at 50.322994MB/s
    raw ascii i/o 20383269 characters in 62.034000 ms at 306.283651MB/s
    raw ascii block i/o 19 blocks in 49.023000 ms at 387.573180MB/s

    The first line is full JSON parsing. The second line is JSON "skipping" - a minimal read that doesn't normalize anything; it just moves as fast as possible through the document. The third line is UTF-8 reading through my input source class, but without doing anything JSON related. The fourth line is calling fgetc() in a loop. The fifth line is calling fread() in a loop and then scanning over the characters in each block (so I'm not totally cheating by not examining characters). The issue here is the difference between my third line and the fourth line (utf8 scan vs fgetc). The trouble is that even when I removed the encoding it made no measurable difference in speed. Underneath everything, both are using fgetc(). Even when I changed mine to block read using fread() it didn't speed things up. I'm at a loss. I'm not asking a question here, mostly just expressing frustration, because I have not a clue how to optimize this.
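
    For context, a minimal sketch of what the two raw baselines (the fourth and fifth lines) might look like. This is an assumption about the benchmark harness, not the actual code; the block size and the byte-counting are illustrative:

        #include <cstddef>
        #include <cstdio>

        // Byte-at-a-time baseline: fgetc() in a loop (the fourth benchmark line).
        static std::size_t scan_fgetc(const char* path) {
            FILE* f = std::fopen(path, "rb");
            if (!f) return 0;
            std::size_t count = 0;
            int ch;
            while ((ch = std::fgetc(f)) != EOF) {
                ++count;                          // touch every character
            }
            std::fclose(f);
            return count;
        }

        // Block baseline: fread() in a loop, then scan each block (the fifth line).
        static std::size_t scan_fread(const char* path) {
            FILE* f = std::fopen(path, "rb");
            if (!f) return 0;
            static char block[1024 * 1024];       // 1 MB blocks - the size is an assumption
            std::size_t count = 0, got;
            while ((got = std::fread(block, 1, sizeof(block), f)) > 0) {
                for (std::size_t i = 0; i < got; ++i) {
                    count += (block[i] != 0);     // examine each byte so nothing is skipped
                }
            }
            std::fclose(f);
            return count;
        }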

    Real programmers use butterflies

    • Lost User
      #2, in reply to honey the codewitch

      What does "utf8 scan" actually do? Perhaps you can use some of the UTF-8 tricks used by simdjson.

      • Lost User wrote:

        What does "utf8 scan" actually do? Perhaps you can use some of the UTF-8 tricks used by simdjson.

        honey the codewitch
        #3

        That's actually what I'm going to do - look into SIMD eventually - but it's not the UTF-8 encoding that is the issue. I turned it off and got a similar result. There's something about the way my LexSource class is dealing with I/O, and/or I'm examining the codepoints/characters I get back way too many times. I'm not sure which yet, or if it's both.

        Real programmers use butterflies

        • Daniel Pfeffer
          #4, in reply to honey the codewitch

          The real question here is whether the code meets the performance bar. If it does, why bother optimising further? Life is too short...

          Freedom is the freedom to say that two plus two make four. If that is granted, all else follows. -- 6079 Smith W.

          • Daniel Pfeffer wrote:

            The real question here is whether the code meets the performance bar. If it does, why bother optimising further? Life is too short...

            Freedom is the freedom to say that two plus two make four. If that is granted, all else follows. -- 6079 Smith W.

            honey the codewitch
            #5

            It's a library, ergo there is no performance bar; it would vary depending on the application of said library. However, a 6x slowdown compared to raw fgetc() is worth investigating.

            Real programmers use butterflies

            • trønderen
              #6, in reply to honey the codewitch

              Haven't you got a profiler in your toolbox? Doing any sort of optimizing without a profiler is futile. If you can't see which source lines consume the most time, you won't have a clue about where and how to put your optimizing efforts.

              • trønderen wrote:

                Haven't you got a profiler in your toolbox? Doing any sort of optimizing without a profiler is futile. If you can't see which source lines consume the most time, you won't have a clue about where and how to put your optimizing efforts.

                honey the codewitch
                #7

                I just got done doing broad profiling. I haven't instrumented my code for specific profiling yet, because I hadn't identified the bottleneck until I wrote that post. I've been punting it until I got some other features implemented that needed doing (I needed them in there so I could benchmark them as well), but that's my next thing, because I just finished adding those features.

                Real programmers use butterflies

                • Lost User
                  #8, in reply to honey the codewitch

                  Did that to myself just the other day; a process that ran in about 70 ms started clocking at almost a minute. Async; no sync; didn't matter. Forgot to disable debug output in a critical routine: the overhead was "huge" (in debug mode). Had me going for a while; the (VS) diagnostics showed the cpu profile was not as expected.

                  It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food

                  • Lost User wrote:

                    Did that to myself just the other day; a process that ran in about 70 ms started clocking at almost a minute. Async; no sync; didn't matter. Forgot to disable debug output in a critical routine: the overhead was "huge" (in debug mode). Had me going for a while; the (VS) diagnostics showed the cpu profile was not as expected.

                    It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food

                    honey the codewitch
                    #9

                    What's frustrating is this simple case statement loses me about 7-8MB/s in throughput on something that currently tops out at about the low 60s on a good day.

                    switch(ch) {
                    case '\t':                   // tab
                        m_column += TabWidth;    // TabWidth is static const
                        break;
                    case '\n':                   // newline
                        ++m_line;
                        // fall through
                    case '\r':                   // carriage return
                        m_column = 0;
                        break;
                    default:
                        ++m_column;
                        break;
                    }

                    Real programmers use butterflies

                    • trønderen wrote:

                      Haven't you got a profiler in your toolbox? Doing any sort of optimizing without a profiler is futile. If you can't see which source lines consume the most time, you won't have a clue about where and how to put your optimizing efforts.

                      honey the codewitch
                      #10

                      Switching on characters is killing performance.

                      switch(ch) {
                      case '\t':                   // tab
                          m_column += TabWidth;    // TabWidth is static const
                          break;
                      case '\n':                   // newline
                          ++m_line;
                          // fall through
                      case '\r':                   // carriage return
                          m_column = 0;
                          break;
                      default:
                          ++m_column;
                          break;
                      }

                      This loses me 7-8 MB/s in throughput on my machine, pretty consistently. The problem is that I switch on characters everywhere, as this is a JSON parser. I can reduce some of my comparisons, but not a lot of them, because of the way my code is structured. The only other thing I can think of right now is building my own jump table scheme, but I really don't want to do that, so I'm trying to come up with something else.
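
                      One middle ground between switching on raw characters everywhere and a hand-built jump table is a 256-entry classification table: the hot loop does a single indexed load per byte and then branches on a small character class instead of on individual characters. A sketch of the idea (the names here are made up, not the library's):

                          enum CharClass : unsigned char {
                              CC_OTHER = 0, CC_WS, CC_TAB, CC_NEWLINE, CC_RETURN,
                              CC_DIGIT, CC_QUOTE, CC_STRUCTURAL            // { } [ ] : ,
                          };

                          static unsigned char g_class[256];               // zero-initialized: everything else is CC_OTHER

                          static void init_class_table() {                 // call once at startup
                              g_class[(unsigned char)' ']  = CC_WS;
                              g_class[(unsigned char)'\t'] = CC_TAB;
                              g_class[(unsigned char)'\n'] = CC_NEWLINE;
                              g_class[(unsigned char)'\r'] = CC_RETURN;
                              for (int c = '0'; c <= '9'; ++c) g_class[c] = CC_DIGIT;
                              g_class[(unsigned char)'"'] = CC_QUOTE;
                              for (const char* p = "{}[]:,"; *p; ++p) g_class[(unsigned char)*p] = CC_STRUCTURAL;
                          }

                          static inline CharClass classify(unsigned char ch) {
                              return static_cast<CharClass>(g_class[ch]);  // one load instead of a comparison chain
                          }

                      Whether that actually beats what the compiler already generates for the switch is something only the generated code or a profiler will tell you.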

                      Real programmers use butterflies

                      • trønderen
                        #11, in reply to honey the codewitch

                        If switching on a char takes an inordinate amount of time, I'd sure be curious to know how your compiler does it. It ought to be the most efficient kind of switch there is: a jump table indexed by the switch variable, loaded straight into the program counter. In the old days, when CPUs were slow, that was the only way to do it. (The first Pascal compiler I used could only take alternatives spanning a 256-value range, because that was the largest jump table it could generate.)

                        Modern languages are far more flexible in their case expressions, often forcing the compiler to generate code as if it were an "if - elseif - elseif ... else" sequence. Maybe that is what your compiler has done here, perhaps even generating an "elseif" for every char value rather than collecting the default cases into an "else". If the compiler is scared of big jump tables and therefore uses elseif constructions, it should realize that this makes the code grow far more than the size of a jump table would!

                        I am just guessing! But it sounds crazy that an indexed jump would kill your performance; it just doesn't sound right. I would look at the generated code to see what happens. If you can't make the compiler do an indexed jump, maybe you are better off writing it in longhand, building the jump table from labels :-). I guess that writing it explicitly as "if - elseif..." would be better. Then you could also test the most common case first, so that only a single comparison is required: "if (ch > '\t') {...}". I hate it when compilers force me to do the job that should be theirs, but maybe you have to, in this case!
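
                        Concretely, the "most common case first" rewrite being suggested would look something like this, using the member names from the switch posted above (a sketch of the suggestion, not tested against the library; TabWidth's value here is a placeholder):

                            struct Position {
                                int m_line = 1, m_column = 0;
                                static constexpr int TabWidth = 4;   // value is an assumption

                                // Same bookkeeping as the switch above, "most common case first":
                                // ordinary characters take a single predictable comparison.
                                void update(int ch) {
                                    if (ch > '\r' || ch < '\t') {    // anything outside \t..\r - the common case
                                        ++m_column;
                                    } else if (ch == '\t') {
                                        m_column += TabWidth;
                                    } else if (ch == '\n') {
                                        ++m_line;
                                        m_column = 0;
                                    } else if (ch == '\r') {
                                        m_column = 0;
                                    } else {                         // \v or \f - same as the switch's default
                                        ++m_column;
                                    }
                                }
                            };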

                        • honey the codewitch
                          #12, in reply to trønderen

                          That's pretty much where I'm at. I'm using gcc, which should be pretty good about optimizing. What gets me is that it's not any faster whether I compile with no switches at all or with -g.

                          Real programmers use butterflies

                          • honey the codewitch
                            #13, in reply to trønderen

                            Don't I feel stupid.

                            Approx stack size of local JSON stuff is 160 bytes
                            Read 1290495 nodes and 20383269 characters in 416.631000 ms at 45.603904MB/s
                            Skipped 1290495 nodes and 20383269 characters in 184.131000 ms at 103.187405MB/s
                            utf8 scanned 20383269 characters in 146.422000 ms at 129.761921MB/s
                            raw ascii i/o 20383269 characters in 58.902000 ms at 322.569692MB/s
                            raw ascii block i/o 19 blocks in 3.183000 ms at 5969.211436MB/s

                            Much better. I was using the wrong gcc options. I'm used to MSVC.

                            Real programmers use butterflies

                            • k5054
                              #14, in reply to honey the codewitch

                              Take a look at the [Compiler Explorer](https://godbolt.org/), and see if the assembler output makes sense. Maybe an if statement would produce better results, though it would be less aesthetically pleasing, IMHO.

                              Keep Calm and Carry On

                              • Jorgen Andersson
                                #15, in reply to honey the codewitch

                                UTF8?

                                Wrong is evil and must be defeated. - Jeff Ello Never stop dreaming - Freddie Kruger

                                • k5054 wrote:

                                  Take a look at the [Compiler Explorer](https://godbolt.org/), and see if the assembler output makes sense. Maybe an if statement would produce better results, though it would be less aesthetically pleasing, IMHO.

                                  Keep Calm and Carry On

                                  honey the codewitch
                                  #16

                                  I'm a dunce. I had my compiler options set wrong. :laugh: This is my latest, somewhat fixed output, but it could be a lot faster. I want utf8 in the GB/s range, or at least within spitting distance of it, on my machine.

                                  Approx stack size of local JSON stuff is 176 bytes
                                  Read 1290495 nodes and 20383269 characters in 272.591000 ms at 69.701494MB/s
                                  Skipped 1290495 nodes and 20383269 characters in 118.066000 ms at 160.926939MB/s
                                  utf8 scanned 20383269 characters in 91.398000 ms at 207.882011MB/s
                                  raw ascii i/o 20383269 characters in 57.443000 ms at 330.762669MB/s
                                  raw ascii block i/o 19 blocks in 3.024000 ms at 6283.068783MB/s

                                  I just tried a branchless UTF-8 decoding routine, but it proved to be slower than my original version. However, it's closer to something that could be converted to SIMD instructions, so I'm exploring that more.
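
                                  For reference, the usual branchless building block is a lookup on the leading byte that yields the sequence length, so the decoder can consume continuation bytes without a cascade of ifs. A sketch of that idea (an assumption about the approach, not the routine that was actually tried; a real decoder also has to validate continuation bytes and reject invalid lead bytes, which this skips):

                                      #include <cstdint>

                                      // Sequence length from the top 4 bits of the leading byte:
                                      // 0xxx -> 1, 110x -> 2, 1110 -> 3, 1111 -> 4 (0 marks a continuation byte used as a lead).
                                      static const std::uint8_t k_utf8_len[16] = {
                                          1, 1, 1, 1, 1, 1, 1, 1,   // 0x0_..0x7_ : ASCII
                                          0, 0, 0, 0,               // 0x8_..0xB_ : continuation bytes, invalid as lead
                                          2, 2,                     // 0xC_..0xD_ : two-byte sequences
                                          3,                        // 0xE_       : three-byte sequences
                                          4                         // 0xF_       : four-byte sequences
                                      };

                                      inline unsigned utf8_seq_len(unsigned char lead) {
                                          return k_utf8_len[lead >> 4];   // no branches, just an indexed load
                                      }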

                                  Real programmers use butterflies

                                  • Jorgen Andersson wrote:

                                    UTF8?

                                    Wrong is evil and must be defeated. - Jeff Ello Never stop dreaming - Freddie Kruger

                                    honey the codewitch
                                    #17

                                    Yeah, it's a Unicode encoding format. Most characters are one byte, so it's ASCII-ish except for the extended character range. However, it's a bit involved to decode. Implementing the JSON spec requires UTF-8 support.
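
                                    For anyone following along, a minimal, non-validating sketch of what decoding a single code point involves - the multi-byte cases are where the extra work over plain ASCII comes from:

                                        #include <cstdint>

                                        // Decode one code point starting at s (assumes at least 4 readable bytes or a
                                        // NUL terminator); *len receives the number of bytes consumed.
                                        // Error handling is omitted - this only shows the shape of the work.
                                        static std::uint32_t utf8_decode_one(const unsigned char* s, int* len) {
                                            if (s[0] < 0x80)           { *len = 1; return s[0]; }     // ASCII
                                            if ((s[0] & 0xE0) == 0xC0) { *len = 2;                    // 110xxxxx 10xxxxxx
                                                return ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu); }
                                            if ((s[0] & 0xF0) == 0xE0) { *len = 3;                    // 1110xxxx + 2 continuations
                                                return ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu); }
                                            *len = 4;                                                 // 11110xxx + 3 continuations
                                            return ((s[0] & 0x07u) << 18) | ((s[1] & 0x3Fu) << 12)
                                                 | ((s[2] & 0x3Fu) << 6)  |  (s[3] & 0x3Fu);
                                        }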

                                    Real programmers use butterflies

                                    • honey the codewitch wrote:

                                      Yeah, it's a Unicode encoding format. Most characters are one byte, so it's ASCII-ish except for the extended character range. However, it's a bit involved to decode. Implementing the JSON spec requires UTF-8 support.

                                      Real programmers use butterflies

                                      Jorgen Andersson
                                      #18

                                      Well, it triples the time needed. I would make it an option to choose ANSI or ASCII for the case where performance is an issue but encoding isn't.
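
                                      One way such an option could be exposed without a runtime cost is a compile-time switch on the reader, so the ASCII build never pays for the UTF-8 handling. A sketch of the idea (hypothetical names, not the library's API; the UTF-8 branch is stubbed out):

                                          #include <cstddef>
                                          #include <cstdint>

                                          enum class Encoding { Ascii, Utf8 };

                                          template <Encoding E>
                                          struct CodepointReader {
                                              // Returns the next code point from buf, or -1 at end of input.
                                              static std::int32_t next(const unsigned char* buf, std::size_t n, std::size_t& pos) {
                                                  if (pos >= n) return -1;
                                                  if (E == Encoding::Ascii) {
                                                      return buf[pos++];                       // one byte, no checks at all
                                                  } else {
                                                      unsigned char lead = buf[pos];
                                                      if (lead < 0x80) { ++pos; return lead; } // ASCII fast path inside UTF-8
                                                      // ... the full multi-byte decode would go here ...
                                                      ++pos; return lead;                      // placeholder for the sketch
                                                  }
                                              }
                                          };

                                          // Usage: auto cp = CodepointReader<Encoding::Utf8>::next(buf, n, pos);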

                                      Wrong is evil and must be defeated. - Jeff Ello Never stop dreaming - Freddie Kruger

                                      • Jorgen Andersson wrote:

                                        Well, it triples the time needed. I would make it an option to choose ANSI or ASCII for the case where performance is an issue but encoding isn't.

                                        Wrong is evil and must be defeated. - Jeff Ello Never stop dreaming - Freddie Kruger

                                        honey the codewitch
                                        #19

                                        The only issue with that is I'm trying to make it spec compliant, but I considered making it an option. I may yet, as it's quite a bit faster, but first I want to see how quick I can get the utf8 support. It won't triple the time needed if I can process 4 bytes at a time using SIMD :-D
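
                                        For the record, the classic x86 form of that trick checks 16 bytes per iteration: load a chunk, collect the high bit of every byte with a movemask, and skip the whole chunk when the mask is zero. A sketch assuming SSE2 is available (the standard pattern, not code from the library):

                                            #include <emmintrin.h>   // SSE2
                                            #include <cstddef>

                                            // Length of the leading all-ASCII run, checked 16 bytes at a time.
                                            static std::size_t ascii_prefix_sse2(const unsigned char* p, std::size_t n) {
                                                std::size_t i = 0;
                                                while (i + 16 <= n) {
                                                    __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
                                                    if (_mm_movemask_epi8(chunk) != 0) break;  // some byte has its high bit set
                                                    i += 16;                                   // 16 ASCII characters, no decoding
                                                }
                                                while (i < n && p[i] < 0x80) ++i;              // finish the tail byte by byte
                                                return i;
                                            }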

                                        Real programmers use butterflies

                                        • honey the codewitch wrote:

                                          The only issue with that is I'm trying to make it spec compliant, but I considered making it an option. I may yet, as it's quite a bit faster, but first I want to see how quick I can get the utf8 support. It won't triple the time needed if I can process 4 bytes at a time using SIMD :-D

                                          Real programmers use butterflies

                                          Jorgen Andersson
                                          #20

                                          honey the codewitch wrote:

                                          but first I want to see how quick I can get the utf8 support.

                                          Obviously! :-D

                                          Wrong is evil and must be defeated. - Jeff Ello Never stop dreaming - Freddie Kruger
