A little light reading

    honey the codewitch wrote (#1):

    Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com

    strchr.com is cool. I found it while looking for SSE string comparison instructions.

    Real programmers use butterflies

      PIEBALDconsult wrote (#2, in reply to #1):

      But do they still suffer from "Shlemiel the painter’s algorithm"? See Back to Basics – Joel on Software.
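For anyone who has not read the linked article, here is a minimal sketch of the problem it describes, in C. The function names `join_rescan` and `join_tracked` are made up for illustration and do not come from the article or from strchr.com. Both assume the caller supplies a destination buffer large enough for the result.

```c
#include <stddef.h>
#include <string.h>

/* Shlemiel-style: every strcat() call walks dst from the beginning to find
   the terminator, so joining n pieces costs O(n^2) character visits. */
static void join_rescan(char *dst, const char *const *parts, size_t count)
{
    dst[0] = '\0';
    for (size_t i = 0; i < count; ++i)
        strcat(dst, parts[i]);
}

/* Keep a pointer to the current end and append there: every character is
   copied exactly once, so the total cost is O(n). */
static void join_tracked(char *dst, const char *const *parts, size_t count)
{
    char *end = dst;
    for (size_t i = 0; i < count; ++i) {
        size_t len = strlen(parts[i]);
        memcpy(end, parts[i], len);
        end += len;
    }
    *end = '\0';
}
```

A faster strlen makes each rescan cheaper, but the rescans themselves only go away when the code remembers where the string ends.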

        Chris Maunder wrote (#3, in reply to #1):

        Interesting that you post this, since I was going to suggest, in your previous thread about perf, that you dip down into ASM to wring out some speed. And here you are. (I've always admired those fluent in any dialect of ASM, but have never actually bothered trying to learn a single instruction. Maybe it's about time I spent a weekend diving in.)

        cheers Chris Maunder

          Lost User wrote (#4, in reply to #1):

          Hmmm, Peter is an old CodeProject member. About 15 years ago he wrote the fastest Mandelbrot/Julia rendering engine.

          Best Wishes, -David Delaune

            honey the codewitch wrote (#5, in reply to #3):

            I haven't even started to use that yet, because I'm trying the portable strpbrk() function over a memory mapped file first. My results are fire.

            Approx stack size of local JSON stuff is 176 bytes
            Read 1290495 nodes and 20383269 characters in 249.894000 ms at 76.032238MB/s
            Skipped 1290495 nodes and 20383268 characters in 33.278000 ms at 570.947773MB/s
            utf8 scanned 20383269 characters in 75.141000 ms at 252.857960MB/s
            raw ascii i/o 20383269 characters in 58.162000 ms at 326.673773MB/s
            raw ascii block i/o 19 blocks in 3.130000 ms at 6070.287540MB/s

            Bold line is where I search fast through a document.
            Edit: Fixed a fencepost error in counting the position.
            Edit 2: More complete benchmarks:

            Query is $.season[7].episode[2].overview
            Approx stack size of local JSON stuff is 152 bytes
            Found "Labore magna sint occaecat ea officia labore sit voluptate ut fugiat. Nisi qui commodo consectetur officia incididunt anim do culpa eu. Eu ea magna aliqua excepteur et. Qui eiusmod irure adipisicing enim aute nostrud deserunt eiusmod quis culpa id.\r\n" and scanned 7149420 characters in 12.034000 ms at 498.587336MB/s

            Query is $..id:
            Approx stack size of local JSON stuff is 152 bytes
            Found 40008 fields and scanned 20383269 characters in 65.563000 ms at 289.797599MB/s

            Approx stack size of local JSON stuff is 176 bytes
            Read 1290495 nodes and 20383269 characters in 256.696000 ms at 74.017515MB/s
            Skipped 1290495 nodes and 20383269 characters in 33.527000 ms at 566.707430MB/s
            utf8 scanned 20383269 characters in 72.913000 ms at 260.584532MB/s
            raw ascii i/o 20383269 characters in 57.787000 ms at 328.793673MB/s
            raw ascii block i/o 19 blocks in 3.106000 ms at 6117.192531MB/s

            Edit: Found and fixed a bug with some escape characters not getting translated. (regression when I introduced my fast scanning)
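To illustrate the kind of scan described in this post, here is a rough sketch of hopping through a buffer with strpbrk(). It is not the poster's parser: `count_any_of` is a hypothetical helper, and it deliberately ignores JSON strings and escapes. It also assumes the buffer is NUL-terminated, which a memory mapped file is not guaranteed to be, so real code would have to arrange a terminator or bound the scan some other way.

```c
#include <stddef.h>
#include <string.h>

/* Count how many characters of the NUL-terminated buffer `text` belong to
   `set`, jumping from hit to hit with strpbrk() instead of testing every
   byte in a hand-written loop. */
static size_t count_any_of(const char *text, const char *set)
{
    size_t hits = 0;
    for (const char *p = strpbrk(text, set); p != NULL; p = strpbrk(p + 1, set))
        ++hits;
    return hits;
}
```

For example, `count_any_of("{\"id\":[1,2]}", "{}[]")` visits the four brackets and braces and skips everything in between. The speed comes from the C library's strpbrk(), which is often heavily optimized per platform, so the same portable call can run fast without any assembly in the caller.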

            Real programmers use butterflies

              Rick York wrote (#6, in reply to #2):

              The last paragraph of that is pretty good and it's about twenty years old now.

              "They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"

                Lost User wrote (#7, in reply to #1):

                SSE 4.2 is overrated though. The instructions are neat, but they don't execute very quickly. There are also no AVX2 equivalents, just VEX-encoded versions of the 128-bit operations. Overall, SSE 4.2 usually doesn't work out that well, though it has niche uses, and [it turns out that the PCMPEQB & PMOVMSKB combo wins](http://0x80.pl/articles/simd-strfind.html). It's a bit more boring perhaps, but just because something is made for a particular purpose doesn't make it the best for that purpose. Glibc also [uses generic instructions](https://code.woboq.org/userspace/glibc/sysdeps/x86_64/multiarch/strlen-avx2.S.html) instead of the SSE 4.2 special stuff.
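As a rough illustration of the PCMPEQB and PMOVMSKB combination mentioned here, the sketch below finds a string's length with plain SSE2 intrinsics. It is not glibc's implementation and not the code from the linked article. The name `strlen_sse2` is hypothetical, `__builtin_ctz` assumes GCC or Clang, and the function falls back to a byte loop when `__SSE2__` is not defined.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

static size_t strlen_sse2(const char *s)
{
#if defined(__SSE2__)
    const char *p = s;
    /* Advance byte by byte until p is 16-byte aligned, so the aligned
       loads below never cross a page boundary. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }
    const __m128i zero = _mm_setzero_si128();
    for (;;) {
        __m128i chunk = _mm_load_si128((const __m128i *)p);
        /* PCMPEQB sets a lane to 0xFF where the byte equals zero;
           PMOVMSKB packs the top bit of each lane into a 16-bit mask. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz((unsigned)mask);
        p += 16;
    }
#else
    /* Portable fallback when SSE2 is not available at compile time. */
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
#endif
}
```

Note that the aligned loads may read bytes past the terminator inside the same 16-byte block. That is a common trick in hand-tuned string routines, but it is outside what the C standard guarantees, and memory checkers may flag it.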

                  honey the codewitch wrote (#8, in reply to #7):

                  In the end I didn't have to worry about it. I found out how optimized strpbrk() is and I'm using it over a memory mapped file. I'm searching through JSON, picking out fields at about 560MB/s now :-D That's satisfying enough, and more portable (the memory mapped stuff isn't 100% portable, but I have code for Windows and, I think, either POSIX or Linux, so it works with both and falls back otherwise).

                  Real programmers use butterflies

                    Rick York wrote (#9, in reply to #4):

                    That is pretty interesting stuff! I used to be a fractal fanatic and spent a lot of time optimizing algorithms and investigating alternatives. Then I came across GPUs and CUDA and the search was over.

                    "They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"

                      theoldfool wrote (#10, in reply to #3):

                      Too high level. Anyone remember:

                      C:\>debug
                      -D
                      0B06:0100 75 60 C6 46 00 00 8A 7E-04 F6 C7 04 74 E6 C6 46 u`.F...~....t..F
                      0B06:0110 00 02 8B 76 02 80 3C 00-74 4B B3 2E 34 00 F5 0A ...v..<.tK..4...
                      0B06:0120 B3 3A 38 5C FE 74 05 C6-46 00 01 4E 32 DB 86 1C .:8\.t..F..N2...
                      0B06:0130 E8 39 EB 3B D6 73 1B 56-51 8B CE 8B F2 AC E8 B2 .9.;.s.VQ.......
                      0B06:0140 E1 74 09 AC 3B F1 72 F5-59 5E EB 0B 3B F1 72 ED .t..;.r.Y^..;.r.
                      0B06:0150 59 5E 3A 5C FF 74 0E B4-3B CD 21 86 1C 73 95 E8 Y^:\.t..;.!..s..
                      0B06:0160 9B DA E9 C9 D7 E9 C3 D7-89 7E 02 80 46 01 0C B8 .........~..F...
                      0B06:0170 3F 2E B9 08 00 F3 AA 86-C4 AA 86 C4 B1 03 F3 AA ?...............

                      Now, those were the (so-called) good old days. :)

                      If you can keep your head while those about you are losing theirs, perhaps you don't understand the situation.

                        honey the codewitch wrote (#11, in reply to #10):

                        Yes. That reminds me of when I learned 6502 machine code before I realized I had a built-in mini-assembler.

                        Real programmers use butterflies

                          trønderen wrote (#12, in reply to #11):

                          The first assembler I used simply added symbol values together. The instruction set was very regular (the CPU architecture dated from long before microcode), so opcode, modifiers and offsets all had their fixed place in the instruction word. We played around with this: to generate a MUL (multiply) instruction, you could instead write ADD ADD, as the opcode for MUL was twice the opcode of ADD :-)

                            Jorgen Andersson wrote (#13, in reply to #3):

                            If you consider one weekend enough... :omg:

                            Wrong is evil and must be defeated. - Jeff Ello Never stop dreaming - Freddie Kruger

                              W Balboos GHB wrote (#14, in reply to #3):

                              Chris Maunder wrote:

                              but have never actually bothered trying to learn a single instruction.

                              Allow me to get you started:

                              MOV Chris, Good_Book;
                              JMP ASM_PRO;

                              Ravings en masse^

                              "The difference between genius and stupidity is that genius has its limits." - Albert Einstein

                              "If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010

                                W Balboos GHB wrote (#15, in reply to #9):

                                You need to check out FRACTINT - more fractal than even a fanatical fanatic can handle. It just seems to have more and more features. The Wikipedia link hardly scratches the surface.

                                Ravings en masse^

                                "The difference between genius and stupidity is that genius has its limits." - Albert Einstein

                                "If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010

                                  Rick York wrote (#16, in reply to #15):

                                  Yes, I have it and it is quite good. I got lots of ideas from it. For the highest performing fractal program I have ever seen - check out the Mandelbrot sample that comes with the CUDA SDK. It calculates in real time. You can pan and zoom and updates are instantaneous. It is really fast.

                                  "They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"

                                    Chris Maunder wrote (#17, in reply to #14):

                                    I have a short attention span. Does it contain pictures and large fonts?

                                    cheers Chris Maunder

                                      W Balboos GHB wrote (#18, in reply to #17):

                                      Does what?

                                      Ravings en masse^

                                      "The difference between genius and stupidity is that genius has its limits." - Albert Einstein

                                      "If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010
