A little light reading
-
But do they still suffer from the "Shlemiel the painter’s algorithm"? Back to Basics – Joel on Software[^]
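For anyone who hasn't read the linked essay, here is a minimal sketch of the problem it describes, using repeated strcat() as the example. The function names `join_rescan` and `join_tracked` are made up for illustration and are not from the article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shlemiel's way: every strcat() walks dst from the start to find its
 * terminator, so joining n pieces revisits the same bytes over and over. */
void join_rescan(char *dst, size_t cap, const char *const *parts, size_t n)
{
    assert(cap > 0);
    dst[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        assert(strlen(dst) + strlen(parts[i]) < cap);  /* room for piece + NUL */
        strcat(dst, parts[i]);
    }
}

/* Remember where the string ends, so each piece is appended in time
 * proportional to its own length only. */
void join_tracked(char *dst, size_t cap, const char *const *parts, size_t n)
{
    assert(cap > 0);
    char *end = dst;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        assert((size_t)(end - dst) + len < cap);       /* room for piece + NUL */
        memcpy(end, parts[i], len);
        end += len;
    }
    *end = '\0';
}
```

Both produce the same string; the difference is only in how much of the destination gets rescanned on each append.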
-
Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com[^]
strchr.com is cool. I found it while looking for SSE string comparison instructions.
Real programmers use butterflies
SSE 4.2 is overrated though. The instructions are neat, but they don't execute very quickly. There are also no AVX2 equivalents, just a VEX-encoded version of the 128-bit operations. Overall, SSE 4.2 usually doesn't work out that well, though it has niche uses, and [it turns out that the PCMPEQB & PMOVMSKB combo wins](http://0x80.pl/articles/simd-strfind.html). It's a bit more boring perhaps, but just because something is made for a particular purpose doesn't make it the best tool for that purpose. Glibc also [uses generic instructions](https://code.woboq.org/userspace/glibc/sysdeps/x86_64/multiarch/strlen-avx2.S.html) instead of the SSE 4.2 special stuff.
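To make the PCMPEQB & PMOVMSKB combo concrete, here is a rough sketch of a single-byte search built on the corresponding SSE2 intrinsics. It is not the code from the linked article or from glibc; `find_byte_sse2` is a hypothetical name, and `__builtin_ctz` assumes GCC or Clang on x86.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Returns the index of the first occurrence of c in buf[0..len),
 * or len if c does not occur. */
size_t find_byte_sse2(const char *buf, size_t len, char c)
{
    const __m128i needle = _mm_set1_epi8(c);  /* c broadcast to all 16 lanes */
    size_t i = 0;

    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i eq = _mm_cmpeq_epi8(chunk, needle);  /* PCMPEQB: 0xFF where equal */
        int mask = _mm_movemask_epi8(eq);            /* PMOVMSKB: one bit per byte */
        if (mask != 0)
            return i + (size_t)__builtin_ctz((unsigned)mask);  /* lowest set bit */
    }

    /* Scalar loop for the last (len % 16) bytes. */
    for (; i < len; i++)
        if (buf[i] == c)
            return i;
    return len;
}
```

The compare produces a byte mask, the movemask squeezes it into 16 bits, and a bit scan picks out the first hit. Nothing in it is string-specific, which is probably why it carries over to wider registers so easily.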
-
In the end I didn't have to worry about it. I found out how optimized strpbrk() is and I'm using it over a memory-mapped file. I'm searching through JSON, picking out fields at about 560 MB/s now :-D That's satisfying enough, and more portable (the memory-mapped part isn't 100% portable, but I have code for Windows and, I think, either POSIX or Linux, so it works with both and falls back otherwise).
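For readers who haven't used strpbrk(), this is roughly the shape of such a scan. It is only an illustrative sketch, not the actual parser; `count_delims` is a made-up helper, and it works on an ordinary NUL-terminated string. A file mapping is not guaranteed to end in a NUL byte, so real code over a mapping has to bound the scan some other way.

```c
#include <stddef.h>
#include <string.h>

/* Counts how many bytes of NUL-terminated `text` belong to `set`,
 * hopping from one match to the next with strpbrk(). */
size_t count_delims(const char *text, const char *set)
{
    size_t n = 0;
    for (const char *p = strpbrk(text, set); p != NULL; p = strpbrk(p + 1, set))
        n++;
    return n;
}
```

Called as `count_delims(json, "{}:,")`, it visits only the structural characters and leaves everything in between to the library's optimized scan.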
Real programmers use butterflies
-
Hmmm, Peter is an old CodeProject member. About 15 years ago he wrote the fastest Mandelbrot/Julia rendering engine[^].
Best Wishes, -David Delaune
That is pretty interesting stuff! I used to be a fractal fanatic and spent a lot of time optimizing algorithms and investigating alternatives. Then I came across GPUs and CUDA and the search was over.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
-
Interesting that you post this, since I was going to suggest, in your previous thread about perf, that you dip down into ASM to wring out some speed. And here you are. (I've always admired those fluent in any dialect of ASM, but have never actually bothered trying to learn a single instruction. Maybe it's about time I spent a weekend diving in.)
cheers Chris Maunder
Too high level. Anyone remember:
C:\>debug
-D
0B06:0100 75 60 C6 46 00 00 8A 7E-04 F6 C7 04 74 E6 C6 46 u`.F...~....t..F
0B06:0110 00 02 8B 76 02 80 3C 00-74 4B B3 2E 34 00 F5 0A ...v..<.tK..4...
0B06:0120 B3 3A 38 5C FE 74 05 C6-46 00 01 4E 32 DB 86 1C .:8\.t..F..N2...
0B06:0130 E8 39 EB 3B D6 73 1B 56-51 8B CE 8B F2 AC E8 B2 .9.;.s.VQ.......
0B06:0140 E1 74 09 AC 3B F1 72 F5-59 5E EB 0B 3B F1 72 ED .t..;.r.Y^..;.r.
0B06:0150 59 5E 3A 5C FF 74 0E B4-3B CD 21 86 1C 73 95 E8 Y^:\.t..;.!..s..
0B06:0160 9B DA E9 C9 D7 E9 C3 D7-89 7E 02 80 46 01 0C B8 .........~..F...
0B06:0170 3F 2E B9 08 00 F3 AA 86-C4 AA 86 C4 B1 03 F3 AA ?...............
Now, those were the (so-called) good old days. :)
If you can keep your head while those about you are losing theirs, perhaps you don't understand the situation.
-
Yes. That reminds me of when I learned 6502 bytecode before I realized I had a built-in mini-assembler.
Real programmers use butterflies
-
The first assembler I used did little more than add symbols. The instruction set was very regular (the CPU architecture dated from long before microcode), so opcode, modifiers and offsets each had a fixed place in the instruction word. We played around with this: to generate a MUL (multiply) instruction, you could write ADD ADD instead, as the opcode for MUL was twice the opcode of ADD :-)
-
If you consider one weekend enough... :omg:
Wrong is evil and must be defeated. - Jeff Ello Never stop dreaming - Freddie Kruger
-
Chris Maunder wrote:
but have never actually bothered trying to learn a single instruction.
Allow me to get you started:
MOV Chris, Good_Book;
JMP ASM_PRO;
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010
-
You need to check out FRACTINT[^] - more fractal than even a fanatical fanatic can handle. It just seems to have more and more features. The Wikipedia link hardly touches the surface.
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010
-
Yes, I have it and it is quite good. I got lots of ideas from it. For the highest performing fractal program I have ever seen - check out the Mandelbrot sample that comes with the CUDA SDK. It calculates in real time. You can pan and zoom and updates are instantaneous. It is really fast.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
-
I have a short attention span. Does it contain pictures and large fonts?
cheers Chris Maunder
-
Does what?
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010