Who says you can't beat the compiler? [part 1 of 2]
-
Actually I do, too. Otherwise newbies might try what I did here:
static void Normalize(Float3[] array)
{
for (int i = 0; i < array.Length; i++)
{
Float3 f = array[i];
float invLen = (float)(1.0 / Math.Sqrt(f.x * f.x + f.y * f.y + f.z * f.z));
array[i] = new Float3(f.x * invLen, f.y * invLen, f.z * invLen);
}
}

Ok, not in this part yet, but in the part with the assembly and all. This is the 'unoptimized' C# code. It's not meant to be pretty, and it actually has a low-level optimization already: three multiplications and one division is faster than three divisions. Using three divisions would pain me too much to even consider. The Just-In-Time compiler didn't do such a bad job here; Math.Sqrt gets nicely compiled to fsqrt (32-bit mode) or sqrtsd (64-bit mode), and not too much nonsense goes on around it either. It's a shame it has no clue how to fully use SSE, though. This looks like a rather lame use of SSE to me (x, y and z are at rsp+40h, 44h and 48h):
movss xmm2,dword ptr [rsp+40h]
movss xmm0,dword ptr [rsp+40h]
mulss xmm2,xmm0
movss xmm1,dword ptr [rsp+44h]
movss xmm0,dword ptr [rsp+44h]
mulss xmm1,xmm0
addss xmm2,xmm1
movss xmm1,dword ptr [rsp+48h]
movss xmm0,dword ptr [rsp+48h]
mulss xmm1,xmm0
addss xmm2,xmm1
cvtss2sd xmm0,xmm2
sqrtsd xmm1,xmm0
movsd xmm2,mmword ptr [00000160h]
divsd xmm2,xmm1
cvtsd2ss xmm0,xmm2
movss xmm1,dword ptr [rsp+40h]
mulss xmm1,xmm0
movss xmm2,dword ptr [rsp+44h]
mulss xmm2,xmm0
movss xmm3,dword ptr [rsp+48h]
mulss xmm3,xmm0

(Why is none of this coloured? I used lang="asm".) This just makes me sad. We can see here how it puts the arguments to the Float3 ctor in xmm1..xmm3, which is correct according to the specs[^], because the first argument will be a pointer to where the new struct will be put, in rcx (code omitted; it's not very interesting anyway). This post is split because apparently there's a length limit..?
Impressive... but I call that someone needs to check around to see if there's a life somewhere they can grab!! :laugh:
I don't have ADHD, I have ADOS... Attention Deficit oooh SHINY!! If you like cars, check out the Booger Mobile blog | If you feel generous - make a donation to Camp Quality!!
-
harold aptroot wrote:
Actually I do, too. Otherwise newbies might try what I did here: [...]
I would think it would not take a lot of craftiness to beat the .NET or a JVM with ASM.
John
-
harold aptroot wrote:
Actually I do, too. Otherwise newbies might try what I did here: [...]
Of course you can beat the compiler, the question is whether it's worth the time and effort. (Of course if you were really fanatical, you'd write this in native assembly and load either the 32-bit or 64-bit [or ARM or whatever] DLL and use that.)
-
I would think it would not take a lot of craftiness to beat the .NET or a JVM with ASM.
John
-
harold aptroot wrote:
Who says you can't beat the compiler?
I have never heard anyone say that.:~ Handcrafting assembly code has always been one way to optimize some critical routines. Though in practice it happens very rarely.
Rama Krishna Vavilala wrote:
I have never heard anyone say that
That's not surprising. I googled the phrase and only got 3.75 pages back.
Henry Minute Do not read medical books! You could die of a misprint. - Mark Twain Girl: (staring) "Why do you need an icy cucumber?" “I want to report a fraud. The government is lying to us all.”
-
harold aptroot wrote:
Actually I do, too. Otherwise newbies might try what I did here: [...]
-
Surely we can do better than this. As it turns out, we can! I wrote a little class that uses VirtualAlloc (to get executable+writable memory) and a quickly hacked-together assembler, so we can do this:

Asm asm = new Asm(@"
loop:
    test edx, edx
    jz end
    movaps xmm0, [rcx]
    movaps xmm2, xmm0
    mulps xmm0, xmm0
    movaps xmm1, xmm0
    shufps xmm0, xmm0, ( 2, 1, 0, 3 )
    addps xmm1, xmm0
    movaps xmm0, xmm1
    shufps xmm1, xmm1, ( 1, 0, 3, 2 )
    addps xmm0, xmm1
    rsqrtps xmm0, xmm0
    mulps xmm0, xmm2
    movaps [rcx], xmm0
    add edx, -1
    add rcx, 16
    jmp loop
end:
    ret
");
And to benchmark it, I used:
Float3[] f = new Float3[0x1000];
// we don't want to measure JIT overhead later
Normalize(f);
unsafe { asm.GetDelegate<Method>()((Float3*)0, 0); }
for (int j = 0; j < 10; j++)
{
    for (int i = 0; i < f.Length; i++)
        f[i] = new Float3(1, 2, 3);
    Stopwatch s2 = Stopwatch.StartNew();
    Normalize(f);
    s2.Stop();
    Console.WriteLine("C#: " + s2.ElapsedTicks);
    for (int i = 0; i < f.Length; i++)
        f[i] = new Float3(1, 2, 3);
    Method m = asm.GetDelegate<Method>();
    Stopwatch s = Stopwatch.StartNew();
    unsafe
    {
        fixed (Float3* fptr = f)
        {
            m(fptr, f.Length);
        }
    }
    s.Stop();
    Console.WriteLine("ASM: " + s.ElapsedTicks);
}
Finally it's time for the results! Did we beat the compiler? YES! Here's the result of one run:
C#: 624168
ASM: 127836 // anyone? what happened here?
C#: 615807
ASM: 66465
C#: 615780
ASM: 66294
C#: 615726
ASM: 66276
C#: 615717
ASM: 66285
C#: 615744
ASM: 66285
C#: 615726
ASM: 66276
C#: 617112
ASM: 66285
C#: 615726
ASM: 66285
C#: 615735
ASM: 66285

So there you have it: you can beat the compiler. The .NET JIT compiler, at least :) Some small notes: - using
harold aptroot wrote:
ASM: 127836 // anyone? what happened here?
Did you call
Marshal.Prelink
after getting the function pointer?
xacc.ide
IronScheme - 1.0 RC 1 - out now!
((λ (x) `(,x ',x)) '(λ (x) `(,x ',x))) The Scheme Programming Language – Fourth Edition -
Is it just my perception, or is this actually the first legitimate technical article posted in the lounge? I give it a 5, btw.
puromtec1 wrote:
first legitimate technical article posted in the lounge?
I suspect he couldn't find his blog :)
xacc.ide
IronScheme - 1.0 RC 1 - out now!
((λ (x) `(,x ',x)) '(λ (x) `(,x ',x))) The Scheme Programming Language – Fourth Edition -
Of course you can beat the compiler, the question is whether it's worth the time and effort. (Of course if you were really fanatical, you'd write this in native assembly and load either the 32-bit or 64-bit [or ARM or whatever] DLL and use that.)
-
harold aptroot wrote:
Actually I do, too. Otherwise newbies might try what I did here: [...]
-
harold aptroot wrote:
Actually I do, too. Otherwise newbies might try what I did here: [...]
Agh! Reality! My Archnemesis![^]
| FoldWithUs! | sighist | WhoIncludes - Analyzing C++ include file hierarchy -
harold aptroot wrote:
Actually I do, too. Otherwise newbies might try what I did here: [...]
Yeah, but the real question is: can you beat the Intel C++ compiler?
[Genetic Algorithm Library] [Wowd]
modified on Wednesday, September 8, 2010 2:26 AM
-
harold aptroot wrote:
Surely we can do better than this. As it turns out, we can! [...]
harold aptroot wrote:
ASM: 127836 // anyone? what happened here?
Instructions written into memory generated by VirtualAlloc will most likely cause an L1/L2 cache miss. The extra clock cycles were probably spent utilizing the TLB to find the physical memory offset. You can try using the prefetchnta instruction to move the memory into L1 if you want to avoid the initial cache miss. Keep in mind that prefetchnta is only a hint and will sometimes be ignored under certain conditions.
Best Wishes,
-David Delaune
-
harold aptroot wrote:
Actually I do, too. Otherwise newbies might try what I did here: [...]
I haven't seen a compiler that can properly use SSE2 yet. Auto-vectorization often only works in trivial cases; in other cases packed instructions go unused. However, instead of dropping to assembler, you can write C code using intrinsics. That way you only select the ASM instructions to use, and the compiler picks the instruction ordering and register allocation for you. And unlike inline ASM code, intrinsics are portable between multiple C compilers and between x86 and x86-64.
Also, your optimized code is not equivalent: it has much lower floating point precision. RSQRTPS is (a lot) faster than SQRTPS, but has much less precision. You need an additional Newton-Raphson step to arrive somewhere close to normal float precision. Of course, the loss of precision may be acceptable in your case, but it's the reason compilers cannot do this optimization automatically. Moreover, the way your C# code is written, even the cvtss2sd/cvtsd2ss dance is mandatory for the compiler: you are dividing a double (1.0) by a double (the Math.Sqrt result), so the compiler is not allowed to introduce additional rounding errors by rounding the intermediate result to float. You might have gotten more efficient code by writing:
float invLen = 1.0f / (float)Math.Sqrt(f.x * f.x + f.y * f.y + f.z * f.z);
-
harold aptroot wrote:
Surely we can do better than this. As it turns out, we can! [...]
Sorry about stupid question but where does the Asm class come from?
Giorgi Dalakishvili #region signature My Articles Browsing xkcd in a windows 7 way[^] #endregion
-
Sorry about stupid question but where does the Asm class come from?
Giorgi Dalakishvili #region signature My Articles Browsing xkcd in a windows 7 way[^] #endregion
I am also interested in the answer :)
-
harold aptroot wrote:
ASM: 127836 // anyone? what happened here?
Did you call
Marshal.Prelink
after getting the function pointer?
xacc.ide
IronScheme - 1.0 RC 1 - out now!
((λ (x) `(,x ',x)) '(λ (x) `(,x ',x))) The Scheme Programming Language – Fourth Edition -
harold aptroot wrote:
Actually I do, too. Otherwise newbies might try what I did here: [...]
Umm ...
harold aptroot wrote:
The Just In Time compiler didn't even do such a bad job here...
... the Just in Time compiler takes as its input the IL code that is generated by the C# compiler or, in the case of the IL assembly language you hand crafted, the output of the IL assembler. So, actually, the JIT compiler hasn't done anything yet. :-\
-
Sorry about stupid question but where does the Asm class come from?
Giorgi Dalakishvili #region signature My Articles Browsing xkcd in a windows 7 way[^] #endregion
-
Yeah, but the real question is - can you beat Intel C++ compiler?
[Genetic Algorithm Library] [Wowd]
modified on Wednesday, September 8, 2010 2:26 AM