Anyone using Nvidia's CUDA?
-
kmg365 wrote:
It's the second one.
Kinda thought it might be ;) All I know about SAS is that my cousin used to be a SAS contractor.
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
Stuart Dootson wrote:
my cousin used to be a SAS contractor
Made a lot of money and went insane.
Never underestimate the power of human stupidity RAH
-
Ah, you know him...
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
-
No, but I am planning to use OpenCL. I am not really sure how much they differ.
-
At a guess, one's called CUDA and the other OpenCL.
print "http://www.codeproject.com".toURL().text Ain't that Groovy?
-
-
It's used by Google in their search engines. I have an interesting post about GPUs and Amdahl's Law on Nvidia: How can CUDA break Amdahl's Law?[^] If you are thinking of using CUDA, you might want to read the post first; it may or may not help your team make a decision to use CUDA. The truth about CUDA is revealed. ~TheArch :cool:
modified on Wednesday, July 22, 2009 10:44 PM
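For context, Amdahl's Law is just arithmetic: if a fraction p of the work parallelizes across n processors, the overall speedup is 1 / ((1 - p) + p / n). A quick sketch (the 240-core figure is the 2009-era GTX 280, used here only as an illustration):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Overall speedup when only a fraction of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even with 240 cores, a 90%-parallel program is capped below 10x,
# because the 10% serial part dominates as n grows.
print(round(amdahl_speedup(0.90, 240), 2))    # ~9.64
print(round(amdahl_speedup(0.90, 10**9), 2))  # limit -> 1/(1-0.90) = 10.0
```

That asymptote is why the "can CUDA break Amdahl's Law" question is worth reading before committing a team to it.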
-
If you can find me the CUDA language spec (opcodes), I will make an IL -> CUDA IL assembler/linker for it. I looked around for it a few weeks ago, but could not find it. The alternative is to use DirectML, but I have no idea when it's coming out?!?! ~TheArch
-
-
Yeah, I tried that some time ago. While it is nice, it only works in 32-bit mode because it uses DirectX. It's convenient, but the performance is not great, and in 64-bit mode it just dies because the DirectX DLLs die (MS's fault), which makes it impossible to use in a plugin for Paint.NET without making it multi-process - but that would completely kill the performance. And it seems a bit abandoned.
-
:thumbsup:
harold aptroot wrote:
Sweet, seems a bit large for a 1 man project though, need any help?
That would be great. Pick your flavor! Prototype AutoParalizer architecture:
1. Tool to parse the target .NET IL (reflect onto the IL using Reflection.Emit)
2. Assembly table linker (hash table to translate the IL into PTX)
3. CUDA integration:
a. Generate parallel mini functions in CUDA from #2
b. Mark up the .NET IL with the output of 3.a
4. Recompile everything from 3.a & 3.b
'Did I miss something?' ~TheArch
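Step 2 could start life as little more than a lookup table. A toy sketch of that idea - the opcode names are real MSIL, but the PTX templates and the `translate` helper are illustrative, not a real or complete mapping:

```python
# Hypothetical "assembly table linker": a hash table mapping MSIL opcodes
# to PTX instruction templates. Coverage and exact PTX forms are
# placeholders for illustration only.
IL_TO_PTX = {
    "add":    "add.s32 {dst}, {a}, {b};",
    "sub":    "sub.s32 {dst}, {a}, {b};",
    "mul":    "mul.lo.s32 {dst}, {a}, {b};",
    "ldc.i4": "mov.s32 {dst}, {imm};",
}

def translate(il_op, **operands):
    """Look up an IL opcode and fill in its PTX template, or report the gap."""
    template = IL_TO_PTX.get(il_op)
    if template is None:
        raise NotImplementedError(f"no PTX mapping for IL opcode '{il_op}'")
    return template.format(**operands)

print(translate("mul", dst="%r3", a="%r1", b="%r2"))
# mul.lo.s32 %r3, %r1, %r2;
```

The `NotImplementedError` path doubles as a coverage report: run a real assembly through it and the exceptions tell you which opcodes still need translation logic.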
-
Hm, I dunno, but some observations:
- MSIL uses a read-only* stack, so essentially it's equivalent to SSA (source: cr88192)
- Due to that, there is no direct mapping of MSIL onto PTX; at the very least you need register allocation. But since it's SSA, the interference graph will be chordal, so graph colouring has a polynomial running time (so optimal register usage isn't even hard)
- local/shared/shared-at-other-level/etc. could be a problem; defaulting to global-ish is extremely slow, but determining where a value should go is probably hard (sounds like escape analysis to me, which is hard)
- Different kinds of loads/stores; might take some additional analysis to figure out which one to use (mov vs ld vs tex etc.)
- There is no GC in CUDA - but classes wouldn't be all that useful there anyway; seems OK to me to limit it to structs
- There is probably more to this than might be seen at a first glance..
- ????
- Profit
* OK, let me clarify: I don't really mean that it doesn't write to the stack - I mean that it doesn't overwrite things deep down in the stack, it just pushes things onto it. Well, except in some rare and creepy cases anyway.
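The chordal-graph point can be made concrete: for chordal graphs, greedy colouring along a (reverse) perfect elimination ordering uses the minimum number of colours, and for SSA interference graphs such an ordering falls out of dominance order. A toy sketch, assuming the interference graph and a valid elimination ordering are already in hand:

```python
def color_chordal(neighbors, elimination_order):
    """Greedy colouring along a reverse perfect elimination ordering.
    On chordal graphs - which SSA interference graphs are - this is
    optimal, so the colour count is the minimum register count."""
    color = {}
    for v in reversed(elimination_order):
        used = {color[u] for u in neighbors[v] if u in color}
        c = 0
        while c in used:  # smallest colour not taken by a coloured neighbour
            c += 1
        color[v] = c
    return color

# Toy interference graph: a, b, c pairwise interfere (a triangle);
# d only interferes with c. ["d", "a", "b", "c"] is a valid PEO here.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
regs = color_chordal(g, ["d", "a", "b", "c"])
print(max(regs.values()) + 1)  # 3 - the triangle forces exactly 3 registers
```

Note the allocator itself stays simple; the real work (building live ranges and the interference graph from the IL) happens upstream.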
-
harold aptroot wrote:
- MSIL uses a read-only stack so essentially it's equivalent to SSA (source: cr88192)
Hmm, okay, I'll take your word on it. But the emitted IL won't be read-only after decomposing it and saving it to a new file.
harold aptroot wrote:
- Due to that there is no direct mapping of MSIL onto PTX, at the very least you need register allocation, but since it's SSA the interference graph will be chordal, so graph colouring has a polynomial running time (so optimal register usage isn't even hard)
Yeah, I thought as much. From my first 20-second glance at it, I think about 45% map directly; the others will have to use some different translation logic.
harold aptroot wrote:
- local/shared/shared-at-other-level/etc could be a problem, defaulting to global-ish is extremely slow but determining where it should go is probably hard (sounds like escape analysis to me, which is hard)
Hmm, I don't know much about this; I will have to research it in the morning.
harold aptroot wrote:
- Different kinds of loads/stores, might take some additional analysis to figure out which one to use (mov vs ld vs tex etc)
Yeah, this is similar to #2 on your list. I envision the translation lib having custom functions in PTX, e.g. when we run into 'String s = new String("something")', translate it to the PTX-backed 'CUDA.String s = new CUDA.String((CUDA.String)"something")'.
harold aptroot wrote:
- There is no GC in CUDA - but classes wouldn't be all that useful there anyway, seems ok to me to limit it to structs
Correct, I think?! This is where it becomes very important to correctly use the 16 KB shared memory space. We can clean it up on the .NET side.
harold aptroot wrote:
- There is probably more to this than might be seen at a first glance..
Yeah, Google LabVIEW. This will break down most of our barriers.
harold aptroot wrote:
- Profit
I suggest a good prototype here on The Code Project. Then, after we get some interest and more help, a professional version with better features. We won't handicap the prototype, but the pro version would have many performance enhancements and
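On the 16 KB shared memory point, the budget maths is worth having on hand. A quick sketch, assuming the pre-Fermi 16 KB-per-block figure and 4-byte floats:

```python
SHARED_BYTES = 16 * 1024  # per-block shared memory on pre-Fermi GPUs
FLOAT_BYTES = 4

def max_square_tile(n_buffers=1):
    """Largest square float tile per buffer that fits in shared memory."""
    per_buffer = SHARED_BYTES // n_buffers
    floats = per_buffer // FLOAT_BYTES
    return int(floats ** 0.5)  # side length of the square tile

# One 64x64 float tile fills the whole 16 KB; two buffers (e.g. the A and
# B tiles of a matrix multiply) drop the limit to 45x45, which is why
# 32x32 is a common round-number choice.
print(max_square_tile(1), max_square_tile(2))  # 64 45
```

Any translator that places data automatically would need sums like this baked into its placement decisions, or kernels will silently spill to slow global memory.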
-
Harold wrote:
- ???? - Profit
Never seen that one before? :) On the SSA/MSIL thing - it's the operand stack that would be read-only, making the opcodes "implicit SSA" as I'll call it now (because the operands are not listed but inferred from the stack, and that also ensures that it really is SSA form - albeit implicit). OK, I saw LabVIEW; what am I supposed to see, though? *bookmarks project page*
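The "implicit SSA" reading can be demonstrated in a few lines: symbolically execute the stack code, minting a fresh temp for every push, and every temp ends up defined exactly once - which is the SSA property. A toy sketch over a made-up two-opcode stack language (not real MSIL):

```python
def stack_to_ssa(code):
    """Symbolically run stack ops, minting a fresh single-assignment temp
    per push; the implicit stack operands become explicit SSA triples."""
    stack, ssa, counter = [], [], 0
    for op, *args in code:
        if op == "push":
            name = f"t{counter}"; counter += 1
            ssa.append((name, "const", args[0]))
            stack.append(name)
        else:  # binary op: pops two operands, pushes one fresh result
            b, a = stack.pop(), stack.pop()
            name = f"t{counter}"; counter += 1
            ssa.append((name, op, a, b))
            stack.append(name)
    return ssa

# (2 + 3) * 4 in stack form:
prog = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
for line in stack_to_ssa(prog):
    print(line)
# ('t0', 'const', 2)
# ('t1', 'const', 3)
# ('t2', 'add', 't0', 't1')
# ('t3', 'const', 4)
# ('t4', 'mul', 't2', 't3')
```

The "rare and creepy cases" caveat above still applies: real MSIL has locals, branches, and `dup`, which need merge handling (phi nodes) that this sketch deliberately skips.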