SSE2 intrinsics question
-
Hi there, Has anybody here dealt with coding integer image processing algorithms using SSE2 intrinsics on a P4? If so, please respond. I am stuck on a problem and wanted to know if there is a way around it. ARK
What's the problem?
-
Hi Chris, Thank you for responding. I am currently using SSE2 intrinsics to optimize my image processing algorithm. Say we have four different non-contiguous addresses inside a single XMM register: XMM0 = | ADD1 | ADD2 | ADD3 | ADD4 |. What is the best way to get the data at those addresses? The only way I can think of is to write the register back to memory/cache and then read the values with pointer indexing, giving XMM1 = | *ADD1 | *ADD2 | *ADD3 | *ADD4 |. This would be a big performance hit, since the memory write and read-back cost many cycles for every four indexed values. Is there any way around this to get hold of the values at those addresses? Vector processing machines normally support gather/scatter, which is exactly this kind of access to non-contiguous addresses. However, this kind of support seems to be absent in SSE2. Please respond if you have any thoughts about it. Anything is helpful. Thank you for the help. Best regards, Anand
-
As far as I know there's no way to dereference the contents of an XMM register, so your only solution will be to 'manually' extract the addresses, dereference them, and then 'manually' build your new XMM register. This is effectively what you're suggesting. Of course you can keep the addresses in registers to avoid the memory read/write hit you mention, but it's still not a great solution. For example:
    movd eax, xmm0   // eax is now the address in the lowest 32 bits of xmm0
    mov ebx, [eax]   // ebx is whatever eax was pointing at
    movd xmm1, ebx   // low dword of xmm1 = *(low dword of xmm0)
Repeat this with some packed shuffles or byte shifts (e.g. pshufd, psrldq) to get/set all the data in the xmm registers. (Sorry for the assembler - I don't actually use intrinsics.) Perhaps you need to look at the overall algorithm to see whether there's a method which avoids ending up with your addresses in an XMM register... Not sure I've been much help really - if you find an elegant solution I'd be interested! Cheers, Chris.
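For anyone who wants the same idea with intrinsics rather than assembler, here is a minimal sketch in C of the two approaches discussed above. It makes one assumption that is not in the original posts: the register holds four 32-bit indices into a table rather than raw addresses, so the code also works on 64-bit builds where a pointer no longer fits in a 32-bit lane. The function names gather4_via_store and gather4_via_regs and the table parameter are hypothetical, chosen only for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Approach from the question: spill the indices to memory, then read
 * them back with ordinary array indexing.
 * Assumption: idx holds four valid 32-bit indices into table. */
static __m128i gather4_via_store(const int32_t *table, __m128i idx)
{
    int32_t tmp[4];
    _mm_storeu_si128((__m128i *)tmp, idx);   /* write the register out */
    /* _mm_set_epi32 takes the highest lane first. */
    return _mm_set_epi32(table[tmp[3]], table[tmp[2]],
                         table[tmp[1]], table[tmp[0]]);
}

/* Approach from the reply: move each index through a general-purpose
 * register (movd), shifting the vector down 4 bytes at a time (psrldq). */
static __m128i gather4_via_regs(const int32_t *table, __m128i idx)
{
    int i0 = _mm_cvtsi128_si32(idx);
    int i1 = _mm_cvtsi128_si32(_mm_srli_si128(idx, 4));
    int i2 = _mm_cvtsi128_si32(_mm_srli_si128(idx, 8));
    int i3 = _mm_cvtsi128_si32(_mm_srli_si128(idx, 12));

    /* Four scalar loads, then rebuild the vector lane by lane. */
    return _mm_set_epi32(table[i3], table[i2], table[i1], table[i0]);
}
```

Both functions return the same result; which one is faster depends on the CPU and on what the compiler does with the temporaries, so it is worth timing both on the target machine. SSE2 itself has no gather instruction. Later instruction sets added one (AVX2 provides _mm_i32gather_epi32), but that is not available on a P4.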