Code Project, C / C++ / MFC forum

sse2 intrinsics question?
Anand RK wrote (#1):

Hi there, has anybody here coded integer image-processing algorithms using SSE2 intrinsics on a Pentium 4? If so, please respond. I am stuck on a problem and want to know if there is a way round it.

ARK


Christopher Lloyd wrote (#2):

What's the problem?


Anand RK wrote (#3):

Hi Chris,

Thank you for responding. I am using SSE2 intrinsics to optimize my image-processing algorithm. Suppose a single XMM register holds four different, non-contiguous addresses:

XMM0 = | ADD1 | ADD2 | ADD3 | ADD4 |

What is the best way to get the data stored at those addresses, i.e. to produce

XMM1 = | *ADD1 | *ADD2 | *ADD3 | *ADD4 |

The only way I can think of is to write the register back to memory/cache and then read each value through pointer indexing. That costs a lot of performance, because every group of four indexed values needs a memory write followed by four reads, which is a lot of cycles. Is there any way round this?

Normal vector-processing machines support gather/scatter, which loads (or stores) data at non-contiguous addresses. However, SSE2 does not seem to have any such support.

If you have any thoughts about it, please respond; anything is helpful. Thank you for the help.

Best regards,
Anand


Christopher Lloyd wrote (#4):

As far as I know there's no way to dereference the contents of an XMM register, so your only solution will be to 'manually' extract the addresses, dereference them, and then 'manually' build your new XMM register. This is effectively what you're suggesting. Of course, you can keep the addresses in registers to avoid the memory read/write hit you mention, but it's still not a great solution. For example:

```asm
movd eax, xmm0    ; eax is now the address in the lowest 32 bits of xmm0
mov  ebx, [eax]   ; ebx is whatever eax was pointing at
movd xmm1, ebx    ; low dword of xmm1 = *(low dword of xmm0)
```

Repeat this with some packed rotations to get/set all the data in the XMM registers. (Sorry for the assembler; I don't actually use intrinsics.)

Perhaps you need to look at the overall algorithm to see whether there's a method that avoids ending up with your addresses in an XMM register in the first place.

Not sure I've been much help really. If you find an elegant solution I'd be interested!

Cheers,
Chris.
