Code Project: C / C++ / MFC
How to map a matrix index to a memory location

#1 Leif Simon Goodwin wrote:
    To access element (x, y) of a matrix stored in a contiguous block of memory, the obvious index calculation is x * total_number_of_columns + y. The quickest way I can think of is to use a look up table: index = lookup[x] + y; Thus lookup[x] holds the offset to the start of row x in memory. So, my question is: how can we do this as fast and simply as possible? Is there a quicker way?
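    A minimal C sketch of the two indexing schemes being compared. The dimensions, element type, and function names are illustrative assumptions, not from the post:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Illustrative dimensions; the original post gives none. */
    enum { ROWS = 4, COLS = 8 };

    /* lookup[x] holds the offset of the start of row x. */
    static size_t lookup[ROWS];

    static void init_lookup(void)
    {
        for (size_t x = 0; x < ROWS; ++x)
            lookup[x] = x * COLS;
    }

    /* Classic index calculation. */
    static size_t index_mul(size_t x, size_t y)
    {
        return x * COLS + y;
    }

    /* The table-based variant from the question. */
    static size_t index_lookup(size_t x, size_t y)
    {
        return lookup[x] + y;
    }

    int main(void)
    {
        init_lookup();
        /* Both schemes address the same element. */
        assert(index_mul(2, 5) == index_lookup(2, 5));
        return 0;
    }
    ```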

      #2 Jochen Arndt wrote, in reply to Leif Simon Goodwin:

      The performance of integer multiplication depends on the CPU architecture and generation. On most x86 and compatible CPUs, an integer multiplication is only 2 to 4 times slower than an addition, which can usually be ignored; a lookup table will then not be significantly faster, and may even be slower. On other (mainly RISC) CPUs, however, a multiplication can be up to 40 times slower than an addition.

      Quote:

      The quickest way I can think of is to use a look up table

      The quickest way is to use a shift operation, because that executes entirely in registers, while a lookup table requires a load from memory. Just compare the methods (in pseudocode):

      ; Classic method: index = x * columns + y
      load r1, x
      mul  r1, total_number_of_columns
      add  r1, y

      ; Shift (number of columns is a power of two)
      load r1, x
      shl  r1, shift_count
      add  r1, y

      ; Lookup table: lookup_table[x] holds the offset of row x
      load r1, x
      ; Scale x by the table item size: a shift if the size is a
      ; power of two, otherwise a multiplication is needed here
      shl  r1, TABLE_ITEM_SIZE_SHIFT
      load r1, [lookup_table + r1]
      add  r1, y

      Note that some of the above operations might require two assembly instructions (especially on RISC CPUs) when operations can only be performed on registers and not with memory addresses.
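      As a C-level sketch of the shift variant (assuming, as above, that the number of columns is a power of two; COLS and COL_SHIFT are hypothetical names):

      ```c
      #include <assert.h>
      #include <stddef.h>

      enum { COLS = 8, COL_SHIFT = 3 };   /* 8 == 1 << 3 */

      int main(void)
      {
          size_t x = 5, y = 3;

          size_t i_mul   = x * COLS + y;          /* classic multiply */
          size_t i_shift = (x << COL_SHIFT) + y;  /* multiply replaced by a shift */

          /* Both forms compute the same index. */
          assert(i_mul == i_shift);
          return 0;
      }
      ```

      In practice an optimising compiler performs this strength reduction automatically when the column count is a compile-time power of two.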

        #3 Leif Simon Goodwin wrote, in reply to Jochen Arndt:
        Thank you.

          #4 Lost User wrote, in reply to Leif Simon Goodwin:

          There may be, depending on the processor, the number of columns, and the size of the elements. Some processors give fast access to the separate bytes of a register (or, rather, form a larger register by adjoining several smaller ones), so you can sometimes form an address almost for free by placing `y` in the low byte and `x` in the high byte of the address (likely with an offset, since otherwise the matrix would have to start at address 0). This applies to the Z80 and 8086, and probably some other old CISC architectures (not modern x86, though: there the trick is still possible but no longer actually fast). On a RISC architecture that sort of trick usually doesn't exist. It also typically wastes a lot of memory (you can use the gaps, of course, but they are fragmented).
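          A hedged C sketch of that byte-composition trick: fix the row stride at 256 so the row number lands in the high byte of a 16-bit index (the stride, types, and function name are assumptions for illustration):

          ```c
          #include <assert.h>
          #include <stdint.h>

          enum { STRIDE = 256 };   /* rows narrower than 256 leave unused gaps */

          /* Compose the index from bytes instead of multiplying. */
          static uint16_t index_bytes(uint8_t x, uint8_t y)
          {
              return (uint16_t)(((uint16_t)x << 8) | y);
          }

          int main(void)
          {
              uint8_t x = 3, y = 17;
              /* Same result as x * STRIDE + y, with no multiplication. */
              assert(index_bytes(x, y) == x * STRIDE + y);
              return 0;
          }
          ```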

            #5 Daniel Pfeffer wrote, in reply to Leif Simon Goodwin:

            Before using such micro-optimisations, I would first ensure that all higher-level optimisations have been applied. For example:

            1. How are the data received? Are they stored in memory in the order of reception, or does some processing (e.g. address calculation) need to be performed?
            2. How are the data processed? Is the access pattern sequential or random? If a single pass through the data is performed, is it possible to store the data (see question 1 above) in the order of processing, and thereby avoid all indexing?

            You may be able to think of other optimisations, based on your knowledge of the hardware and the problem.
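            A minimal sketch of point 2, assuming the data really are consumed in one sequential pass (dimensions are illustrative): a running pointer walks the storage in memory order and removes per-element index arithmetic entirely.

            ```c
            #include <assert.h>

            enum { ROWS = 4, COLS = 8 };

            int main(void)
            {
                double m[ROWS * COLS];
                for (int i = 0; i < ROWS * COLS; ++i)
                    m[i] = 1.0;

                /* No (x, y) indexing: walk the storage in memory order. */
                double sum = 0.0;
                for (const double *p = m; p != m + ROWS * COLS; ++p)
                    sum += *p;

                assert(sum == (double)(ROWS * COLS));
                return 0;
            }
            ```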

            Ad astra - both ways!
