Two basic questions about generated assembly

  • G George_George

    Thanks Steve, 1. Where can I find documentation of the cycles needed for a specific instruction (e.g. LEA and MUL)? 2. "it assumes a linear address space" -- do you mean at the low level (CPU/registers) or at the high level (C/C++)? regards, George

    Stephen Hewitt
    wrote on last edited by
    #8

    George_George wrote:

    Where to look for documents for cycles needed for a specific instruction (e.g. LEA and MUL)?

    Download the datasheet for the CPU.

    George_George wrote:

    "it assumes a linear address space" -- do you mean low layer CPU/register or high layer C/C++?

    It's complicated and very low level. I suggest you start reading something like this[^].

    Steve

    • C CPallini

      1. From the same article, just below the sentence you quoted: "The LEA instruction uses hardwired address generation tables that makes multiplying by a select set of numbers very fast (for example, multiplying by 3, 5, and 9). Twisted, but true." That means the LEA instruction is faster than MUL only for a small set of multiplier values. 2.

      George_George wrote:

      "The TEB's linear address can be found at offset 0x18 in the TEB." -- what does linear address mean? Something like an array, whose elements are placed next to each other?

      I think it means direct address, i.e.

      mov eax,dword ptr fs:[00000018h]

      loads eax with the address of the TEB, hence the following instruction

      mov eax,dword ptr [eax+24h]

      loads eax with the value found at offset 0x24 in the TEB (the Thread ID).

      George_George wrote:

      What does non-linear address mean?

      I suppose it is indirect addressing (via FS register in this context). :)
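The two instructions above can be pictured with a minimal C sketch. It does not touch a real TEB: it simulates a small flat memory and an FS segment base, and every name in it (`memory`, `FS_BASE`, `read_fs32`, `read_linear32`, `init_fake_teb`) is invented for illustration. It assumes the usual x86 meaning of "linear address", namely the flat address obtained once the segment base has been added to the offset. The offsets 0x18 and 0x24 are the ones quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated flat ("linear") memory; sizes and addresses are made up. */
enum { MEM_SIZE = 0x1000, FS_BASE = 0x400, THREAD_ID = 1234 };
static uint8_t memory[MEM_SIZE];

/* Read a 32-bit value at a linear address. */
static uint32_t read_linear32(uint32_t linear)
{
    uint32_t value;
    assert(linear + sizeof value <= MEM_SIZE);
    memcpy(&value, &memory[linear], sizeof value);
    return value;
}

/* Write a 32-bit value at a linear address. */
static void write_linear32(uint32_t linear, uint32_t value)
{
    assert(linear + sizeof value <= MEM_SIZE);
    memcpy(&memory[linear], &value, sizeof value);
}

/* Read through the simulated FS segment: linear = FS base + offset. */
static uint32_t read_fs32(uint32_t offset)
{
    return read_linear32(FS_BASE + offset);
}

/* Build a fake TEB at FS_BASE holding the two fields discussed above. */
static void init_fake_teb(void)
{
    write_linear32(FS_BASE + 0x18, FS_BASE);   /* self pointer: linear address of the TEB */
    write_linear32(FS_BASE + 0x24, THREAD_ID); /* thread ID */
}

/* Mirrors: mov eax, fs:[18h]  then  mov eax, [eax+24h] */
static uint32_t current_thread_id(void)
{
    uint32_t eax = read_fs32(0x18);  /* eax = linear address of the TEB */
    eax = read_linear32(eax + 0x24); /* eax = thread ID */
    return eax;
}
```

In this model the first read is segment-relative (it only makes sense together with FS), while the value it returns is usable as a plain address on its own, which is presumably why the TEB stores it.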

      If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler. -- Alfonso the Wise, 13th Century King of Castile.
      This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong. -- Iain Clarke
      [My articles]

      George_George
      wrote on last edited by
      #9

      Thanks CPallini, 1. What does "hardwired address generation tables" mean? Could you describe it in more detail? 2. I am confused by your two assembly instructions, because I think direct access means not using a pointer, i.e. the [] operator in assembly language. But you are using [] in both assembly statements, so I think both samples should be indirect/pointer/non-linear accesses. Please feel free to correct me if I am wrong. :-) regards, George

      • R Rajesh R Subramanian

        I'll be happy to hear your feedback on "why" you voted the post down. Correct me if I said something wrong there, please. :)

        Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]

        George_George
        wrote on last edited by
        #10

        What does voting a post down mean? I voted 5 because I like your answer. Do you want a 6? :-) regards, George

          Rajesh R Subramanian
          wrote on last edited by
          #11

          Nope, I knew it wouldn't be you. Someone has marked it as an unhelpful answer and I wanted their feedback in particular, so that I can know whether my answer was wrong or how I can make it better. :)

          Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]

          • G George_George

            Hello everyone, Two questions after reading this article, http://www.microsoft.com/msj/0298/hood0298.aspx 1. Why is using LEA to do multiplication faster than using MUL? "Using "LEA EAX,[EAX*4+EAX]" turns out to be faster than the MUL instruction." 2. "The TEB's linear address can be found at offset 0x18 in the TEB." -- what does linear address mean? Something like an array, whose elements are placed next to each other? What does non-linear address mean? thanks in advance, George

            Rajesh R Subramanian
            wrote on last edited by
            #12

            Some more reading material: Pentium Optimization Cross-Reference[^]. From the page: "LEA is better than SHL on the Pentium because it pairs in both pipes, SHL pairs only in the U pipe." Also, as CPallini pointed out, the document states that LEA is more beneficial than MUL only when multiplying by 2, 3, 4, 5, 7, 8, 9.

            Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]

              George_George
              wrote on last edited by
              #13

              Take it easy, man! :-) Rajesh, just enjoy the discussion with people around the world who have different opinions on the same question. regards, George

                Rajesh R Subramanian
                wrote on last edited by
                #14

                The point was that there was nothing wrong in my answer (my opinion) and someone still voted it down. I just wanted their feedback in particular to know if anything was wrong with my answer, so that I can know something new. Not that I care for the vote. :)

                Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]

                  George_George
                  wrote on last edited by
                  #15

                  Thanks Rajesh, I read through the related section you referred to. Really good. 1. Is MUL implemented through SHL? I previously thought the CPU has a separate multiplication implementation, especially for multipliers that are not a power of 2 (e.g. * 3). 2. Why is it better to use LEA when multiplying by "2, 3, 4, 5, 7, 8, 9"? How do we arrive at these numbers? regards, George

                    Stephen Hewitt
                    wrote on last edited by
                    #16

                    George_George wrote:

                    "hardwired address generation tables"

                    It's referring to the electronics in the CPU.

                    Steve

                    • G George_George

                      Thanks Rajesh! 1. "it probably must execute in a single cycle and therefore would surely be faster than mul" -- do you have any documents to support this statement? Where can I find documentation of the cycles needed for a specific instruction (e.g. LEA and MUL)? 2. Any ideas about my question #2? :-) regards, George

                      Rajesh R Subramanian
                      wrote on last edited by
                      #17

                      George_George wrote:

                      "it probably must execute in a single cycle and therefore would surely be faster than mul" -- do you have any documents to support this statement?

                      "With the LEA instruction, the x86 processor can now perform a 3-number add, with something like a C expression "a = b + c + 10;" translating into EAX = EBX+ECX+10 and being coded into one instruction:

                      LEA EAX,[EBX+ECX+10]

                      Notice that no memory is actually referenced. LEA is used merely to calculate values by performing the addition of a base register (EBX) with an index register (ECX) with some constant displacement (10). This is what the address generation unit (AGU) does, allowing the processor to quickly calculate addresses of array elements, screen pixel locations, and do some basic arithmetic in one clock cycle."

                      Source: http://www.emulators.com/docs/pentium_1.htm[^]

                      Refer to pages 6 & 7 in this PDF (Pentium: Not the same old song)[^]. This too supports my earlier statement. Also, read "Handy info on speeding up integer instructions" in this page[^]
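As a small illustration of the "a = b + c + 10" case quoted above, here is a minimal C sketch. The function name `add3` is made up for this example. An optimizing x86 compiler may choose to emit a single LEA for it, but that is an assumption about typical code generation, not something the C language guarantees.

```c
#include <stdint.h>

/* a = b + c + 10: base + index + displacement, with no memory access.
   Unsigned arithmetic wraps modulo 2^32, like the 32-bit register result. */
static uint32_t add3(uint32_t b, uint32_t c)
{
    return b + c + 10u;
}
```

Looking at the compiler's assembly listing for a function like this is one way to check whether LEA is actually chosen on a given compiler and target.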

                      Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]

                        George_George
                        wrote on last edited by
                        #18

                        Thanks Steve, I am a little confused. MUL is based on shift operations, but as mentioned, it is slower than LEA for some special cases, e.g. * 2, * 3, * 5, etc. How is LEA implemented, and why is it faster than MUL? regards, George

                          George_George
                          wrote on last edited by
                          #19

                          Thanks Steve, "datasheet for the CPU" -- could you give me some links or keywords to search? I am new to this topic. :-) regards, George

                            Rajesh R Subramanian
                            wrote on last edited by
                            #20

                            Have you Googled for "Pentium Datasheet"?

                            Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]

                              Rajesh R Subramanian
                              wrote on last edited by
                              #21

                              Carlo's first reply to you was very clear: "The LEA instruction uses hardwired address generation tables that makes multiplying by a select set of numbers very fast (for example, multiplying by 3, 5, and 9)." LEA can operate at hardware level and thus is significantly faster than MUL. But this applies only for a small set of numbers.

                              Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]

                                cp9876
                                wrote on last edited by
                                #22

                                You need to understand how the effective address is calculated in x86 CPUs. I'm not an expert, but from here[^]:

                                "The offset part of the memory address can be specified either directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:
                                  • Displacement: An 8-, 16-, or 32-bit value.
                                  • Base: The value in a general-purpose register.
                                  • Index: The value in a general-purpose register except EBP.
                                  • Scale factor: A value of 2, 4, or 8 that is multiplied by the index value.
                                An effective address is computed by: Offset = Base + (Index * Scale) + displacement"

                                So the address generation logic provides a rapid way to compute B + I*S + D (and this has to be very fast, as it is done before the memory access). All the article was saying is that the LEA instruction lets the programmer get at the result of this computation, which clearly executes in one cycle and which, back when the article was written, was presumably faster than using the integer ALU. Note that because the scale factor is restricted to 1, 2, 4 or 8, only a limited set of expressions can be calculated. To multiply by 5 you would calculate base + 4*base, and so on. It may not be relevant today.
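The formula above can be sketched in a few lines of C. This is only a model of the arithmetic, not of how the hardware does it, and the names `effective_address`, `times3`, `times5`, `times8`, and `times9` are invented for illustration. It shows where multipliers such as 3, 5, and 9 come from: use the same value as base and index, so the result is x + x*scale.

```c
#include <assert.h>
#include <stdint.h>

/* Models Offset = Base + (Index * Scale) + Displacement.
   The scale is restricted to 1, 2, 4, or 8, as in the thread. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, uint32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + disp;
}

/* x*3 = x + x*2 */
static uint32_t times3(uint32_t x) { return effective_address(x, x, 2, 0); }

/* x*5 = x + x*4, the shape of LEA EAX,[EAX*4+EAX] */
static uint32_t times5(uint32_t x) { return effective_address(x, x, 4, 0); }

/* x*9 = x + x*8 */
static uint32_t times9(uint32_t x) { return effective_address(x, x, 8, 0); }

/* x*8: no base register, index scaled by 8 */
static uint32_t times8(uint32_t x) { return effective_address(0, x, 8, 0); }
```

With the base either absent or equal to the index, one such expression yields only the multipliers 1, 2, 4, 8 (no base) and 2, 3, 5, 9 (base equal to index). A multiplier like 7 does not fall out of a single expression of this form and would presumably need an extra step.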

                                Peter "Until the invention of the computer, the machine gun was the device that enabled humans to make the most mistakes in the smallest amount of time."

                                  George_George
                                  wrote on last edited by
                                  #23

                                  Thanks, Rajesh! I have found some on the Intel website. http://www.intel.com/design/Pentium4/documentation.htm#datasheets[^] But which one most likely contains the CPU cycle counts for a specific assembly instruction? I am new to this area, and the long document list confuses me. Any ideas? regards, George

                                    George_George
                                    wrote on last edited by
                                    #24

                                    Sorry for my bad English, Rajesh! My confusion is this: LEA means load effective address, and the purpose of this instruction is to get the value of an address, so how can it be used for multiplication? regards, George

                                      jhwurmbach
                                      wrote on last edited by
                                      #25

                                      Yes, LEA means "Load effective address". But LEA can not only load an address, it can also compute it on the fly. LEA EAX,[ESP+14] puts the result of (value of ESP) plus 14 into EAX. And it can do more complicated calculations: LEA EAX,[EAX*4+EAX] works, and as I understand from the text you have given, it does not actually use a calculation (involving things like cache access and multiply units). Instead, it uses a table lookup. But that only works for some multipliers. So, in effect, for certain multipliers, LEA is faster than MUL.
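The difference between computing an address and reading through it has a direct analogue in C, which may help with the question of how an "address" instruction ends up multiplying. In this minimal sketch (the names `sample_table` and `element_offset` are made up), `&table[i]` only forms the address base + i*4 and never reads the element, which is the same base + index*scale shape discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A small array used only so there is something to take addresses in. */
static const int32_t sample_table[8] = {0};

/* Byte distance from the start of the array to element i.
   Only the address of table[i] is formed; table[i] itself is never read. */
static size_t element_offset(const int32_t *table, size_t i)
{
    assert(table != NULL);
    return (size_t)((const char *)&table[i] - (const char *)table);
}
```

Here the multiplication by 4 (the size of `int32_t`) is hidden inside the address calculation, much as the multiplication by the scale factor is hidden inside LEA.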

                                      Let's think the unthinkable, let's do the undoable, let's prepare to grapple with the ineffable itself, and see if we may not eff it after all.
                                      Douglas Adams, "Dirk Gently's Holistic Detective Agency"

                                        George_George
                                        wrote on last edited by
                                        #26

                                        Thanks Steve, 1. I have read the link you referred to. It talks about how protected mode uses a segment-based access model. But nowhere in the article does it say what linear and non-linear mean -- and this is my question. 2. Could you please provide a link to the datasheet you mean? Sorry, I am new to this area. regards, George

                                          George_George
                                          wrote on last edited by
                                          #27

                                          Thanks Peter, I like the article and read through it last evening. Three more comments:

                                          1. I found two places in the document that seem to conflict:
                                          - "Modern operating system and applications use the (unsegmented) memory model: all the segment registers are loaded with the same segment selector so that all memory references a program makes are to a single linear-address space." -- it looks like segments are useless, since the unsegmented model is used?
                                          - "The offset which results from adding these components is called an effective address of the selected segment. Each of these components can have either a positive or negative (2's complement) value, with the exception of the scaling factor." -- why is a segment selector still needed for the calculation? Doesn't this conflict with the previous statement, which describes the unsegmented model?

                                          2. "Note that the value of the EIP may not match with the current instruction because of instruction prefetching. The only way to read the EIP is to execute a CALL instruction and then read the value of the return instruction pointer from the procedure stack." -- my confusion is that EIP points to the next instruction to execute, so why is the return address the same as EIP? Are they related?

                                          3. "you generally create segment selectors with assembler directives and symbols. The assembler and/or linker then creates the actual segment selectors associated with these directives and symbols." -- what does this mean? Does it mean all segment-related instructions will be ignored or modified by the linker before execution?

                                          regards, George
