Two basic questions about generated assembly
-
Hello everyone, Two questions after reading this article, http://www.microsoft.com/msj/0298/hood0298.aspx
1. Why is using LEA to do multiplication faster than using MUL? "Using "LEA EAX,[EAX*4+EAX]" turns out to be faster than the MUL instruction."
2. "The TEB's linear address can be found at offset 0x18 in the TEB." -- what does linear address mean? Is it something like an array, whose elements are placed next to each other? And what would a non-linear address be?
Thanks in advance, George
-
George_George wrote:
Why is using LEA to do multiplication faster than using MUL? "Using "LEA EAX,[EAX*4+EAX]" turns out to be faster than the MUL instruction."
LEA is an address-calculation instruction, so it probably executes in a single cycle and would therefore be faster than MUL. On the other hand, because it is an addressing instruction, consider whether you could multiply arbitrary large numbers this way. There are surely limitations: the scale factor in an address can only be 1, 2, 4 or 8. MUL works with any operands. :) Thanks for the question, the search for an answer got me an interesting read. :) Link: Wikibooks -> Reverse Engineering -> Code Transformations -> Common instruction substitutions. Warning: I'm not a full-time assembly programmer and I may not be accurate. My assumptions: an x86 processor, Pentium class.
Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]
-
Thanks Rajesh! 1. "it probably must execute in a single cycle and therefore would surely be faster than mul" -- do you have any documents to support this statement? Where can I find documentation of the cycle counts for a specific instruction (e.g. LEA and MUL)? 2. Any ideas about my question #2? :-) regards, George
-
George_George wrote:
Why is using LEA to do multiplication faster than using MUL?
You'd have to know about the internal architecture and circuitry of the CPU to answer that; I don't, and I doubt many people would apart from those who work (or have worked) at Intel.
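George_George wrote:
"The TEB's linear address can be found at offset 0x18 in the TEB." -- what does linear address mean? Is it something like an array, whose elements are placed next to each other? And what would a non-linear address be?
To understand what's going on here you have to know a little about Intel CPUs and segment registers. A segment register such as FS selects a segment with its own base address; the CPU adds that base to the offset given in the instruction to form the linear address (which paging then translates to a physical address). C/C++ has no concept of segment registers (it assumes a flat, linear address space), so the OS stores the TEB's linear address inside the TEB itself, which makes the TEB addressable through an ordinary pointer in such an environment.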
Steve
-
Thanks Steve, 1. Where can I find documentation of the cycle counts for a specific instruction (e.g. LEA and MUL)? 2. "it assumes a linear address space" -- do you mean the low-level CPU/register layer or the high-level C/C++ layer? regards, George
-
1. From the same article, just below the sentence you quoted: "The LEA instruction uses hardwired address generation tables that makes multiplying by a select set of numbers very fast (for example, multiplying by 3, 5, and 9). Twisted, but true." That means the LEA instruction is faster than MUL only for a small set of multiplier values.
2.
George_George wrote:
"The TEB's linear address can be found at offset 0x18 in the TEB." -- what means linear address? Something like array, which elements are put next to each other?
I think it means direct address, i.e.
mov eax,dword ptr fs:[00000018h]
loads eax with the address of the TEB, hence the following instruction
mov eax,dword ptr [eax+24h]
loads eax with the value found at offset 0x24 in the TEB (the ThreadID).
George_George wrote:
What means non-linear address?
I suppose it is indirect addressing (via the FS register in this context). :)
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler. -- Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong. -- Iain Clarke
[My articles]
-
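To make the two mov instructions above a bit more concrete, here is a small C sketch of the idea, not real Windows code. It models memory as a flat byte array and treats an FS-relative access as "segment base + offset". The offsets 0x18 and 0x24 come from the posts above; FS_BASE, MEM_SIZE, the function names and the thread ID value are made up purely for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: a small flat "linear address space" held in an array.
   FS_BASE and MEM_SIZE are made-up illustration values, not real Windows
   data. The offsets 0x18 and 0x24 are the ones quoted in the thread. */
enum { MEM_SIZE = 0x200, FS_BASE = 0x100, TEB_SELF = 0x18, TEB_THREAD_ID = 0x24 };

static uint8_t mem[MEM_SIZE];

/* Read a 32-bit value at a linear address (no segment involved). */
uint32_t read32(uint32_t linear) {
    uint32_t v;
    assert(linear + sizeof v <= MEM_SIZE);
    memcpy(&v, &mem[linear], sizeof v);
    return v;
}

/* Write a 32-bit value at a linear address. */
void write32(uint32_t linear, uint32_t v) {
    assert(linear + sizeof v <= MEM_SIZE);
    memcpy(&mem[linear], &v, sizeof v);
}

/* Segmented read: the linear address is the segment base plus the offset. */
uint32_t read32_fs(uint32_t offset) {
    return read32(FS_BASE + offset);
}

/* Place a fake TEB at linear address FS_BASE. Its field at 0x18 holds the
   TEB's own linear address; its field at 0x24 holds a thread ID. */
void setup_fake_teb(uint32_t thread_id) {
    write32(FS_BASE + TEB_SELF, FS_BASE);
    write32(FS_BASE + TEB_THREAD_ID, thread_id);
}

/* Mirrors: mov eax, fs:[18h]  then  mov eax, [eax+24h] */
uint32_t current_thread_id(void) {
    uint32_t teb = read32_fs(TEB_SELF);   /* linear address of the TEB */
    return read32(teb + TEB_THREAD_ID);   /* plain read, no FS needed  */
}
```

In this model the first read needs FS, but once the linear address has been fetched the second read is an ordinary pointer-style access, which is the only kind of access plain C/C++ code can express.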
I'll be happy to hear your feedback on "why" you voted the post down. Correct me if I said something wrong there, please. :)
Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]
-
George_George wrote:
Where to look for documents for cycles needed for a specific instruction (e.g. LEA and MUL)?
Download the datasheet for the CPU.
George_George wrote:
"it assumes a linear address space" -- do you mean the low-level CPU/register layer or the high-level C/C++ layer?
It's complicated and very low level. I suggest you start reading something like this[^].
Steve
-
Thanks CPallini, 1. What does "hardwired address generation tables" mean? Could you describe it in more detail? 2. I am confused by your two assembly instructions, because I think direct access means not using a pointer, i.e. the [] operator in assembly language. But you are using [] in both statements, so I think both samples are indirect/pointer/non-linear accesses. Please feel free to correct me if I am wrong. :-) regards, George
-
What does voting a post down mean? I voted 5 because I like your answer. You want 6? :-) regards, George
-
Nope, I knew it wouldn't be you. Someone has marked it as an unhelpful answer and I wanted their feedback in particular, so that I can know whether my answer was wrong or how I can make it better. :)
Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]
-
Some more reading material: Pentium Optimization Cross-Reference. From the page: "LEA is better than SHL on the Pentium because it pairs in both pipes, SHL pairs only in the U pipe." Also, as CPallini pointed out, the document states that LEA can be more beneficial than MUL only when multiplying by 2, 3, 4, 5, 7, 8, 9.
Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]
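As a rough illustration of where such a list of multipliers can come from, here is a C sketch that models the value a 32-bit LEA computes. It assumes only the standard x86 addressing form base + index*scale + displacement with scale limited to 1, 2, 4 or 8; the function names are invented for this example. Under that model a single LEA covers 2, 3, 4, 5, 8 and 9, while 7 would need a second instruction such as a subtract.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the value a 32-bit LEA computes: base + index*scale + disp,
   with 32-bit wraparound. The encoding only allows scale = 1, 2, 4 or 8. */
uint32_t lea32(uint32_t base, uint32_t index, uint32_t scale, uint32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + disp;
}

uint32_t times3(uint32_t x) { return lea32(x, x, 2, 0); }  /* lea eax,[eax+eax*2] */
uint32_t times5(uint32_t x) { return lea32(x, x, 4, 0); }  /* lea eax,[eax*4+eax] */
uint32_t times9(uint32_t x) { return lea32(x, x, 8, 0); }  /* lea eax,[eax+eax*8] */

/* 7 is neither a legal scale nor a legal scale plus one, so this model
   needs two steps: x*8 - x (lea edx,[eax*8] followed by sub edx,eax). */
uint32_t times7(uint32_t x) { return lea32(0, x, 8, 0) - x; }

/* Returns 1 if multiplying by m fits in one LEA of the form
   [reg*scale] or [reg + reg*scale], otherwise 0. */
int single_lea_multiplier(uint32_t m) {
    for (uint32_t scale = 1; scale <= 8; scale *= 2) {
        if (m == scale || m == scale + 1) {
            return 1;
        }
    }
    return 0;
}
```

Whether any of these sequences actually beats MUL depends on the specific CPU, so the sketch only shows which multipliers are expressible, not how fast they are.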
-
Take it easy, man! :-) Rajesh, just enjoy the discussion with people around the world who have different opinions on the same question. regards, George
-
The point was that there was nothing wrong in my answer (my opinion) and someone still voted it down. I just wanted their feedback in particular to know if anything was wrong with my answer, so that I can know something new. Not that I care for the vote. :)
Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]
-
George_George wrote:
"hardwired address generation tables"
It's referring to the electronics in the CPU.
Steve
-
Thanks Rajesh, I read through the section you referred to. Really good. 1. Is MUL implemented through SHL? I had thought the CPU has a separate multiplication implementation, especially when the multiplier is not a power of 2 (e.g. * 3); 2. Why is it better to use LEA when multiplying by "2, 3, 4, 5, 7, 8, 9"? Where do these numbers come from? regards, George
-
George_George wrote:
"it probably must execute in a single cycle and therefore would surely be faster than mul" -- do you have any documents to support this statement?
"With the LEA instruction, the x86 processor can now perform a 3-number add, with something like a C expression "a = b + c + 10;" translating into EAX = EBX+ECX+10 and being coded into one instruction:
LEA EAX,[EBX+ECX+10]
Notice that no memory is actually referenced. LEA is used merely to calculate values by performing the addition of a base register (EBX) with an index register (ECX) with some constant displacement (10). This is what the address generation unit (AGU) does, allowing the processor to quickly calculate addresses of array elements, screen pixel locations, and do some basic arithmetic in one clock cycle."
Source: http://www.emulators.com/docs/pentium_1.htm
Refer to pages 6 & 7 in this PDF (Pentium: Not the same old song). This too supports my earlier statement. Also, read "Handy info on speeding up integer instructions" in this page[^]
Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]
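For anyone who prefers to see the quoted description in C terms, here is a minimal sketch of what those LEA forms compute. The function names are invented for this example, and the 4-byte element size is just an assumption chosen to match a 32-bit int array. It says nothing about cycle counts, only about the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What "LEA EAX,[EBX+ECX+10]" computes: a plain 32-bit sum with
   wraparound. No memory is read or written. */
uint32_t lea_add3(uint32_t ebx, uint32_t ecx) {
    return ebx + ecx + 10u;
}

/* Byte address of element i in an array of 4-byte items starting at base:
   the shape of "LEA EAX,[EBX+ECX*4]". */
uint32_t element_address(uint32_t base, uint32_t i) {
    return base + i * 4u;
}

/* Cross-check the model against real C array indexing; aborts via assert
   if the byte offset of arr[i] ever differs from i*4. */
void check_against_c_indexing(void) {
    int32_t arr[16] = {0};
    for (uint32_t i = 0; i < 16; ++i) {
        ptrdiff_t off = (char *)&arr[i] - (char *)arr;
        assert((uint32_t)off == element_address(0u, i));
    }
}
```

This is the same base + index*scale + displacement calculation the processor performs for every memory operand, which is why LEA can reuse it for small arithmetic jobs.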
-
Thanks Steve, I am a little confused. MUL is based on shift operations, but as mentioned, it is slower than LEA for some special cases, e.g. * 2, * 3, * 5, etc. How is LEA implemented and why is it faster than MUL? regards, George
-
Thanks Steve, "datasheet for the CPU" -- could you give me some links or keywords to search? I am new to this topic. :-) regards, George
-
Have you Googled for "Pentium Datasheet"?
Nobody can give you wiser advice than yourself. - Cicero .·´¯`·->Rajesh<-·´¯`·. Microsoft MVP - Visual C++[^]