Label addresses in inline assembly
-
Hello,
I would like to emit the label addresses from a bit of inline assembly code into a branch table, but I could not figure out how to do it. I would say it should look something like this; at least, the compiler emits something like this for a switch statement. I am using Visual Studio 2005.
__asm
{
    mov eax, BRANCHTABLE
    mov ecx, "Number between 0 and N"
    jmp [eax+ecx*4]
BRANCHTABLE:
    &LABEL0
    &LABEL1
    &LABEL2
    ...
    &LABELN
LABEL0:
    ...
LABEL1:
    ...
LABEL2:
    ...
    ...
LABELN:
}
Thanks ;)
-
In MASM, possibly - but in inline assembly you can't use operators such as &, and you can't declare initialised data (which is what you're doing). I really don't know how you'd go about that. Maybe replace the branch table with a switch statement with interspersed inline assembly?
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
-
Hi Remco, I created this little inline assembly sample for you. It loads the addresses of code labels into a jump table and allows you to jump to the offset based on a user choice. I commented each line.
#include "stdafx.h"

int _tmain(int argc, _TCHAR* argv[])
{
#pragma pack(push, 1)
    unsigned long table[3] = {0};
#pragma pack(pop)
    char szError[] = "Invalid choice.\n";
    char szPrompt[] = "Enter a number between 1 and 3:\n";
    char szNotify[] = "You entered jump number: %d";
    char szFmt[] = "%d";
    int iChoice = 0;
    int iSizeArray = (sizeof(table) / sizeof(table[0]));
    __asm
    {
        lea esi, table              ;Load address of table into esi
        mov edx, DWORD PTR jump1    ;Move address of jump1 into edx
        mov [esi], edx              ;Move edx into table[0]
        add esi, 4                  ;Increment esi by size of unsigned long
        mov edx, DWORD PTR jump2    ;Move address of jump2 into edx
        mov [esi], edx              ;Move edx into table[1]
        add esi, 4                  ;Increment esi by size of unsigned long
        mov edx, DWORD PTR jump3    ;Move address of jump3 into edx
        mov [esi], edx              ;Move edx into table[2]
        lea eax, szPrompt           ;Load effective address of prompt
        push eax                    ;Push eax onto stack
        ;Are we dynamically linked to C runtime?
#ifdef _DLL
        call DWORD PTR printf       ;Call dynamic linked printf
#else
        call printf                 ;Call static linked printf
#endif
        add esp, 4                  ;Adjust stack pointer because we pushed eax
        lea eax, iChoice            ;Load effective address of iChoice
        push eax                    ;Push it onto the stack
        lea ebx, szFmt              ;Load effective address of fmt
        push ebx                    ;Push it on the stack
        ;Are we dynamically linked to C runtime?
#ifdef _DLL
        call DWORD PTR scanf        ;Call dynamic linked scanf
#else
        call scanf                  ;Call static linked scanf
#endif
        add esp, 8                  ;Adjust stack pointer because we pushed eax and ebx
        mov eax, iChoice            ;Move iChoice value into eax
        mov ebx, iSizeArray         ;Move the size of our array into ebx
        dec eax                     ;Decrement iChoice by 1 because the table is a zero-based array
        cmp eax, ebx                ;Unsigned compare of the zero-based index with the array size
        jae error                   ;Jump to the error label if the index is out of range (a choice of 0 wraps around and is caught too)
        lea esi, table              ;Load address of table into esi
        mov eax, [esi + 4*eax]      ;Move value of table entry into eax by calculating offset
        jmp eax                     ;Jump to address stored in eax
    jump1:
        mov eax, 1                  ;Move number 1 into eax
        jmp notify                  ;Jump to notify label
    jump2:
        mov eax, 2                  ;Move number 2 into eax
        jmp notify                  ;Jump to notify label
    jump3:
        mov eax, 3                  ;Move number 3 into eax
        ...
-
Oooh - nice. I'd wondered if there was a way to create the jump table as a C variable, but hadn't thought of initialising it in assembly code.
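For comparison, a jump table can also live entirely in portable C++ as an array of function pointers, so no assembly is needed to fill it. This is only a minimal sketch: handler1 to handler3, kTable and dispatch are hypothetical names that stand in for the code at the jump1 to jump3 labels, and each target costs a call rather than a plain jmp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical handlers standing in for the code at jump1, jump2 and jump3.
static int handler1() { return 1; }
static int handler2() { return 2; }
static int handler3() { return 3; }

typedef int (*Handler)();

// The jump table is an ordinary, statically initialised array.
static const Handler kTable[] = { handler1, handler2, handler3 };
static const std::size_t kTableSize = sizeof(kTable) / sizeof(kTable[0]);

// Dispatches on a 1-based choice; returns -1 for an out-of-range choice.
int dispatch(int choice)
{
    // After the conversion to an unsigned type, a choice of 0 or below wraps
    // to a huge index, so a single comparison rejects both ends of the range.
    const std::size_t index = static_cast<std::size_t>(choice) - 1;
    if (index >= kTableSize)
        return -1;
    assert(kTable[index] != 0);   // every slot of the table must be filled
    return kTable[index]();       // indirect call through the table
}
```

Calling dispatch(2) runs handler2, while dispatch(0) and dispatch(4) are rejected before the table is touched.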
-
Hello Stuart and David,
Thank you for your replies. I think both solutions are practical. I intended to use the branch table inside a time-critical part of my ray tracer (the ray vs. axis-aligned bounding box intersection test). I am building a stream ray tracer that uses the SSE ALU to process four rays in parallel. The number of rays to process is not always a multiple of four, so I thought I would use a branch table to process the last 1, 2 or 3 rays. Instead, I think I will write one function for each case and use a switch statement outside of the filter functions. That makes the code easier to read, and the C++ compiler emits the code I would like anyway. Thank you for your help!!
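A rough sketch of that layout, under loud assumptions: rays are reduced to a single float each, the made-up test "t >= 0" stands in for the real box test, and filterRays / filterStream are hypothetical names. A scalar loop replaces the SSE kernel so the example stays portable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the SSE kernel: handles exactly N rays starting
// at rays[0] and returns how many pass the made-up test t >= 0.
template <std::size_t N>
std::size_t filterRays(const float* rays)
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i < N; ++i)   // N is a compile-time constant
        hits += (rays[i] >= 0.0f) ? 1 : 0;
    return hits;
}

// Processes the stream four rays at a time; the switch sits outside the
// filter functions and is reached once per stream for the 1 to 3 rays left.
std::size_t filterStream(const float* rays, std::size_t count)
{
    assert(rays != 0 || count == 0);
    std::size_t hits = 0;
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4)
        hits += filterRays<4>(rays + i);

    switch (count - i) {                  // 0..3 rays remain
    case 3: hits += filterRays<3>(rays + i); break;
    case 2: hits += filterRays<2>(rays + i); break;
    case 1: hits += filterRays<1>(rays + i); break;
    default: break;                       // nothing left to do
    }
    return hits;
}
```

Whether the compiler turns this switch into a jump table or a short compare chain is its own choice; with only three cases either is plausible.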
-
A lot of jumps, and certainly those through a jump table, introduce a hiccup in the instruction flow, as they are not predictable at all; so I'd rather avoid them. Assuming lots of rays, I would take a different approach: if the count is not a multiple of four, calculate one of the rays multiple times, e.g. duplicate the last ray one to three times so the number is always a multiple of 4. That will probably be simpler and may be faster. :)
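The padding idea could look roughly like the sketch below. It assumes each ray fits in one float and that the stream can be copied into a std::vector; padToMultipleOfFour is a hypothetical helper name, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a copy of the stream whose length is rounded up to a multiple of
// four by repeating the last ray. An empty stream is returned unchanged.
std::vector<float> padToMultipleOfFour(const std::vector<float>& rays)
{
    std::vector<float> padded(rays);
    if (padded.empty())
        return padded;

    const std::size_t paddedSize = (padded.size() + 3) / 4 * 4;
    padded.resize(paddedSize, rays.back());   // append copies of the last ray
    assert(padded.size() % 4 == 0);
    return padded;
}
```

Only the first rays.size() results would be kept afterwards; whatever is computed for the duplicates is thrown away.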
Luc Pattyn
Have a look at my entry for the lean-and-mean competition; please provide comments, feedback, discussion, and don’t forget to vote for it! Thank you.
-
Yes, conditional branches can be expensive, causing pipeline stalls. But in my implementation using streams of rays there is only one hard-to-predict branch, and that branch is taken only once per processed stream, so it is not really that important. But a C++/assembly mix looks a little bit messy. That's why I removed the branch table from my inline assembly function, to improve code readability at the expense of code size. Expanding the number of rays per stream to a multiple of four is not an option for me, because rays are partitioned in place: the filtered (output) stream should have the same length as the input stream. Thanks :)
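To illustrate what "partitioned in place" implies, here is a minimal sketch that swaps in std::partition for the actual SSE filter, which is not shown in this thread. The Ray struct, its hit flag and partitionRays are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ray record; 'hit' stands in for the result of the box test.
struct Ray {
    float t;
    bool hit;
};

static bool isHit(const Ray& r) { return r.hit; }

// Reorders the stream in place so that the rays with hit == true come first,
// and returns how many there are. No element is added or removed, so the
// output stream has exactly the length of the input stream.
std::size_t partitionRays(std::vector<Ray>& stream)
{
    const std::size_t before = stream.size();
    std::vector<Ray>::iterator firstMiss =
        std::partition(stream.begin(), stream.end(), isHit);
    assert(stream.size() == before);
    return static_cast<std::size_t>(firstMiss - stream.begin());
}
```

Since the buffer serves as both input and output, padding ray slots appended to the input would be reordered along with the real rays.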
-
I forgot to mention one technique I often use in cases like this, where the expensive jump is taken only once (upon entry); I'll describe it in pseudo-code, it basically is a loop unroll by 4:
// count must be > 0 on entry; entering at case r performs r steps
// in the first pass (four steps when r is 0)
switch(count%4) {
case 0:
    // do step
    goto case3;
case 3:
    // do step
    goto case2;
case 2:
    // do step
    goto case1;
case 1:
    // do step
    count-=4;
    if (count>0) goto case0;
}
You can do this in any language, with a switch or with labels and jumps (and if the language allows fall-through, you may skip most of the gotos). In assembly, you would still need labels. :)
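In C or C++, where case labels fall through, this unroll-by-4 is usually written in the form known as Duff's device. The sketch below assumes that "do step" means adding one array element; sumUnrolled is a hypothetical name. The cases are ordered 0, 3, 2, 1 so that entering with remainder r performs exactly r steps in the first pass (four when r is 0).

```cpp
#include <cassert>
#include <cstddef>

// Sums count values with a loop unrolled by four. The switch is evaluated
// once, on entry, and jumps into the middle of the loop body.
// Precondition: count > 0.
long sumUnrolled(const int* values, std::size_t count)
{
    assert(count > 0);
    long sum = 0;
    std::size_t passes = (count + 3) / 4;   // trips through the loop body
    switch (count % 4) {
    case 0: do { sum += *values++;          // falls through on purpose
    case 3:      sum += *values++;
    case 2:      sum += *values++;
    case 1:      sum += *values++;
            } while (--passes > 0);
    }
    return sum;
}
```

With count = 5, for example, the first pass enters at case 1 and performs one step, and the second pass performs four.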
Luc Pattyn