function translated to ASM

Calin Negru

What's a function in ASM? My guess is that it's an isolated sentence sequence that gets an ID. When the sequence is called from another sequence, the location in the calling sequence where the call takes place is saved in the sequence being called (as some kind of statement placed at its end). Execution of the calling sequence is paused, and traversal of the called sequence starts. When execution reaches the last statement of the called sequence, the location saved there is used to resume execution in the calling sequence.

CPallini

Your guess is mostly right. For a real-life assembly example, see, for instance, 8051 CALL INSTRUCTIONS.
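One detail the guess gets slightly wrong on most CPUs: the return location is not stored inside the called sequence. A CALL typically pushes it on a stack (or puts it in a link register), and RET pops it. Here is a minimal sketch in C of that mechanism. The four-instruction machine, `run_program` and `demo_program` are all invented for illustration and do not model the 8051 or any real CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini instruction set, invented for this sketch. */
enum op { OP_ADD, OP_CALL, OP_RET, OP_HALT };

struct insn { enum op op; int arg; };

/* Runs `code` from address 0 and returns the accumulator at OP_HALT. */
int run_program(const struct insn *code)
{
    size_t stack[16];   /* return addresses live here, not in the callee */
    size_t sp = 0;      /* number of saved return addresses */
    size_t pc = 0;      /* index of the next instruction */
    int acc = 0;

    for (;;) {
        struct insn in = code[pc++];   /* pc now points past this instruction */
        switch (in.op) {
        case OP_ADD:
            acc += in.arg;
            break;
        case OP_CALL:
            assert(sp < 16);
            stack[sp++] = pc;          /* save where to resume */
            pc = (size_t)in.arg;       /* jump to the callee's entry point */
            break;
        case OP_RET:
            assert(sp > 0);
            pc = stack[--sp];          /* resume right after the CALL */
            break;
        case OP_HALT:
            return acc;
        }
    }
}

/* Adds 1, calls the routine at address 4 twice, then halts. */
const struct insn demo_program[] = {
    { OP_ADD, 1 },    /* address 0 */
    { OP_CALL, 4 },   /* address 1 */
    { OP_CALL, 4 },   /* address 2 */
    { OP_HALT, 0 },   /* address 3 */
    { OP_ADD, 10 },   /* address 4: callee entry point */
    { OP_RET, 0 },    /* address 5 */
};
```

`run_program(demo_program)` returns 21. The same routine is called from two different places, which is why the return location has to be saved per call rather than written once into the routine.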

"In testa che avete, Signor di Ceprano?" -- Rigoletto

trønderen

As CPallini writes: Essentially correct. But your description is so abstract that it applies just as well to functions in any algorithmic language, from Fortran through Algol and Pascal and C and C#. It is certainly not ASM specific. Actually, had your title line not said 'function translated to ASM', I'd say quite the contrary.

If you hand code ASM, you have a lot more freedom. E.g. that 'sentence sequence' would not have to be that isolated: a function could have multiple entry points. (For an extreme case, read 'Jumping into the middle of an instruction ...'.)

Also, I think that parameter transfer and return of result value(s) are such an essential part of the function concept that they should be included in even the most basic definition/description of it. But again, parameters are certainly not specific to ASM functions; they apply equally to ASM and high level languages.

Rant part: I really wish that you were right about 'an isolated sentence sequence that gets an ID'. That is the case in neither ASM nor C style languages. The ID does not identify the sentence sequence, but the point in the code at the start of the sequence. This is one of the major fundamental flaws in the design of these languages.

In a few other algorithmic languages, such as CHILL, a label identifies a sentence sequence, be that a function, a loop, a conditional statement or whatever. Usually, a sentence sequence is termed a 'block'. You can e.g. break out of any block by stating its ID, even if it is not the innermost one. You can have compiler support for block completion by repeating the block ID at the end, improving readability a lot and catching nesting errors. If there were a dotNET CHILL compiler out there, I'd gladly kick out C# (even if C# certainly is my favorite alternative in the C class of languages)!

Fly Gheorghe

In ASM there is no distinction between functions and procedures. The name procedure is usually used. You can only CALL a procedure. A function in the high-level sense (a procedure that returns something) is just a variant.

Regarding passing parameters to procedures in ASM. This can be done:

a) By putting values in CPU registers. This works if the number of parameters is small and the parameters are rather simple data types. The procedure has direct access to the parameters by means of the registers. Compilers do this for simple functions/procedures/methods. Of course you need to save registers to the stack and restore them after return. This is named the call sequence/frame of the procedure.

b) By pushing parameters onto the stack. This is the de facto standard. You can push parameters from left to right (the so-called "Pascal" convention) or from right to left (the so-called "C" convention). The "C" convention also works with procedures that have a variable number of parameters. This is why the C function printf has the format as the first (and mandatory) parameter: it will be at a known position at the top of the stack when entering printf, so printf will know where to find it (the format is supposed to correctly describe the number and type of all the other parameters, like %s, %d etc.).

When returning from the procedure, the parameters that were put on the stack must be discarded. This can be done by the caller (the "C" approach) or by the procedure (the "Pascal" approach), e.g. "ADD SP, 24" in the caller or "RET 24" in the procedure. The C/C++ compilers use of course the "C" approach. Observe that the caller "knows" exactly how many parameters were pushed onto the stack, so discarding them by the caller is more natural. The Windows SDK uses the "Pascal" approach here (the procedure cleans the stack).

When dealing with large objects that must be passed, it's easier to pass them by reference, i.e. to pass an address (pointer) to a memory area where the object is stored. A pointer is a simple type. If you really need to pass a large object by value (i.e. make a copy), you can copy the internal representation of the object onto the stack, and define the stack frame so that the procedure has access to it. However, this is more time-consuming.

c) Combinations of the above 2 methods.

A procedure can return a value (i.e. becoming a function) by:

1) A register (if the return value is a scalar type). For Intel CPUs, the convention is to return in the accumulator (AL, AX, DX:AX, EAX, etc., depending on the processor type). Observe that scalar types include all numerical values (int, float, double) and poin
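A minimal C sketch of two points from this post: a procedure with a variable number of parameters can only find the extra ones through a fixed first parameter (as printf does with its format), and a large object is cheaper to pass by address than by value. The names `sum_ints`, `struct big`, `first_by_value` and `first_by_reference` are made up for illustration. How the arguments actually travel (stack or registers) depends on the platform's calling convention, which C hides.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` ints passed after the fixed parameter. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);
    va_start(args, count);            /* start right after the last fixed parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* the callee trusts `count`, as printf trusts its format */
    va_end(args);
    return total;
}

/* A deliberately large object: 256 ints. */
struct big { int values[256]; };

/* By value: the caller copies the whole struct for the callee. */
int first_by_value(struct big b)
{
    return b.values[0];
}

/* By reference: only a pointer is passed, a simple scalar type. */
int first_by_reference(const struct big *b)
{
    return b->values[0];
}
```

Only the caller knows how many arguments it supplied to `sum_ints`, which is the reason the "C" approach leaves stack cleanup to the caller.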

Calin Negru

Thanks CPallini, a confirmation/denial is what I was looking for.

CPallini

You are welcome.


trønderen

In ASM there is no distinction between functions and procedures.

I haven't been programming in languages that make a syntactically explicit distinction between functions and procedures since I last used Pascal (and that is quite a few years ago). For a short period, I found it difficult to merge the two into one concept, but soon I started asking myself 'Why?'. A function with a void (/null) result is as good a procedure as any!

Regarding passing parameters to procedures in ASM.

Again, this is not specific to ASM. Some platforms, such as ARM, define a binary call and parameter interface independent of programming language. If you follow that standard, you can call functions in any other language, and any other language can call your ASM functions. If you do not, then you are misbehaving :-)

You didn't mention one parameter passing method that was the only viable one on machines with extremely small stacks (like the 8051): The accumulator holds the address of a 'struct'-like block of values, allocated anywhere, possibly statically. The call conventions say that the accumulator is volatile; you never expect it to retain its value when other code is executed, so you do not save/restore it for a function call.

Btw: In the Win32 API, this convention is used for a share of the function calls: (The address of) a single composite struct is passed by the caller. The first word in the struct indicates its size, so when a new, extended version of the function is published, taking more parameters, the name of the function is unchanged, and the extra parameters are added at the end of the struct. The function can see from the size of the struct whether the caller wants the old or the new extended functionality. And, it reduces the risk of overflow.

The alternative, used by another share of the Win32 functions, is to extend the function name with an 'Ex' (and an extended parameter specification). Later comes the 'FuncExEx', and 'FuncExExEx' and ... there are cases of function names with five 'Ex' suffixes in a row. I think that is extremely messy. I much prefer the 'parameter struct' alternative (using that philosophy in my own code).
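The size-prefixed parameter struct can be sketched in C roughly as follows. This is an illustration under assumptions, not a real Win32 interface: `draw`, `draw_params_v1`, `draw_params_v2`, `DEFAULT_COLOR` and the two demo callers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Version 1 of a hypothetical parameter block. */
struct draw_params_v1 { size_t size; int x; int y; };

/* Version 2 appends a field; the existing fields keep their offsets. */
struct draw_params_v2 { size_t size; int x; int y; int color; };

enum { DEFAULT_COLOR = 7 };

/* Returns the color that would be used, or -1 if the block is too small. */
int draw(const void *params)
{
    size_t size;
    int color = DEFAULT_COLOR;
    const size_t color_end = offsetof(struct draw_params_v2, color) + sizeof color;

    assert(params != NULL);
    memcpy(&size, params, sizeof size);   /* the first field is always the size */
    if (size < sizeof(struct draw_params_v1))
        return -1;                        /* too small to be any known version */
    if (size >= color_end)                /* the caller supplied the v2 field */
        memcpy(&color,
               (const char *)params + offsetof(struct draw_params_v2, color),
               sizeof color);
    return color;
}

/* A caller compiled against the old header. */
int demo_old_caller(void)
{
    struct draw_params_v1 p = { sizeof p, 10, 20 };
    return draw(&p);
}

/* A caller compiled against the new header. */
int demo_new_caller(void)
{
    struct draw_params_v2 p = { sizeof p, 10, 20, 3 };
    return draw(&p);
}
```

The old caller gets `DEFAULT_COLOR`, the new caller gets its own color, and the function name never changed. The size check is also what keeps the function from reading past the end of a shorter struct.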

The C/C++ compilers use of course the "C" approach.

By default, that is. I have never used a C/C++ compiler that could not be directed to use Pascal conventions (that is a requirement for calling Win32 functions!). Note that 64 bit Windows has different calling

Fly Gheorghe

I am trying to keep discussions at a general level, so that statements that were valid 20 years ago are valid today and will be valid 20 years from now, at least with the classic CPU architecture.

1) Regarding stack management: From the CPU's perspective, when entering a procedure, the stack is just a contiguous memory area defined by a segment descriptor and by a stack pointer. It is irrelevant how this memory area was allocated: statically, when the process was started, or dynamically before the call. Dynamically means that somebody must deallocate that area as well.

2) The run-time/development environment matters when choosing how to pass/return parameters. Allocating memory on the heap is fine, assuming you have a heap in the first place. This assumes calls to the OS to get/release memory, but what if you don't have an OS at all? What if you write code for a dedicated hardware controller and the only memory is statically defined? There are special environments like space/military/medical in which you are not even allowed to use dynamic memory allocation, for obvious reasons.

3) I still say that a good insight into hardware and assembly language is essential for becoming a good software engineer. If not, who should have these insights? I don't write assembler nowadays either, but the fact that I once did helps me write better C/C++/C# code.

4) XOR AX, AX vs. MOV AX, 0: It is not only about speed, but also about instruction encoding. "XOR AX, AX" occupies two bytes of memory (op code and ModRM byte), while "MOV AX, 0" occupies three: one byte for the op code and 2 bytes for the "immediate" 16-bit operand. If you consider 32 bits, then "MOV EAX, 0" occupies 5 bytes: one for the instruction code and 4 bytes for the 32-bit operand. The compiler treats all immediate operands in the same way. Following the same logic, on a 64-bit CPU, "MOV RAX, 0" with a full 64-bit immediate will occupy 10 bytes (64-bit prefix, op code and an 8-byte operand), while "XOR RAX, RAX" will be 3 bytes only (64-bit prefix, op code and ModRM byte).

There is also another aspect. If you want to do compare operations, then you must be sure that the arithmetic flags are correctly set with respect to the entity you want to compare. A conditional jump "JNZ address" will not work as expected after "MOV AX, 0" (if AX is what you want to jump on) because "MOV" does not set any of the arithmetic flags, but will work fine after "XOR AX, AX", because "XOR" does. So those students who insisted on using XOR instead of MOV were fully right.
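For reference, the encodings in question can be written out as bytes. The table below is a sketch in C holding the machine code as I understand the x86 encoding rules; the struct, the table and `encoding_length` are hypothetical names. Note that each XOR form needs a ModRM byte after the op code, and that the 10-byte figure for "MOV RAX, 0" is for the full 64-bit immediate form; assemblers may pick a shorter encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct encoding {
    const char *text;         /* assembly source, Intel syntax */
    const char *mode;         /* CPU mode in which the bytes mean this */
    unsigned char bytes[10];  /* machine code, unused tail is zero */
    size_t length;            /* number of bytes actually used */
};

static const struct encoding encodings[] = {
    { "XOR AX, AX",   "16-bit", { 0x31, 0xC0 },             2 },
    { "MOV AX, 0",    "16-bit", { 0xB8, 0x00, 0x00 },       3 },
    { "XOR EAX, EAX", "32-bit", { 0x31, 0xC0 },             2 },
    { "MOV EAX, 0",   "32-bit", { 0xB8, 0, 0, 0, 0 },       5 },
    { "XOR RAX, RAX", "64-bit", { 0x48, 0x31, 0xC0 },       3 },
    { "MOV RAX, 0",   "64-bit", { 0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0 }, 10 },
};

/* Looks up an instruction by its source text and returns its size in bytes. */
size_t encoding_length(const char *text)
{
    for (size_t i = 0; i < sizeof encodings / sizeof encodings[0]; i++)
        if (strcmp(encodings[i].text, text) == 0)
            return encodings[i].length;
    assert(!"unknown instruction text");
    return 0;
}
```

In every mode the XOR form is the shorter one, and the gap grows with the operand size because MOV has to carry the whole immediate.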

Richard Andrew x64

You seem to know a lot about assembly language. Are you aware of any C or C++ compilers that can generate location-independent code? That's been something I've been interested in for a long time.

The difficult we do right away... ...the impossible takes slightly longer.

k5054

Linux uses position independent code (PIC) to produce shared libraries. PIC objects can be produced with the -fPIC flag to either GCC or Clang. More information here: [fPIC option in GCC](https://iq.opengenus.org/fpic-in-gcc/)
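As a small sketch of what that looks like in practice, here is a library source file with the build command in a comment. The file name `counter.c`, the library name and `counter_next` are hypothetical; the `-fPIC` and `-shared` flags are the real GCC/Clang options.

```c
/* counter.c: hypothetical library source for this sketch.
 *
 * Build as a shared library with position independent code:
 *   gcc -fPIC -shared counter.c -o libcounter.so
 * (clang accepts the same flags.)
 */
#include <assert.h>

/* With -fPIC, the generated code reaches this variable without
 * an absolute address baked into the instructions. */
static int counter;

/* Returns 1 on the first call, 2 on the second, and so on. */
int counter_next(void)
{
    assert(counter >= 0);
    return ++counter;
}
```

Nothing in the C source changes for PIC; the flag only affects how the compiler addresses code and data, so the same library image can be loaded at any address.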

Keep Calm and Carry On

Richard Andrew x64

Neat!


Calin Negru

Rather interesting. Worth remembering.