Is there a better way to achieve that?
-
V. wrote: __asm mov x, y
That is not a valid assembly language statement. mov cannot copy memory to memory, so at least one operand has to be a register (or the source an immediate), AFAIK.
suhredayan
There is no spoon.
-
Hi everyone,
I need to know if there is a better way to swap variable values in assembly code. I have a thread that sorts some variables and I want to make it as fast as possible. I also wanted to do it in assembly code just to learn a little about it. It is my first attempt at assembly code, so I apologize for any mistakes. If you know a better way to do it, let me know! The code you see below works fine, but I'm sure it can be done in a "fancier" way. This is how I call the function:
UltraSwap( &Var1, &Var2 );
Then this is the function in assembly code:
#pragma warning (disable:4035) // disable warning 4035 (function must return something)
_inline PVOID UltraSwap( LONG* a, LONG* b )
{
    LONG x = *a;
    LONG y = *b;
    __asm mov eax, x
    __asm mov ebx, y
    __asm mov x, ebx
    __asm mov y, eax
    *a = x;
    *b = y;
}
#pragma warning (default:4035) // re-enable it
The only thing I don't understand is that I can't move *a or *b into eax and ebx directly. I had to declare two local variables to achieve that, and I think just declaring them will take some time. I haven't measured the time difference between swapping two variables in C++ code and in assembly because I didn't know how to, since GetTickCount() is neither very reliable nor very precise. Let me know if you guys find something! Have a nice day!
Stef
Programming looks like taking drugs... I think I did an overdose. ;-P
-
if you are only concerned about speed, there is no need for the assembly in this case:
inline void UltraSwap( LONG* a, LONG* b )
{
    LONG x = *a;
    LONG y = *b;
    __asm mov eax, x
    __asm mov ebx, y
    __asm mov x, ebx
    __asm mov y, eax
    *a = x;
    *b = y;
}
inline void UltraSwap2( LONG* a, LONG* b )
{
    LONG t = *b;
    *b = *a;
    *a = t;
}
inline void UltraSwap3( LONG* a, LONG* b )
{
    *b ^= *a ^= *b ^= *a;
}
of these three methods, UltraSwap2 is the simplest and fastest (almost twice as fast as UltraSwap), and UltraSwap3 is approximately the same as UltraSwap.
-
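As a side note on the three variants above, here is a minimal portable sketch of the temporary-variable swap next to an XOR swap. The names SwapTemp, SwapXor and SwapXorGuarded are made up for this illustration and do not come from the thread. The XOR version is split into three statements because, to my understanding, the chained form `*b ^= *a ^= *b ^= *a` is only well defined from C++17 on. It also shows the classic pitfall: if both pointers refer to the same variable, an XOR swap zeroes it.

```cpp
#include <utility>  // std::swap, mentioned in the note below

// Plain temporary-variable swap, same idea as UltraSwap2 above.
inline void SwapTemp(long* a, long* b) {
    long t = *b;
    *b = *a;
    *a = t;
}

// XOR swap written as three separate statements so that every
// modification is sequenced. It still breaks when a == b:
// the first statement computes x ^ x, which is 0.
inline void SwapXor(long* a, long* b) {
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

// Guarded XOR swap (hypothetical helper): does nothing when both
// pointers refer to the same object, so the value is preserved.
inline void SwapXorGuarded(long* a, long* b) {
    if (a != b) {
        SwapXor(a, b);
    }
}
```

In everyday C++ code, `std::swap(x, y)` from `<utility>` does the same job as SwapTemp and is usually the first thing to try before reaching for assembly.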
Thanks everyone! I think I will use the UltraSwap2 solution from Zdeslav, since it is faster than all the other ones. It was my mistake to think that it would be faster to do it in assembly code. But as far as my wish to learn some assembly goes, the suhredayan solution is what I was looking for. I used to program industrial programmable controllers a long time ago, and that was in assembly code, but both the syntax and the instruction names were different. Great help guys!
Stef
Programming looks like taking drugs... I think I did an overdose. ;-P
-
Zdeslav Vojkovic wrote: of these three methods UltraSwap2 is simplest and fastest
00413701 mov eax,dword ptr [x]
    __asm mov ebx, dword ptr [y]
00413704 mov ebx,dword ptr [y]
    __asm mov dword ptr [x], ebx
00413707 mov dword ptr [x],ebx
    __asm mov dword ptr [y], eax
0041370A mov dword ptr [y],eax
suhredayan
There is no spoon.
-
That's what I thought at first sight. I didn't test Zdeslav's code; I just relied on the comment he gave us, and it seemed to make sense. I didn't think to look at the generated assembly :^), my fault :) ! So I did a little test with performance counters, just to see what the real result is... :-D I tested the code on my old P3 450, running each solution 10 times and taking the average. For the Zdeslav solution:
void UltraSwap( LONG* a, LONG* b )
{
    LONG t = *b;
    *b = *a;
    *a = t;
}
                   IN DEBUG          IN RELEASE
One single call  : 0.004190476 ms    0.004190476 ms
1000 calls loop  : 0.186895210 ms    0.035199995 ms
10000 calls loop : 1.822018770 ms    0.307580906 ms
and for the suhredayan solution:
_inline PVOID UltraSwap2( LONG* a, LONG* b )
{
    __asm mov eax, dword ptr [a]
    __asm mov ebx, dword ptr [b]
    __asm mov dword ptr [a], ebx
    __asm mov dword ptr [b], eax
}
                   IN DEBUG          IN RELEASE
One single call  : 0.004190476 ms    0.003352380 ms
1000 calls loop  : 0.170133307 ms    0.016761902 ms
10000 calls loop : 1.599923566 ms    0.158399976 ms
So, now, everyone can see the results. I don't think I have to explain further... :-D In debug mode there is not a lot of difference, but after a 10000 calls loop in release mode, I'm now sure that UltraSwap2 is the great winner! It took just half the time of the other one. If you want to see my test code, let me know and I will try to post it. Thanks suhredayan for your advice, you pointed me to the right track!! Have a nice day,
Stef
Programming looks like taking drugs... I think I did an overdose. ;-P
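For anyone who wants to repeat this kind of measurement, here is a minimal portable sketch of the approach described above (time a loop of swaps, repeat the run several times, take the average). It uses std::chrono::steady_clock as a stand-in for the Windows performance counters, and the names SwapTemp, TimeSwapsMs and AverageSwapsMs are hypothetical helpers made up for this example. Keep in mind that an optimizing compiler may simplify or even remove such a loop, so it is worth checking the generated code before trusting the numbers.

```cpp
#include <chrono>

// Temporary-variable swap, same idea as the C++ version timed above.
inline void SwapTemp(long* a, long* b) {
    long t = *b;
    *b = *a;
    *a = t;
}

// Times `iterations` calls of `swap` on (a, b) and returns the elapsed
// time in milliseconds. TimeSwapsMs is a hypothetical helper name.
template <typename SwapFn>
double TimeSwapsMs(SwapFn swap, long iterations, long& a, long& b) {
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        swap(&a, &b);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Repeats the measurement `runs` times and returns the mean, in ms.
template <typename SwapFn>
double AverageSwapsMs(SwapFn swap, int runs, long iterations) {
    long a = 111111111;
    long b = 222222222;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        total += TimeSwapsMs(swap, iterations, a, b);
    }
    return runs > 0 ? total / runs : 0.0;
}
```

A call such as `AverageSwapsMs(SwapTemp, 10, 10000)` would mirror the "10 runs of a 10000 calls loop" setup. Checking the values of a and b after a timed run is a cheap way to confirm that the function being measured really swaps.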
-
Hey, I found something interesting while playing with my test application. I modified UltraSwap2 and made this one:
_inline PVOID UltraSwap3( LONG* a, LONG* b )
{
    __asm mov eax, dword ptr [a]
    *a = *b;
    __asm mov dword ptr [b], eax
}
It's not very elegant for "supposed" assembly, but look at the results in Release mode:
UltraSwap2 after 1 loops : 0.004190476 ms
UltraSwap2 after 1000 loops : 0.016761902 ms
UltraSwap2 after 10000 loops : 0.138285693 ms
UltraSwap2 after 1000000 loops : 13.729674098 ms
UltraSwap2 after 10000000 loops : 218.611242878 ms
-------------------------------------
UltraSwap3 after 1 loops : 0.003352380 ms
UltraSwap3 after 1000 loops : 0.013409522 ms
UltraSwap3 after 10000 loops : 0.092190462 ms
UltraSwap3 after 1000000 loops : 8.895541502 ms
UltraSwap3 after 10000000 loops : 128.132170951 ms
It's a lot faster for long loops!!
Stef
Programming looks like taking drugs... I think I did an overdose. ;-P
-
ok, we have a misunderstanding here: i compared the results with the original post, not with suhredayan's solution, which i didn't test because of the comments below it. i never said that my solution is the fastest one; i said that it is the fastest of the three i showed. it would be extremely stupid to say that something is the fastest, since everything can be optimized. another thing is that the disassembly for my function is larger, but not by as much as it seems, because suhredayan's listing is missing the prolog and epilog code, which is automatically added by the compiler. when you compare the code that is really generated, suhredayan's version is only 2 instructions shorter (16:14). however, i did some tests with the following test app:
#include "stdafx.h"
#include <windows.h> // GetTickCount, LONG
#include <stdio.h>   // sprintf, printf
#define ITERATIONS 300000000

inline void UltraSwap2( LONG* a, LONG* b )
{
    LONG t = *b;
    *b = *a;
    *a = t;
}

inline void UltraSwap4( LONG* a, LONG* b )
{
    __asm mov eax, dword ptr [a]
    __asm mov ebx, dword ptr [b]
    __asm mov dword ptr [a], ebx
    __asm mov dword ptr [b], eax
}

int main(int argc, char* argv[])
{
    long t;
    long a = 111111111;
    long b = 222222222;
    char txt[1024];
    {
        // time the plain C++ swap
        t = GetTickCount();
        for(long i = 0; i < ITERATIONS; ++i)
        {
            UltraSwap2(&a, &b);
        }
        t = GetTickCount() - t;
        sprintf(txt, "UltraSwap2: %ld iterations done in %ld ms\n", ITERATIONS, t);
        printf(txt);
    }
    {
        // time the inline assembly swap
        t = GetTickCount();
        for(long i = 0; i < ITERATIONS; ++i)
        {
            UltraSwap4(&a, &b);
        }
        t = GetTickCount() - t;
        sprintf(txt, "UltraSwap4: %ld iterations done in %ld ms\n", ITERATIONS, t);
        printf(txt);
    }
    return 0;
}
here are my results (3 runs for each version):

release build, non optimized:
D:\test\Release>test.exe
UltraSwap2: 300000000 iterations done in 3024 ms
UltraSwap4: 300000000 iterations done in 4376 ms
D:\test\Release>test.exe
UltraSwap2: 300000000 iterations done in 4387 ms
UltraSwap4: 300000000 iterations done in 4366 ms
D:\test\Release>test.exe
UltraSwap2: 300000000 iterations done in 4376 ms
UltraSwap4: 300000000 iterations done in 4537 ms

release build, optimized for speed:
D:\test\Release>test.exe
UltraSwap2: 300000000 iterations done in 201 ms
UltraSwap4: 300000000 iterations done in 771 ms
D:\test\Release>test.exe
UltraSwap2: 300000000 iterations done in 210 ms
UltraSwap4: 300000000 iterations done in 761 ms
D:\test\Release>test.exe
UltraSwap2: 300000000 iterations done in 211 ms
UltraSwap4: 300000000 itera
-
yes, almost every solution can be made even better, but this solution doesn't work correctly: it stores the address held in a into eax, then sets *a to the value of *b, and then writes eax over the pointer b rather than over *b, which results in *a and *b being equal. if you have a chance, take a look at Michael Abrash's book "Zen of Code Optimization", it will show you many neat tricks. understanding/knowing assembly can only make you a better developer, so this is the right way to go.
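To make the failure above easier to see, here is a rough C++ model of what the mixed version does, assuming `a` and `b` are ordinary pointer parameters. In MSVC inline assembly a variable name refers to the variable itself, so `mov eax, dword ptr [a]` loads the pointer, not the value it points to. BrokenSwapModel and FixedSwap are hypothetical names used only for this sketch, which is an approximation and not the code the compiler actually generates.

```cpp
// Model of the broken version: each line mirrors one step of UltraSwap3.
inline void BrokenSwapModel(long* a, long* b) {
    long* saved = a;  // mov eax, dword ptr [a]  : eax gets the address, not *a
    *a = *b;          // *a = *b;                : *a now holds the value of *b
    b = saved;        // mov dword ptr [b], eax  : only the local pointer b changes
    (void)b;          // the caller's second variable is never written
}

// What was intended: save the value, not the address.
inline void FixedSwap(long* a, long* b) {
    long saved = *a;
    *a = *b;
    *b = saved;
}
```

Calling BrokenSwapModel on x = 1 and y = 2 leaves both equal to 2, which matches the "a and b being equal" outcome described above, while FixedSwap gives x = 2 and y = 1.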
-
Oops!! :-D I tested every version in debug to be sure that everything was swapped properly, but forgot this one :^)! Thanks Zdeslav! I will certainly look for the book you're talking about. I ran my test program with swap code identical to what you tested yourself, and it always gives me the same result: the assembly code is still faster. I don't understand. Maybe it depends on the way it is compiled and on which CPU it runs...
I use MS Visual C++ 6.0
Compiler version: MS 32-bit C/C++ Optimizing Compiler Version 12.00.8168 for 80x86
Linker version: MS Incremental Linker Version 6.00.8168
My CPU is: Intel Pentium III, 450MHz
SDK installed: MS SDK for WinXP SP2
My project settings are the defaults for an MFC dialog-based application. Also, all the loops are called within a worker thread set to normal priority.
Programming looks like taking drugs... I think I did an overdose. ;-P
-
Forget my last post, Zdeslav! I found what was going wrong: I forgot to put the "inline" keyword in my function header!! :(( Sorry. OK, now I'm going to sleep, I think I need it! Ha ha ha! Thanks for the help!
Programming looks like taking drugs... I think I did an overdose. ;-P