Subtle performance issue

Rama Krishna Vavilala

This issue was found during a performance test.

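#include <cctype>
#include <string>

void ParseAndEvaluate(const std::string& expression)
{
    // Somewhere in the function (loop shown for context):
    for (char c : expression)
    {
        // Cast first: isdigit() is undefined for negative char values
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            // ....
        }
    }
}

// The evaluate function gets called many times.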

Using an alternate version of isdigit improved performance by almost 20%. isdigit is slow because it performs locale checks, which I didn't need in the app.
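The post doesn't show the replacement that was used. Here is a minimal sketch, assuming only the ASCII digits '0' to '9' need to be recognised. The names is_ascii_digit and check_is_ascii_digit are hypothetical:

```cpp
#include <cassert>

// Hypothetical locale-independent replacement for isdigit(): true only for
// the ASCII digits '0'..'9', whatever the current C locale is.
// Taking unsigned char avoids the undefined behaviour isdigit() has for
// negative char values.
inline bool is_ascii_digit(unsigned char c)
{
    // c - '0' is computed as int; converting it to unsigned makes every
    // character below '0' wrap to a huge value, so one comparison suffices.
    return static_cast<unsigned>(c - '0') < 10u;
}

// Self-check: agrees with the plain range test for every byte value.
void check_is_ascii_digit()
{
    for (int c = 0; c < 256; ++c)
        assert(is_ascii_digit(static_cast<unsigned char>(c)) == (c >= '0' && c <= '9'));
}
```

Because it never consults the locale, the compiler can reduce each call to a subtraction and a single comparison.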


Nish Nishant

Rama Krishna Vavilala wrote:

Specifying an alternate version of isdigit boosted the performance by almost 20%. isdigit is slow because of locale checking, which I did not need in the app.

Interesting.

Regards, Nish


Steve S

I've done something similar with 'homemade' replacements for wide-char routines, such as the wide-char counterpart of strtol() (wcstol()), with wonderful performance gains.
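Steve doesn't post his code. As one hypothetical example of such a homemade routine, here is a minimal locale-free stand-in for wcstol(s, &end, 10). It is a sketch under loud assumptions: base 10 only, no skipping of leading whitespace, and no overflow detection (the value is assumed to fit in a long). The name parse_wide_decimal is invented for illustration:

```cpp
#include <cassert>

// Hypothetical "homemade" locale-free stand-in for wcstol(s, &end, 10).
// Assumptions: base 10 only, no skipping of leading whitespace, and the
// result is assumed to fit in a long (there is no overflow detection).
long parse_wide_decimal(const wchar_t* s, const wchar_t** end)
{
    assert(s != nullptr);
    const wchar_t* start = s;

    // Optional sign
    bool negative = false;
    if (*s == L'-' || *s == L'+') {
        negative = (*s == L'-');
        ++s;
    }

    // Accumulate digits; unsigned arithmetic wraps instead of being UB
    const wchar_t* first_digit = s;
    unsigned long value = 0;
    while (*s >= L'0' && *s <= L'9') {
        value = value * 10 + static_cast<unsigned long>(*s - L'0');
        ++s;
    }

    // No digits: like wcstol, report that nothing was converted
    if (s == first_digit) {
        if (end) *end = start;
        return 0;
    }
    if (end) *end = s;
    return negative ? -static_cast<long>(value) : static_cast<long>(value);
}
```

Dropping the whitespace skipping, base detection and overflow handling of the library call is exactly what makes such a routine cheap, so it is only safe where the input format is known in advance.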

Steve S Developer for hire

Robert Inventor

Yes, I've also found it worthwhile to do that sort of thing. isspace() is also very slow if you use it frequently; that's another routine that uses locales, and I've found dramatic increases in speed by replacing it in code where it is used extensively.

BTW, I code in plain C rather than C++. My own coding is also done at a fairly low level, so I'm not shy about using globals to increase performance, or for other reasons too, e.g. where a global simplifies the code. Of course this has to be done with care, and one must know what the issues with globals are and how to avoid them. The main thing, I find, is a good naming system for globals: long, descriptive names so that you know exactly what each global is used for and where it is used, often prefixed with "g" for global.

So in this case I just declare a global array, say:

    BOOL g_is_space_char[256];

where g_is_space_char[c] is set to 1 if and only if c is a space character. Instead of calling isspace(c) every time, I fill the g_is_space_char array once when the program starts up. From then on, instead of isspace(..) I use:

    #define is_space(c) (g_is_space_char[(unsigned char)(c)])

CharLower(..) and CharUpper(..) applied to single characters are also particularly slow, and can be sped up a lot in a similar way by filling g_char_lower_char[256] etc. (you can refill them whenever you need to change the locale). Then you have:

    #define char_upper_char(c) (g_char_upper_char[(unsigned char)(c)])

Here are some figures for comparison, in milliseconds, for 1,000,000 loops of each routine in an optimised release build on my 2.4 GHz laptop:

Routine            Time (ms, 1,000,000 loops)
isspace            1.33
is_space           0.0011
CharUpper          129.4
char_upper_char    0.0011
assignment only    0.0011

So here is_space() runs at the same speed as a simple assignment and is more than 1000 times faster than isspace(), and char_upper_char() is more than 100,000 times faster than CharUpper().

Well, anyway, I know this way of working isn't to everyone's taste, and you do lose type checking with #defined routines, which is something to be well aware of when using this approach, but it often leads to performance gains :-)

Robert
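The table approach described above can be sketched in C++ as follows. This is a sketch under stated assumptions, not Robert's code: std::toupper is swapped in for the Win32 CharUpper() so the example is portable, and init_char_tables, check_char_tables, is_space_fn and char_upper_fn are hypothetical names. The macros are kept as in the post; the inline functions show one way to get a declared parameter type back:

```cpp
#include <cassert>
#include <cctype>

// Lookup tables as described in the post: one entry per byte value.
bool g_is_space_char[256];
unsigned char g_char_upper_char[256];

// Hypothetical initialiser: call once at start-up and again after any
// setlocale() change. std::toupper stands in for the Win32 CharUpper().
void init_char_tables()
{
    for (int c = 0; c < 256; ++c) {
        g_is_space_char[c] = std::isspace(c) != 0;
        g_char_upper_char[c] = static_cast<unsigned char>(std::toupper(c));
    }
}

// Macro form, as in the post. The (unsigned char) cast stops negative char
// values from indexing before the start of the array.
#define is_space(c)        (g_is_space_char[(unsigned char)(c)])
#define char_upper_char(c) (g_char_upper_char[(unsigned char)(c)])

// Alternative with a real prototype: the same single table lookup, but the
// argument converts implicitly to a declared parameter type, so warnings
// such as GCC/Clang -Wconversion can flag suspicious arguments that the
// macro's explicit cast would silence. Optimising compilers normally inline it.
inline bool is_space_fn(unsigned char c) { return g_is_space_char[c]; }
inline unsigned char char_upper_fn(unsigned char c) { return g_char_upper_char[c]; }

// Self-check: the tables must agree with the library calls they replace.
void check_char_tables()
{
    for (int c = 0; c < 256; ++c) {
        assert(is_space(c) == (std::isspace(c) != 0));
        assert(char_upper_char(c) == static_cast<unsigned char>(std::toupper(c)));
    }
}
```

The memory cost is one 256-entry array per routine, and the tables only reflect the locale that was active when they were last filled.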

pragma codeproject

When performance is an issue (and it is more often than one thinks), this is a nice trade-off of memory for speed. Maybe not for an embedded system, but... R

Robert Inventor

Hi there,

Yes, I'm writing from a particular experience of using this technique to fix code that was very slow at parsing large, human-readable .ini files, so in this case performance really was an issue for isspace and CharUpper.

After my post, btw, I wasn't sure of these figures, as they are so small, particularly for is_space. I suspect the optimiser was removing the code altogether, since its result had no effect on anything else. So I've rerun the test, this time assigning each result to successive elements of a large array so that the code can't be optimised away. Now the results are:

Overhead of the loop and the assignment to successive array elements, for 1,000,000 loops: 24.029 ms

Routine            Time (ms, 1,000,000 loops)
isspace            53.2861
is_space           24.0346
CharUpper          118.435
char_upper_char    24.2246

(timed using the Pentium high-performance timer)
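The measurement method Robert describes (store each result in successive elements of a large array, and time a loop that only does the store to get the overhead) might look like this in C++. It is a sketch under assumptions: std::chrono::steady_clock replaces the Pentium high-performance timer, only isspace and the table version are timed, and time_ms, run_benchmark and Timings are hypothetical names. Absolute numbers will vary with machine and compiler:

```cpp
#include <cassert>
#include <cctype>
#include <chrono>
#include <cstddef>
#include <vector>

constexpr std::size_t kLoops = 1000000;

// Times kLoops calls of f, storing every result so the optimiser
// cannot discard the work as dead code.
template <typename F>
double time_ms(F f, std::vector<int>& out)
{
    assert(out.size() >= kLoops);
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < kLoops; ++i)
        out[i] = f(static_cast<unsigned char>(i));
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

bool g_is_space_char[256];

struct Timings { double overhead, lib, table; };

Timings run_benchmark(std::vector<int>& out)
{
    for (int c = 0; c < 256; ++c)
        g_is_space_char[c] = std::isspace(c) != 0;

    Timings t;
    // Loop + store only: the overhead to subtract from the other figures
    t.overhead = time_ms([](unsigned char c) { return int(c); }, out);
    t.lib      = time_ms([](unsigned char c) { return std::isspace(c); }, out);
    t.table    = time_ms([](unsigned char c) { return int(g_is_space_char[c]); }, out);
    // Net cost of each routine = its time minus t.overhead
    return t;
}
```

After the call, out holds the table version's results, so out[i] is 1 exactly when the byte value i % 256 is whitespace in the current locale.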

After subtracting the overhead from all the others:

Routine            Time minus overhead (ms)
isspace            29.2571
is_space           0.0056
CharUpper          94.406
char_upper_char    0.1956

So now is_space is about 5,000 times faster than isspace, and char_upper_char is about 500 times faster than CharUpper. Hopefully that is more accurate now.

Robert
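The ratios follow directly from the raw timings and the 24.029 ms overhead:

```latex
\begin{aligned}
\frac{t_{\texttt{isspace}}}{t_{\texttt{is\_space}}}
  &= \frac{53.2861 - 24.029}{24.0346 - 24.029}
   = \frac{29.2571}{0.0056} \approx 5{,}200 \\
\frac{t_{\texttt{CharUpper}}}{t_{\texttt{char\_upper\_char}}}
  &= \frac{118.435 - 24.029}{24.2246 - 24.029}
   = \frac{94.406}{0.1956} \approx 480
\end{aligned}
```

Because the is_space figure is the small difference of two numbers near 24 ms, that ratio is only a rough estimate.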