blind micro optimisation
-
On a code review, someone commented that I was enumerating a really big list twice and asked whether I could use a single loop instead. The initial code, where source is a large list:

list.AddRange(source.OfType<A>());
DoThings(source.OfType<B>());

void DoThings(IEnumerable<B> items)
{
    foreach (var b in items)
    {
        // do something
    }
}

So, to please the crowd, I refactored. The new code:

foreach (var item in source)
{
    if (item is A a)
        list.Add(a);
    if (item is B b)
        DoThing(b);
}

void DoThings(IEnumerable<B> items) // still in use through another code path
{
    foreach (var b in items)
        DoThing(b);
}

void DoThing(B b)
{
    // do something
}

Now... without doing any measurement (hence the "blind" adjective), I suspect version 2 is in fact slower: enumerating is cheap, but there are a lot more method calls. What say you?
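For what it's worth, the question is cheap to answer empirically. Here is a minimal Stopwatch sketch of the two shapes (A, B, the list size, and the workload are all stand-ins; a real benchmark would use something like BenchmarkDotNet):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class A { }
class B { }

class EnumerateBench
{
    public static int Counter;                        // counts DoThing calls so the work isn't optimised away
    public static List<A> List1 = new(), List2 = new();

    static void DoThing(B b) => Counter++;

    static void DoThings(IEnumerable<B> items)
    {
        foreach (var b in items)
            DoThing(b);
    }

    public static void Main()
    {
        // A mixed source list: half A, half B.
        var source = new List<object>();
        for (int i = 0; i < 1_000_000; i++)
            source.Add(i % 2 == 0 ? (object)new A() : new B());

        // Version 1: two OfType passes over source.
        var sw = Stopwatch.StartNew();
        List1.AddRange(source.OfType<A>());
        DoThings(source.OfType<B>());
        Console.WriteLine($"two passes: {sw.ElapsedMilliseconds} ms");

        // Version 2: one pass with type tests.
        sw.Restart();
        foreach (var item in source)
        {
            if (item is A a)
                List2.Add(a);
            if (item is B b)
                DoThing(b);
        }
        Console.WriteLine($"one pass: {sw.ElapsedMilliseconds} ms");
    }
}
```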
A new .NET Serializer All in one Menu-Ribbon Bar Taking over the world since 1371!
I could argue the performance either way, so unless this has been pinpointed as a bottleneck I would ignore performance. With that in mind I would go with the first method, as it is a lot clearer and more maintainable. If it does become a performance bottleneck I would consider refactoring the source so it does not store A and B together, as that will presumably be an issue in a lot of places.
-
I could argue the performance either way, so unless this has been pinpointed as a bottleneck I would ignore performance. With that in mind I would go with the first method, as it is a lot clearer and more maintainable. If it does become a performance bottleneck I would consider refactoring the source so it does not store A and B together, as that will presumably be an issue in a lot of places.
I am glad you find the first method clearer too! :) Yeah, whatever... Arguing during code review is painful, so I'll go with version 2; it's pointlessly but minutely different... no biggie...
-
I would prefer option one. It seems cleaner to me; I find code that jumps through 15 methods to do a job confusing. On performance, there is no way to comment without measuring. I would focus on how this method is going to be used. If it runs once a day and saves me a nanosecond, I do not care, as long as the application does not need to be that fast. But if it runs a million times a day and saves a millisecond each run, then I would pay attention. I find this kind of performance analysis of individual methods not really useful; we need to look at application usage, in my opinion, rather than trying to write the fastest code ever. In the end, this is a business, and if I spend a day refactoring to gain a second per day at, say, 1000 euros of cost, it does not look good.
"It is easy to decipher extraterrestrial signals after deciphering Javascript and VB6 themselves.", ISanti[^]
-
I think this is a very valuable comment from the reviewer, and it has the potential to improve performance a lot. It can take a lot of time figuring out who understands what to optimize and who doesn't have a clue. With this review comment you will no longer be in doubt: that guy has no idea, and everything he says about performance can simply be ignored. Think of all the time his comment can save you in the future - your performance will definitely improve by not wasting time on him.
-
Wow, amazing tip! :D
-
-
I prefer #1. It might even be faster, since AddRange() might do clever stuff under the hood.
"If we don't change direction, we'll end up where we're going"
megaadam wrote:
It might even be faster since AddRange() might do clever stuff under the hood
Unfortunately not. Since the OfType method doesn't return an ICollection<T>, the list can't pre-allocate enough space for the new items, because it doesn't know how many there will be. Reference Source[^]
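That difference is visible through List<T>.Capacity. A small sketch (the exact growth numbers are runtime implementation details, so treat them as illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CapacityDemo
{
    public static (int collectionCapacity, int iteratorCapacity) Run()
    {
        int[] source = Enumerable.Range(0, 1000).ToArray();

        // An array is an ICollection<T>, so AddRange can size the list up front.
        var fromCollection = new List<int>();
        fromCollection.AddRange(source);

        // OfType<int>() returns a lazy iterator with no Count, so the list
        // has to grow by doubling as the items trickle in.
        var fromIterator = new List<int>();
        fromIterator.AddRange(source.OfType<int>());

        return (fromCollection.Capacity, fromIterator.Capacity);
    }

    public static void Main()
    {
        var (a, b) = Run();
        Console.WriteLine($"{a} vs {b}");   // 1000 vs 1024 on current .NET runtimes
    }
}
```

On current runtimes the collection path allocates once, while the iterator path goes through the whole 4, 8, 16, ... doubling sequence.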
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
An ultra-pedantic code reviewer would suggest that

foreach (var item in source)
{
    if (item is A a)
        list.Add(a);
    if (item is B b)
        DoThing(b);
}

could be improved by doing

foreach (var item in source)
{
    if (item is A a)
        list.Add(a);
    else if (item is B b)
        DoThing(b);
}

or

foreach (var item in source)
{
    if (item is B b)
        DoThing(b);
    else if (item is A a)
        list.Add(a);
}

as that would save re-checking items that already matched. This would, however, possibly not give the desired effect if an item could be both an A and a B (e.g. if A inherits B, B inherits A, or A / B are interfaces and the item can implement both). Of course, you should spend a couple of weeks analysing these two code snippets with multiple copies of sample data to see which one saves the most milliseconds per decade X| :wtf: So, as others have said, ignore the code reviewer and do what makes sense to you.
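Whether the else matters is easy to demonstrate with hypothetical interfaces IA / IB and a type that implements both (all names invented for this sketch):

```csharp
using System;
using System.Collections.Generic;

interface IA { }
interface IB { }
class Both : IA, IB { }     // an item that is an A *and* a B

class ElseIfDemo
{
    public static (int added, int doThingCalls) Run()
    {
        var source = new List<object> { new Both() };
        var list = new List<IA>();
        int doThingCalls = 0;

        foreach (var item in source)
        {
            if (item is IA a)
                list.Add(a);
            if (item is IB b)   // independent 'if': runs even though the first test matched
                doThingCalls++;
            // With 'else if' here, doThingCalls would stay 0 for a Both.
        }
        return (list.Count, doThingCalls);
    }

    public static void Main()
    {
        var (added, calls) = Run();
        Console.WriteLine($"added={added}, doThingCalls={calls}");  // added=1, doThingCalls=1
    }
}
```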
-
Seems like a niche use-case, but how about something like this?
static class Extensions
{
public static IEnumerable<B> SplitList<T, A, B>(
this IEnumerable<T> source,
IList<A> listOfA)
where A : T
where B : T
{
foreach (T item in source)
{
switch (item)
{
case A a:
{
listOfA.Add(a);
break;
}
case B b:
{
yield return b;
break;
}
}
}
}
}

List<A> list = new();
DoThings(source.SplitList<YourBaseType, A, B>(list));

static void DoThings(IEnumerable<B> items)
{
foreach (var b in items)
{
// do something
}
}
-
I would measure them to be sure. But I agree that calling DoThings once is likely to be better than calling DoThing a great many times. The particular micro-optimization I'm working on now will likely lead to a post in Weird and Wonderful.
-
I'm guessing the refactor is faster, since it only has to enumerate once.
To err is human. Fortune favors the monsters.
-
I'd argue that you really need two lists, one of Type[A] and one of Type[B] (sorry, couldn't figure out how to avoid font changes with the <>). This really looks like an issue of delaying the decision until too deep in the code's decision trees. The new code is easier to read, and it points out that your one list should really be two lists.
-
To me, this is a clear case of premature optimization, and (almost) every optimization is premature if it sacrifices clarity when the need for optimization has not been demonstrated. There are certain cases involving function calls that I know for a fact can be done better, so I deal with those up front rather than later, since the delta time spent is zero. Here's an example: x*x is considerably faster than pow(x,2), so I deal with that up front instead of returning to it later. Especially since calling the pow function does not enhance clarity.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
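The same trade-off exists in .NET: Math.Pow goes through a general floating-point routine, while x * x is a single multiply. A rough sketch (timings vary by runtime and hardware, so no numbers are promised):

```csharp
using System;
using System.Diagnostics;

class SquareDemo
{
    public static double SumSquares, SumPow;   // accumulated so the loops aren't optimised away

    public static void Main()
    {
        const int n = 10_000_000;

        var sw = Stopwatch.StartNew();
        for (int i = 1; i <= n; i++)
            SumSquares += (double)i * i;       // plain multiply
        long mulMs = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int i = 1; i <= n; i++)
            SumPow += Math.Pow(i, 2);          // general pow routine
        long powMs = sw.ElapsedMilliseconds;

        Console.WriteLine($"x*x: {mulMs} ms, Math.Pow: {powMs} ms");
    }
}
```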
-
-
I'm guessing the refactor is faster, since it only has to enumerate once.
But it calls a method for every item... I was unsure; I actually thought it might be a bit slower (extra method calls, where enumeration is cheap)... Anyway, I was so annoyed that I made a simple test program at home, and... it turns out the new method is indeed faster (by 20% in my simple test). After that, well, I beautified the new code a bit and rolled with it! :)
-
I would measure them to be sure. But I agree that calling DoThings once is likely to be better than calling DoThing a great many times. The particular micro-optimization I'm working on now will likely lead to a post in Weird and Wonderful.
Well... I did think that too... then I did a simple test at home and, to my stunned surprise, the optimisation was 20% faster!
-
Looking at the code, I would expect the second version to be faster. The reason is that enumeration is slow relative to function calls. The original code enumerates source twice (once per OfType call) and creates temporary iterator objects. The rewritten code enumerates source once and creates no temporary objects. Super Lloyd actually tested and found the second version is 20% faster. [The Lounge](https://www.codeproject.com/Lounge.aspx?msg=5880319#xx5880319xx)
-
Performance is not the main reason to avoid multiple enumeration. Depending on the origin of the source, the collection may be changed between any two enumerations, so the result will be a bug that is extremely hard to reproduce.
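A minimal sketch of that hazard, using a lazy LINQ query over a list that changes between the two enumerations:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DoubleEnumerationDemo
{
    public static (int first, int second) Run()
    {
        var backing = new List<int> { 1, 2, 3 };

        // A deferred query: nothing is evaluated until it's enumerated.
        IEnumerable<int> evens = backing.Where(x => x % 2 == 0);

        int firstPass = evens.Count();   // enumerates now: sees { 2 }
        backing.Add(4);                  // source mutated between the two passes
        int secondPass = evens.Count();  // enumerates again: sees { 2, 4 }

        return (firstPass, secondPass);
    }

    public static void Main()
    {
        var (a, b) = Run();
        Console.WriteLine($"first pass: {a}, second pass: {b}");  // 1 then 2
    }
}
```

Each enumeration re-runs the query over the live collection, so two passes can disagree; materialising once with ToList() pins down a single snapshot.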
-
But it calls a method for every item... I was unsure; I actually thought it might be a bit slower (extra method calls, where enumeration is cheap)... Anyway, I was so annoyed that I made a simple test program at home, and... it turns out the new method is indeed faster (by 20% in my simple test). After that, well, I beautified the new code a bit and rolled with it! :)
In my experience with codegen, method calls don't require much overhead. I haven't run any hard tests directly, but I have written a codegen system that used method calls to separate the non-terminals and one that did not. The one that did was significantly faster, I suspect because repeatedly used code stays cached at the CPU level, which isn't possible with longer methods. That's what leads me to lean the way I do. I suspect JIT magic reduces call overhead, but there's not much the JIT can do to avoid allocation overhead. With .NET, allocation is cheap in theory, but people forget about the overhead of the constructors.
-
nice hindsight! :)